Do we need to separate domain types and db access type?

2020-09-10 08:11:22

Joel McCracken said:

yep. I agree with Kris fwiw, it's better to have your business logic in pure types, and have a monad transformer stack "at the edges". I'd like to keep PersistT out of things as much as possible just because of the principle of least power!

@Joel McCracken to avoid discussing different topics in the same stream, I dug into this a little and found MaxGabriel's comment on this issue: https://github.com/yesodweb/persistent/issues/1115. Looking at this, I'm not sure if what Kris said is true in all cases.

Documentation re intended use and best practices · Issue #1115 · yesodweb/persistent

It has been reported that persistent encourages you to use your database types in your domain. We should investigate the documentation for persistent and ensure that is not the case. We should inst...

Georgi Lyubenov // googleson78

2020-09-10 08:31:51

it may be possible to share the types sometimes, but I think it's much safer to always separate them by default

Georgi Lyubenov // googleson78

2020-09-10 08:32:27

and only merge them if you have to optimise (although I don't imagine this deconstructing/constructing of types with one constructor to really cost much in most cases)

Fintan Halpenny

2020-09-10 09:14:17

I think another issue for said database was that the representation of the domain was tied to how you could formulate it in terms of the types the database could handle.
In reality, we could have had a better representation in the Haskell domain and have boundary conversions to/from the database representation.
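
As a minimal sketch of such boundary conversions, assume a database that can only store an order status as text while the domain wants a closed sum type. Every name here (OrderRow, Order, fromRow, toRow) is hypothetical and not taken from the project being discussed:

```haskell
-- Database-side row: mirrors what the column types can hold.
data OrderRow = OrderRow
  { rowOrderId :: Int
  , rowStatus  :: String  -- free-form text, e.g. "pending"
  } deriving (Show, Eq)

-- Domain-side types: only meaningful statuses exist.
data OrderStatus = Pending | Shipped | Cancelled
  deriving (Show, Eq)

data Order = Order
  { orderId     :: Int
  , orderStatus :: OrderStatus
  } deriving (Show, Eq)

-- Conversion from the database; fails on text the domain does not know.
fromRow :: OrderRow -> Either String Order
fromRow (OrderRow i s) = Order i <$> parseStatus s
  where
    parseStatus "pending"   = Right Pending
    parseStatus "shipped"   = Right Shipped
    parseStatus "cancelled" = Right Cancelled
    parseStatus other       = Left ("unknown order status: " ++ other)

-- Conversion to the database; total, since every domain value has a text form.
toRow :: Order -> OrderRow
toRow (Order i st) = OrderRow i (render st)
  where
    render Pending   = "pending"
    render Shipped   = "shipped"
    render Cancelled = "cancelled"
```

The asymmetry is the point of the sketch: reading from the database can fail, writing to it cannot, and the rest of the code only ever sees Order.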

Joel McCracken

2020-09-10 13:38:23

Well, what Kris said is certainly an opinion; I don't think it's the only valid one. I do agree though, it's much easier to reason with pure code instead of code that is interspersed with a monad transformer stack, so take that for what it's worth

Joel McCracken

2020-09-10 13:39:38

There are two separate issues here though and I'm not sure which one is being discussed exactly; are we talking about having separate domain logic and persistence types, or are we discussing if it makes sense to have PersistT interspersed throughout your business code?

Georgi Lyubenov // googleson78

2020-09-10 13:40:33

we were discussing the domain vs persistence types, as far as I can tell

Georgi Lyubenov // googleson78

2020-09-10 13:41:13

of course it's also better to not have SqlPersistT around in your business logic, the same way it's better not to have IO directly
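
A rough sketch of what keeping the effect stack at the edge can look like, using only base. The late-fee rule and the names applyLateFee and chargeLateFee are made up for illustration; in a real application the monad m would be something like SqlPersistT IO and the two function arguments would be actual queries:

```haskell
-- Pure business rule: no IO and no database monad, so it is trivial to test.
applyLateFee :: Int -> Int -> Int
applyLateFee daysLate balance
  | daysLate > 0 = balance + 5 * daysLate
  | otherwise    = balance

-- The effectful edge: loads, calls the pure rule, saves.
-- The load and save actions are passed in, so this code never names
-- the concrete monad it runs in.
chargeLateFee
  :: Monad m
  => (Int -> m Int)        -- load the balance for an account id
  -> (Int -> Int -> m ())  -- save a new balance for an account id
  -> Int                   -- account id
  -> Int                   -- days late
  -> m ()
chargeLateFee loadBalance saveBalance accountId daysLate = do
  balance <- loadBalance accountId
  saveBalance accountId (applyLateFee daysLate balance)
```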

Joel McCracken

2020-09-10 13:42:15

yeah so i mean there are several things here; generally the preferred domain type you might have is not going to be the same as the preferred type for persistence

Joel McCracken

2020-09-10 13:42:25

https://github.com/yesodweb/persistent/issues/1115#issuecomment-690094112 @Georgi Lyubenov // googleson78 has a great example there


Joel McCracken

2020-09-10 13:43:10

now, you can actually use them sometimes, but you just need to be aware that you have implicit coupling, and you may need to split them out at some point

Joel McCracken

2020-09-10 13:44:37

would an example help illuminate the issue @Rizary ?

codygman

2020-09-10 14:01:49

I bet an example would help. Separating domain types and persistent entities is a hard sell if many things end up like:

[persistLowerCase|
    PersonDBO
      age Int
|]

data Person = Person { _id :: Int64, age :: Int}

personDBOToPerson :: PersonDBO -> Person

If the two types end up being the same, it seems kind of pointless.

James King

2020-09-10 14:11:43

It can seem like extra work. But it does give you some flexibility to change representation at the database layer without having to change your business logic.

It doesn't happen a lot in straight-forward CRUD web applications in my experience... but it can be nice for lazy loading related data, changing underlying representations for performance reasons, sharding, etc.

James King

2020-09-10 14:12:49

I usually do this stuff in the database itself with physical tables being separated by views and materialized views. :shrug:

James King

2020-09-10 14:17:07

You might also have some meta-information on the DB type that you could discharge to Either in the personDBOToPerson function... :thinking:

James King

2020-09-10 14:18:47

That's pretty random. Never mind me. :sweat_smile:

Fintan Halpenny

2020-09-10 14:35:56

Well, the Person example above is already broken. A Person shouldn't be able to have negative age :stuck_out_tongue_wink:

codygman

2020-09-10 14:36:39

I think sometimes it can be good to separate, but for CRUD apps I can't make a good argument for separating all the time.

codygman

2020-09-10 14:37:30

Right, so making smart constructors for person from personDbo justifies separation
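
A minimal sketch of that smart-constructor idea, with a plain record standing in for the entity that persistent would generate. The names Age, mkAge and personFromDBO are illustrative assumptions, and the _id field from the example above is left out for brevity:

```haskell
-- Stand-in for the database entity: any Int is accepted, including negatives.
data PersonDBO = PersonDBO { dboAge :: Int }
  deriving (Show, Eq)

-- Domain age. In a real module the Age constructor would not be exported,
-- so mkAge would be the only way to build one.
newtype Age = Age Int
  deriving (Show, Eq)

mkAge :: Int -> Either String Age
mkAge n
  | n < 0     = Left ("negative age: " ++ show n)
  | otherwise = Right (Age n)

data Person = Person { personAge :: Age }
  deriving (Show, Eq)

-- The conversion is where bad rows are rejected, once, at the boundary.
personFromDBO :: PersonDBO -> Either String Person
personFromDBO dbo = Person <$> mkAge (dboAge dbo)
```

With this in place the two types are no longer identical, and code that receives a Person does not need to re-check the age.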

Fintan Halpenny

2020-09-10 14:39:22

You might also want migration paths. If your DB representation changes, that doesn't necessarily mean your logic needs to

codygman

2020-09-10 14:39:46

Thing is, the lazier thing is to say "eh that's not a big deal". I think using entities directly discourages type safety, esp in the "parse don't validate" sense

Rizary

2020-09-10 14:42:00

Joel McCracken said:

would an example help illuminate the issue Rizary ?

it will if you don't mind!! thanks, always appreciate your response.

Joel McCracken

2020-09-10 14:42:50

lemmie think on it; i think the issues would be best illustrated in a reasonably sizable example

Joel McCracken

2020-09-10 14:43:37

one of the absolutely most useful techniques in haskell is having your domain types be correct-by-construction, like is referenced above

codygman

2020-09-10 14:50:48

If your team isn't already sold on correct by construction or the default is "just use the entity", you'll need some pretty good motivating examples though.

codygman

2020-09-10 14:52:21

I think I'll add the person example and how it can go wrong to the GitHub thread when I'm at a computer. Then after it's motivated show how in the domain type we parse instead of validating

codygman

2020-09-10 17:09:30

I posted a pretty good sized comment with many examples talking about this in the issue:

https://github.com/yesodweb/persistent/issues/1115#issuecomment-690504252


Joel McCracken

2020-09-10 17:14:48

Can I just say, i'm really glad this conversation is happening

Rizary

2020-09-11 05:29:14

Thank you @codygman and @Joel McCracken for the discussion. Although sometimes I really wish I had a bigger project so I could learn more about this.

I really like lexi's blog post on "parse don't validate", and I think the way I directly access my entities goes against what that post advises

Joel McCracken

2020-09-17 17:09:19

So I have an example -- it's not for a DB, but it is for another data layer.

Right now at work I am writing a system for pulling data in to then be reuploaded to our internal services. This thing should run on a regular basis, etc. The system acts on what is contained in a configuration file. Here is an example file:

api:
  version: "0.1.0"
init:
  name: "getFileForProcessing"
  module:
    tag: "file.getFileForProcessing"
    contents:
      fileLocation: "test/fixtures/appTest.csv"
      validate: True
  logLevel: error
  next: "mapFileForOutput"
actions:
- name: "makeOutput"
  module:
    tag:  "file.output"
    contents:
      destination: "test/fixtures/result2.out"
  next: log
- name: "log"
  module:
    tag:  "log"
    contents:
      level: error
      text: this is log output in worker
- name: "mapFileForOutput"
  module:
    tag:  "mapping.basic"
    contents:
      strict: False
      mappings:
        inputColumnName: output-column-name
    next: "makeOutput"
  logLevel: debug
  next: "makeOutput"

In this example, this process would read a file from the filesystem (presumably put there by another process)
and it would output it to the filesystem (to also be sent along to something that expects it there).
This is just to give you a simple idea of how the system works; it's not a realistic example. In reality we are
pulling data in via HTTP, modifying it, and then submitting it to GraphQL.

Anyway, so imagine this scenario:

api:
  version: "0.1.0"
init:
  name: "getFileForProcessing"
  module:
    tag: "file.getFileForProcessing"
    contents:
      fileLocation: "test/fixtures/appTest.csv"
      validate: True
  logLevel: error
  next: "mapFileForOutput"
actions: []

Notice that next does not match any action name. However, it IS valid YAML. And the way the data types that
the YAML parses into work is that next is a Maybe Text, because it may or may not be there (the final
step in a process will not have a next value set).

The problem with this data representation is that it correctly parses this example. The only way this
issue is discovered is while the process is executing the steps, and it looks up the next step in actions, and
it finds that no action with that name exists. If we imagine that this is a very long running process, a user
might not get an alert that the error has occurred until hours after initiating the process with this config.

Now, we could write a post-parsing validation step that indicates if there is an error or not. But this still
leaves us with an annoying data model. Any time you want to figure out what the next step is in a process, you
have to do a lookup which might fail, which is just following the types. If we have a list of actions, then
looking for one with a matching name might not return any, or it might return multiple with the same name.
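
To make the three lookup outcomes concrete, here is a small sketch of the validate-after-parsing approach. It is not the real code from that system: ConfigAction is cut down to two fields, String is used instead of Text to stay within base, and lookupAction and danglingNexts are hypothetical helpers:

```haskell
-- Simplified stand-in for the parsed config type.
data ConfigAction = ConfigAction
  { actionName     :: String
  , nextActionName :: Maybe String
  } deriving (Show, Eq)

-- Searching a list by name has three possible outcomes, not two.
data Lookup
  = NotFound
  | Found ConfigAction
  | Ambiguous [ConfigAction]
  deriving (Show, Eq)

lookupAction :: String -> [ConfigAction] -> Lookup
lookupAction name actions =
  case filter ((== name) . actionName) actions of
    []     -> NotFound
    [a]    -> Found a
    several -> Ambiguous several

-- Post-parse validation: every `next` name that does not resolve
-- to exactly one action.
danglingNexts :: [ConfigAction] -> [String]
danglingNexts actions =
  [ n | Just n <- map nextActionName actions, not (resolves n) ]
  where
    resolves n = case lookupAction n actions of
      Found _ -> True
      _       -> False
```

Even after danglingNexts returns an empty list, every later call to lookupAction still has to handle NotFound and Ambiguous, because the types have not recorded that the check was done.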

You can choose to ignore that possibility, but that is considered bad Haskell style. Instead, whenever possible
we try to make illegal states unrepresentable. So let's say that the data model is like this today:

data ConfigAction
  = ConfigAction
  { nextActionName :: Maybe Text
  ...
  }

We would rather have a type that looks like this:

data Action
  = Config
  { actionConfigAction :: ConfigAction
  , actionNextAction :: Maybe ConfigAction
  ...
  }

That way our code can use nextAction to access the next action; of course this might be the last one (hence the Maybe),
but this removes the possibility of there incorrectly being a named action that doesn't exist, or
multiple actions named the same thing.

So basically the idea is to "make illegal states unrepresentable", but also it's just more convenient to have datatypes that
map to your business rules. This is essentially the same idea as https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
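
One way the step from the parsed config to the linked type could look, as a sketch only. The types are reduced to the fields discussed above, String replaces Text, the Action constructor is named Action here, and resolveActions and ResolveError are hypothetical names rather than the real system's API:

```haskell
-- What the YAML parser produces: `next` is just a name.
data ConfigAction = ConfigAction
  { actionName     :: String
  , nextActionName :: Maybe String
  } deriving (Show, Eq)

-- What the rest of the program uses: `next` is already looked up.
data Action = Action
  { actionConfigAction :: ConfigAction
  , actionNextAction   :: Maybe ConfigAction
  } deriving (Show, Eq)

data ResolveError
  = UnknownNext String String  -- action name, then the missing target name
  | DuplicateName String
  deriving (Show, Eq)

-- Resolve every `next` name once, up front. On success no later code
-- has to handle a missing or ambiguous name.
resolveActions :: [ConfigAction] -> Either ResolveError [Action]
resolveActions cfgs = do
  mapM_ checkUnique cfgs
  traverse resolve cfgs
  where
    named n = filter ((== n) . actionName) cfgs

    checkUnique c = case named (actionName c) of
      [_] -> Right ()
      _   -> Left (DuplicateName (actionName c))

    resolve c = case nextActionName c of
      Nothing -> Right (Action c Nothing)
      Just n  -> case named n of
        [target] -> Right (Action c (Just target))
        []       -> Left (UnknownNext (actionName c) n)
        _        -> Left (DuplicateName n)  -- already ruled out by checkUnique
```

Run against the broken config above, where the only step points at "mapFileForOutput", this fails immediately with UnknownNext instead of hours into the run.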

Rizary

2020-09-24 06:48:47

Interesting, I still need time to digest it, but that makes sense. One question to clarify my understanding: this nextAction helps us to access the next action in "next" ConfigAction, right?

Joel McCracken

2020-09-24 13:22:37

yessir!
