# org-mode support - Neuron

Welcome to the Functional Programming Zulip Chat Archive. You can join the chat here.

Something unfortunate happened: I discovered org-mode, and would like to be able to take my neuron notes in org.
Pandoc supports conversion to and from org files, and many more file formats. Would it be possible for neuron to use the pandoc AST internally? (i.e. have type Zettel = ZettelT Pandoc)
Alternatively, how would one implement org support in neuron? Maybe neuron should be format-agnostic in its library, and provide markdown-specific features by default in the actual executable

The main challenge is the link extension. Let's say we add org format as one of the built-in formats to neuron. So .md uses mmark, .org uses Pandoc, .pmd uses Pandoc markdown ... How would we extract and render links and queries?

Can you think of a way we can do this while also supporting the use case of using it with just say mmark or commonmark parser?

I'm not sure I understand the problem. mmark support would be dropped in favor of pandoc markdown, but pandoc supports a bunch of formats already, so it should allow everything we would ever want. Extracting links is easy: the pandoc-types package defines the universal document type as well as a few functions to traverse its AST. Rendering links is not a problem either, because in all cases the export format would be HTML (though choosing the export format from the zettel metadata would be even better, but let's keep it simple for now)

Ideally org support wouldn't be builtin, neuron could support multiple formats which could be installed individually e.g. by cabal flags or some equivalent nix feature, and the library part of neuron would only know about the pandoc type, and wouldn't care about how the parsing is handled at all (the pandoc type is very general, and handles metadata)

Wait nevermind, I think neuron (the executable) depending on the full pandoc library isn't a bad idea after all, at least I see no better way to support multiple file formats

I'm currently playing with the commonmark parser, and am at the point where it seems I have to use the pandoc AST (in order to extract link info). I'm going to keep this in the back of my mind (along with the original philosophy of keeping the system simple, and not complicated).

@felko by the way, just curious about your situation, which I don't think I understand well. I've used org-mode in the past, so I'm familiar with it. But why would anyone use both org-mode and neuron? They seem orthogonal to me (especially with org-roam).

org-roam and neuron(-mode) are parallel, but org-mode alone is rather orthogonal to neuron, and that's good because it means that somehow I could write some elisp glue to combine the power of both

How well does pandoc parse org? Does it even support parsing org files?

Let's assume for discussion that neuron internally uses pandoc AST to represent zettel document.

pandoc seems to support org parsing and export yeah, but I can't say how well

Hmm, pandoc AST does work pretty well actually. All the link/query stuff can be done on the post-parsed AST (without having to deal with the original source markup)

On the commonmark side, we need an inline parser that will create regular link nodes (of pandoc AST) from neuron links. It will look like this: image.png

The org-mode parser would have to do something similar @felko

Then, once we have the pandoc AST, with ID link and z: links already put in regular link AST nodes, neuron would process them (eg: expand z:tags into tag tree HTML) without having to know anything about markdown or org.

I believe org-mode supports the creation of custom link types, maybe we could use that feature. Not sure how they are parsed into the pandoc AST

Pandoc AST is close to HTML. It has a link node that maps straight to an <a> element

Okay, I see how all of this would work together. Things are much simpler, actually. Use any input source format, as long as it can define normal HTML links. If the link URL is the same as the link text (the 'short links'), neuron will process them, as it will now operate only on the pandoc AST.
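
A minimal sketch of the "short link" idea described above, for illustration only. It uses a toy `Inline` type as a stand-in for the real one in pandoc-types (which has many more constructors), and the names `isShortLink` and `shortLinkTargets` are hypothetical, not neuron's actual API.

```haskell
-- Toy stand-in for the inline part of a document AST (assumption: the real
-- type lives in pandoc-types and is considerably richer).
data Inline
  = Str String
  | Link String String -- link text, then link URL
  deriving (Eq, Show)

-- A "short link" is a link whose visible text equals its target.
isShortLink :: Inline -> Bool
isShortLink (Link txt url) = txt == url
isShortLink _ = False

-- Collect the targets of all short links, in document order.
-- Non-link inlines fail the pattern match and are skipped.
shortLinkTargets :: [Inline] -> [String]
shortLinkTargets inlines = [url | l@(Link _ url) <- inlines, isShortLink l]
```

The point of the sketch is that nothing here depends on whether the links came from markdown or org; only the parsed structure matters.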

Ok I'm trying to implement this, not sure how I should organize the modules. I can't just make a Neuron.Org module in the library since it doesn't depend on pandoc anymore, which I need for Text.Pandoc.Readers.Org.readOrg

I think we've already covered this issue but it's been a long time

not really related but TeX support would be a big selling point imo, i think many zettelkasten enthusiasts are also academics

felko said:

Wait nevermind, I think neuron (the executable) depending on the full pandoc library isn't a bad idea after all, at least I see no better way to support multiple file formats

I said this ^ but I don't remember if you agreed

why is the markdown parser in the src/lib already?

@felko yes, src/app can depend on Pandoc or whatever (but not src/lib, as that has to compile on ghcjs). You can put the orgmode parser in src/app then

ah right, for ghcjs

ok then, thank you

so src/app/Neuron/Org.hs?

For now yes. Once it is working we can move around

yeah I was thinking src/lib/Neuron/Format/Markdown.hs and src/app/Neuron/Format/Markdown.hs would be cleaner, but I'll try to focus on the actual feature this time lol

Eventually we can even make it general so that it works with any Pandoc input reader (like reST)

Yea don't do premature refactoring. Duplicate code is fine in the beginning

Not sure how to proceed, buildZettelkasten is in the lib and uses the markdown parser

Pass the 'reader' (eg: parseMarkdown) from app/Main.hs.

as arguments to generateSite

Should probably be Map Text ReaderFunc where the key is the fileextension (md, org, ...)

Eventually neuron.dhall would specify this map, and they would be passed in the Config
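
A rough sketch of the extension-keyed reader map suggested above. Everything here is a simplification: `ReaderFunc` is reduced to `String -> Either String String`, the two readers are placeholders that only tag their input, and `extensionOf` and `parseAll` are hypothetical helper names. It assumes flat file names with no directories.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

-- Hypothetical reader type: file contents in, parse error or document out.
type ReaderFunc = String -> Either String String

-- Placeholder readers that only tag their input, so the dispatch is visible.
markdownReader, orgReader :: ReaderFunc
markdownReader src = Right ("markdown:" ++ src)
orgReader src = Right ("org:" ++ src)

-- The map from file extension to reader.
readers :: Map.Map String ReaderFunc
readers = Map.fromList [("md", markdownReader), ("org", orgReader)]

-- Extension of a flat file name, without the dot ("" if there is none).
extensionOf :: FilePath -> String
extensionOf path = case break (== '.') (reverse path) of
  (revExt, _ : _) -> reverse revExt
  _ -> ""

-- Parse every file whose extension has a registered reader; files with an
-- unknown extension are silently skipped in this sketch.
parseAll :: Map.Map String ReaderFunc -> [(FilePath, String)] -> [(FilePath, Either String String)]
parseAll rs = mapMaybe parseOne
  where
    parseOne (path, src) = do
      reader <- Map.lookup (extensionOf path) rs
      pure (path, reader src)
```

Skipping unknown extensions is just one possible choice; reporting them as errors would be an equally reasonable design.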

that only builds the website, we still need to take org notes into account when building the graph

not sure what you mean. you can't build the website without building the graph.

I can't figure out where but I guess generateSite calls buildZettelkasten at some point

yeah that's in generateSite

note that buildZettelkasten is a pure function (cerveau uses it), and it should remain so

that's ok, pandoc has a pure parser monad

can I add the Map Text ReaderFunc argument to buildZettelkasten?

It currently takes an already-traversed list of files: [(FilePath, Text)]. So we shouldn't keep passing the extensions map once it is used. Maybe instead: [(ReaderFunc, [(FilePath, Text)])]

alright, let's do that, but I think buildZettelkasten should be split into a parsing function and a function that just builds the graph using the list of parsed zettels

Actually, n/m - have it take Map Text ReaderFunc and match extensions in there. because, then Cerveau can pass the neuron.dhall's configuration straight to it.

felko said:

alright, let's do that, but I think buildZettelkasten should be split into a parsing function and a function that just builds the graph using the list of parsed zettels

that sounds like premature refactoring. we can do it later.

aim for a minimal diff. then let's see what can be changed

parsing actually happens in parseZettels

I suppose you'd pass the map to parseZettels, and that's where we would do the extension matching.

also, ZettelParserError should be reused, as a common error type between input formats.

can I put ZettelParserError in a separate module instead of importing Neuron.Markdown (ZettelParserError(..)) in Neuron.Org? or is that also premature refactoring

that's okay - since we know they will be moved out of it anyway

with most forms of premature refactoring however, i would often find out later (based on new information) that how I had refactored wasn't correct. so i have to change things again. which is why it is best to get something working first, and once you have a better idea of the active code, that would be the best time to refactor.

this is a concise/better explanation of what i was trying to say:

Attempting premature refactoring risks selecting a wrong abstraction, which can result in worse code as new requirements emerge[2] and will eventually need to be refactored again. https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)

yeah I get the idea but it's not always obvious whether a specific refactor is premature or not, also it's hard to break an old habit

ah, didn't know that one

some people even have the opposite habit. :-)

i used to do premature optimization all the time, and then at Obsidian (my first Haskell job) i got criticized to death in code reviews. took me a while to realize the benefits of it.

do you prefer me to make a draft PR so that you can review progressively when you have time?

zettelIDSourceFileName :: ZettelID -> FilePath
zettelIDSourceFileName zid = toString $ zettelIDText zid <> ".md"


ouch

So we store the filename in ZettelT? Or, just format in ZettelT (then construct filename out of it)

Related: what happens if there is both foo.md and foo.org?

To prevent such conflicts, one idea is to put the format in ZettelID - but make its Eq instance ignore the format. Then zettelIDSourceFileName can be fixed trivially

Meta question: why not support a single format only (eg: either md or org, but not both)? If not, what's the use case for simultaneous formats?

Sometimes I want org, sometimes I want markdown, and I would like to be able to write TeX to write my courses (mathjax is nice but not enough for writing TeX productively)

Maybe its just me but I know that I'll use different formats if neuron supports it

ok - so we explicitly prevent cases like foo.md vs foo.org and have neuron produce an error for that zettel.

sorry I misunderstood your point, I didn't understand that you were talking about a single format for the same note

No, I was talking about a single format for entire zettelkasten. But nevermind, your use case is a valid one.

Sridhar Ratnakumar said:

So we store the filename in ZettelT? Or, just format in ZettelT (then construct filename out of it)

I would say the filepath, since at several places zettelJsonFull is used, which adds the path

having ZettelT store the path makes this function redundant

We have two proposed approaches:

1. Store filepath in ZettelT
2. Store format in ZettelID

Without having thought this in detail, it seems to me that (2) is better. What do you think?

Note that zettelJsonFull uses absolute path, not relative. We still need it.

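
On the foo.md versus foo.org conflict raised above, here is a minimal sketch of how such collisions could be detected before parsing. The helper names `splitName` and `conflictingIDs` are hypothetical, and the code assumes flat file names with a single extension; it is not how neuron necessarily does it.

```haskell
import qualified Data.Map.Strict as Map

-- Split "foo.org" into ("foo", "org"). Assumes a flat name; a name without
-- a dot is returned unchanged with an empty extension.
splitName :: FilePath -> (String, String)
splitName path = case break (== '.') (reverse path) of
  (revExt, _ : revBase) -> (reverse revBase, reverse revExt)
  _ -> (path, "")

-- Map each ID that appears under more than one file name to those names.
-- `flip (++)` keeps the file names in their original order.
conflictingIDs :: [FilePath] -> Map.Map String [FilePath]
conflictingIDs paths =
  Map.filter ((> 1) . length) $
    Map.fromListWith (flip (++)) [(fst (splitName p), [p]) | p <- paths]
```

Each entry in the result could then be turned into an error attached to that zettel ID, as suggested in the discussion.
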
I prefer filepath in ZettelT because it feels more natural to me, storing the format in ZettelID suggests that foo.md and foo.org can coexist even if the Eq instance only compares the id itself

we do the same trick for ZettelT

neither can ZettelT provide that guarantee. let me think

The problem with (1) is that you can't determine the original filename associated with a zettel ID, without also knowing its file contents. consider a http request action:

SaveZettel :: ZettelID -> Text -> m ()


the server can determine the path to update solely from ZettelID. with (1) this won't be possible anymore.

it can look in the zettelgraph

another thing to take into consideration is that, presumably, adding a field to ZettelT takes less refactoring than wrapping the current zettel id into a record

felko said:

another thing to take into consideration is that, presumably, adding a field to ZettelT takes less refactoring than wrapping the current zettel id into a record

that shouldn't be a factor in choosing an approach. if we have to refactor a lot to 'get things right', then so be it.

let me think how this would look like from cerveau's side.

how would metadata parsing look in the org reader?

it uses pandoc's meta type, which stores things like

#+TITLE my org zettel
#+DATE "2020-06-30"


I have two options:

• mimic the parseMarkdown approach and return forall meta. FromJSON meta => FilePath -> Text -> Either ZettelParseError (Maybe meta, Pandoc) (Pandoc provides a ToJSON instance for its meta type)
• extract manually and return Either ZettelParseError (Maybe Meta, Pandoc)

i don't know yet for tags, not sure what meta lists look like from org's side

okay, back to the problem, first of all it seems to me that sooner or later we will need an ADT like this:

data ZettelFormat = ZettelFormat_Markdown | ZettelFormat_Org


assuming we have that, then Map Text ZettelFormat in neuron.dhall specifies format to use for a given file extension.

then, we can store this format value in ZettelT (zettelFormat :: ZettelFormat). we shouldn't need to store the filepath.

from the cerveau's side, it would look like:

SaveZettel :: ZettelID -> ZettelFormat -> Text -> m ()


finally, the filename of a zettel can be determined from ZettelID and ZettelFormat.

it's worth keeping in mind that -- from an abstraction point of view -- we "forget" the notion of files/filenames/filesystem once the graph is built. this is why i think filenames should not leak into higher layers

sounds good to me

I warn you, the first commit will be quite big, I hope you don't mind

zettelFormat would come in handy when rendering the zettel in cerveau. it can just dump the raw text if it is non-markdown.

Sridhar Ratnakumar said:

zettelFormat would come in handy when rendering the zettel in cerveau. it can just dump the raw text if it is non-markdown.

off topic but maybe you can still display the zettels that are not in markdown

if the .neuron directory is not gitignored, then you can just return the html that's already generated

not sure if that works

cerveau deals directly with github api. no local filesystem involved (it caches file content in database)

But - I can indeed generate the HTML on the fly in the backend, send that via API to the frontend. That's something I need to experiment with ... (you just won't get live editor preview though)

that'd be cool i think, cerveau can also serve as a zettelkasten reader

felko said:

I have two options:

• mimic the parseMarkdown approach and return forall meta. FromJSON meta => FilePath -> Text -> Either ZettelParseError (Maybe meta, Pandoc) (Pandoc provides a ToJSON instance for its meta type)
• extract manually and return Either ZettelParseError (Maybe Meta, Pandoc)

what do you think about this? I'm afraid that the generated JSON will look a bit weird since it reflects the document structure of pandoc

@felko generated JSON ? what do you mean?

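
To make the "filename from ZettelID and ZettelFormat" point above concrete, here is a small sketch. `ZettelFormat` follows the ADT proposed in the chat; the `ZettelID` newtype is a simplified stand-in for neuron's real type, and `formatExtension` and `zettelFileName` are hypothetical names chosen for illustration.

```haskell
-- The format ADT as proposed in the discussion.
data ZettelFormat = ZettelFormat_Markdown | ZettelFormat_Org
  deriving (Eq, Show)

-- Simplified stand-in for neuron's ZettelID (assumption: a plain string).
newtype ZettelID = ZettelID String
  deriving (Eq, Show)

-- File extension conventionally associated with each format.
formatExtension :: ZettelFormat -> String
formatExtension ZettelFormat_Markdown = "md"
formatExtension ZettelFormat_Org = "org"

-- The file name is fully determined by the ID and the format, so no file
-- path needs to be stored alongside the zettel.
zettelFileName :: ZettelID -> ZettelFormat -> FilePath
zettelFileName (ZettelID zid) fmt = zid ++ "." ++ formatExtension fmt
```

With this shape, a SaveZettel-style action that receives the ID and the format has enough information to locate the file without consulting the graph.
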
and i don't understand your two options question, since they both return Either ZettelParseError (Maybe Meta, Pandoc)

oh, wait one is not polymorphic. that's okay - it is always Meta anyway

alright, I just checked the implementation of ToJSON Pandoc.Meta (= what I meant by generated JSON) and its really pandoc-specific anyway, I'll do the non polymorphic option

I imagine something like:

type NeuronReader = FilePath -> Text -> Either ZettelParseError (Maybe Meta, Pandoc)

readZettel :: ZettelFormat -> NeuronReader
readZettel = \case
  ZettelFormat_Markdown -> Neuron.Markdown.parseMarkdown
  ...


felko said:

it uses pandoc's meta type, which stores things like

#+TITLE my org zettel
#+DATE "2020-06-30"


I have two options:

• mimic the parseMarkdown approach and return forall meta. FromJSON meta => FilePath -> Text -> Either ZettelParseError (Maybe meta, Pandoc) (Pandoc provides a ToJSON instance for its meta type)
• extract manually and return Either ZettelParseError (Maybe Meta, Pandoc)

You don't need TITLE if there is a level 1 heading in the org note. title in meta is deprecated

eventually tags too will be deprecated

right now, meta is used only for tags and date.

what if type NeuronReader = Text -> Either ZettelParseError (Maybe Meta, Pandoc) instead?

parseZettels :: Map.Map Text (FilePath -> ZettelReader) -> [(FilePath, Text)] -> [ZettelC]
parseZettels readers fs = flip mapMaybe fs $ \(path, s) -> do
  -- TODO either use fromJust since this is supposed to be unreachable
  --      or report unsupported extension
  zreader <- Map.lookup (toText $ takeExtension path) readers
  zid <- getZettelID path
  pure $ parseZettel (zreader path) zid s


which allows me to do this in parseZettel:

parseZettel ::
  ZettelReader ->
  ZettelID ->
  Text ->
  ZettelC
parseZettel zreader zid s = do
  ...


the idea is that the path is passed in the closure instead of doing case zreader (zettelIDSourceFileName zid) s of

you mean - drop filename from the parser errors?

no, the map contains functions that take the filename as a parameter, the question is just whether we want ZettelReader to include the filename instead

my solution allows me to reuse the ZettelReader alias in parseZettel, and take advantage of the fact that I don't need to compute the path anymore

org doesn't seem to allow tags containing /, that's annoying
I can define a #+NEURON_TAGS field that defines the neuron tags (as opposed to org tags) but I'm not really satisfied

also I can't use #+TAGS since that's already used by org for something else

It uses :Peter:Boss:Secret: instead of Peter/Boss/Secret? not sure how the pandoc parser represents it

n/m - that's not hierarchical

why does it not allow /? what _does_ it allow?

Sridhar Ratnakumar said:

why does it not allow /? what _does_ it allow?

I think this is the relevant code (from pandoc)

orgTagWordChar :: Monad m => OrgParser m Char
orgTagWordChar = alphaNum <|> oneOf "@%#_"


no idea about org-mode itself, but I know org mode doesn't recognize tags with / either
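
For illustration, the character class in the quoted pandoc snippet can be mirrored with plain predicates to see why a hierarchical tag such as science/math is rejected. This is a standalone sketch using only `Data.Char`, not pandoc's parser; `isOrgTagChar` and `isValidOrgTag` are hypothetical names.

```haskell
import Data.Char (isAlphaNum)

-- Same character class as the quoted orgTagWordChar: alphanumerics plus
-- the four symbols '@', '%', '#' and '_'.
isOrgTagChar :: Char -> Bool
isOrgTagChar c = isAlphaNum c || c `elem` "@%#_"

-- A tag is accepted only if it is non-empty and every character is allowed,
-- so any tag containing '/' is rejected.
isValidOrgTag :: String -> Bool
isValidOrgTag tag = not (null tag) && all isOrgTagChar tag
```

Under this predicate `isValidOrgTag "science_math"` holds while `isValidOrgTag "science/math"` does not, which is the limitation being discussed.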

Sridhar Ratnakumar said:

what about tags as a property ? https://orgmode.org/manual/Property-Syntax.html#Property-Syntax

Good idea

org-mode could add a lot more features than what is currently supported in markdown, it's up to you but I think it would be a waste to be conservative about having exactly the same behavior for all formats
e.g. writing multiple zettels in the same file, inheriting tags, etc...

let's get the simple case done, and then open up a proposal on github of what exactly that would look like.

yeah I was just thinking out loud

you mentioned something about waiting org support for releasing 0.6, but that may take a while to implement and get it right, also I won't be working full time on this

@felko i won't be thinking about release until mid-July'ish - but of course, no rush on an open source project.

I realize that Reflex.DOM.Pandoc.URILink.queryURILinks alone cannot work with org, since custom org links need to be prefixed with the link type, e.g. [[neuron:76ab876e]]

we need to have a format-specific function for extracting links

-- Extract all (valid) queries from the Pandoc document
extractQueries :: MonadWriter [QueryParseError] m => Pandoc -> m [Some ZettelQuery]
extractQueries doc =
  fmap catMaybes $ forM (queryURILinks doc) $ \ul ->
    ...


possible fix: rewrite

type ZettelReader = FilePath -> Text -> Either ZettelParseError (Maybe Meta, Pandoc)


as

data ZettelReader = ZettelReader
  { readZettel :: FilePath -> Text -> Either ZettelParseError (Maybe Meta, Pandoc),
    extractQueries :: forall m. MonadWriter [QueryParseError] m => Pandoc -> m [Some ZettelQuery]
  }


here's the PR in the meantime: https://github.com/srid/neuron/pull/263
hope it's not too big

Implements support for writing zettels in org-mode. Glue pandoc's org parser to neuron Extract metadata from the first headline's properties (date and tags) Take .org files into account...

felko said:

possible fix: rewrite

type ZettelReader = FilePath -> Text -> Either ZettelParseError (Maybe Meta, Pandoc)


as

data ZettelReader = ZettelReader
  { readZettel :: FilePath -> Text -> Either ZettelParseError (Maybe Meta, Pandoc),
    extractQueries :: forall m. MonadWriter [QueryParseError] m => Pandoc -> m [Some ZettelQuery]
  }


That sounds like configuration over convention (ref). You shouldn't need to configure its behaviour if your orgmode parser returns the expected Pandoc structure. Basically for every [[neuron:76ab876e]], produce a Link node with both link text and url set to 76ab876e

Do we really need the neuron: prefix?

If we must have the prefix, let's make it z: instead of neuron: to be consistent with the rest of the queries

Sridhar Ratnakumar said:

If we must have the prefix, let's make it z: instead of neuron: to be consistent with the rest of the queries

that's what I chose for now, see the last comment on the PR, so the links look like [[z:zettel/id]] currently, I think we can shrink it down to [[z:id]]

z:<id> would prevent the user from having zettels.md and tags.md right?

@felko Just took a cursory look at the PR; structure looks good.

Sridhar Ratnakumar said:

z:<id> would prevent the user from having zettels.md and tags.md right?

Hm good point, I didn't think about that,

You shouldn't need to configure its behaviour if your orgmode parser returns the expected Pandoc structure. Basically for every [[neuron:76ab876e]], produce a Link node with both link text and url set to 76ab876e

Does not a link like [[67c6f7a0.org]] already work in org-mode?

I don't think so, I have very little experience with org-mode but after testing a few combinations, it seems that bracket links proceed in that order:

• if the address is a valid URL, point to that
• otherwise, test if it is of the form type:something, in which case it looks for a link type called type and passes something to the follow function corresponding to that link type
• otherwise, the link is understood as an internal link, to another section in the same file

I just found out that angle links are supported by pandoc https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Readers/Org/Inlines.hs#L447

So <foo-bar> and <z:zettels?tag=foo> work in orgmode?

<z:zettels?tag=foo> will work because it's an URL (it can also be treated as a custom link type on the emacs lisp side)
sadly, I don't think <foo-bar> works

weird thing is that I can't get pandoc to parse <z:zettel/id> as a link

commonmark appears to have a much more configurable interface, regular pandoc readers are opaque

<z://zettel/id> works though

One reason to favour [[..]] over <..> in org notes is that the former seems to be what people use for links. So it would be natural to use the same syntax on org.

do you insist on having zettel links display the exact title of the zettel, or would it be possible to allow links with description e.g. [[z:zettel/id][desc]]?

e.g. having [[z:id]] and [[neuron:zettels?tag=foo]]

that's not really satisfying but at least there's no ambiguity

Does [[z:/id]] work?

felko said:

do you insist on having zettel links display the exact title of the zettel, or would it be possible to allow links with description e.g. [[z:zettel/id][desc]]?

not a big fan of it

yeah [[z:/id]] works

kind of hacky but maybe we can support both [[z:id]] and [[z:/id]], then we can still refer to zettels called zettels or tags from org mode

felko said:

e.g. having [[z:id]] and [[neuron:zettels?tag=foo]]

I don't like this one, because then as a user you have an extra distinction to remember.

It is a needless distinction, resulting solely from a limitation of the implementation (the pandoc parser)

yeah understandable, even though i think org mode users hardly ever enter links manually

felko said:

kind of hacky but maybe we can support both [[z:id]] and [[z:/id]], then we can still refer to zettels called zettels or tags from org mode

shrug. i'd just go with z:/id. don't like to deal with an extra edge case if it saves only one character, but with an added complexity in parser

ok let's go with z:/id
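
A hedged sketch of what recognising the agreed z:/id form could look like once the org reader hands over a link target as a string. The function name `zettelIDFromOrgLink` is hypothetical, and treating anything after a `?` as a suffix that is not part of the ID is an assumption of this sketch, not something confirmed by the PR.

```haskell
import Data.List (stripPrefix)

-- Extract the zettel ID from a link target of the form "z:/<id>".
-- Assumption: anything from the first '?' onwards is not part of the ID.
-- Targets without the "z:/" prefix (for example "z:zettels?tag=foo")
-- are left for the query handling and yield Nothing here.
zettelIDFromOrgLink :: String -> Maybe String
zettelIDFromOrgLink target = case stripPrefix "z:/" target of
  Just rest
    | idPart <- takeWhile (/= '?') rest,
      not (null idPart) ->
        Just idPart
  _ -> Nothing
```

Because the slash separates IDs from query names, a zettel named zettels or tags can still be linked as z:/zettels or z:/tags without ambiguity.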

I'm testing this PR in cerveau. Involves more changes than I originally estimated.

One confusion is the notion of zettel "key" has changed.

It used to be just ZettelID, but now we have custom paths.

Very likely that I'm gonna replace the neuron.dhall config type with [ZettelFormat], and always use *.md for Markdown, etc.

As well as renaming zettelPath -> zettelFileName to make it clear there cannot be directories in the path

Uhh, this is still confusing

zettel IDs should still be unique, why can't the ID alone serve as the key?

Sridhar Ratnakumar said:

Very likely that I'm gonna replace the neuron.dhall config type with [ZettelFormat], and always use *.md for Markdown, etc.

that's ok

ID alone cannot serve as the key to retrieve content; you need to know whether to find foo.md or foo.org.

Unless you load the neuron graph to look it up

isn't the graph loaded when using cerveau?

not all of the web app is stateful

wasn't cerveau meant to support markdown only anyway?

doesn't have to be (only real-time preview strictly needs markdown)

umm, there might be other issues with supporting non-markdown formats in cerveau. i think i'll just focus on markdown.

which makes me think: I should just make neuron.dhall specify a single format, not a list of formats

so you won't have foo.org linking to bar.md

though from neuron's perspective i can see the value of supporting multiple formats, if only to facilitate gradual migration.

Okay, let's say we support multiple formats: ["markdown", "org"]. And cerveau can propagate "unsupported formats" error up to the ZIndex.

neuron can do the same too, if the user specifies ["markdown", "org", "blah"]

yeah I was about to say that

Which means if you specify ["org"], cerveau will show an empty zindex, with that error
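
The "unsupported formats" behaviour discussed above could be sketched as follows. The format names "markdown" and "org" come from the chat; `parseFormat` and `parseFormats` are hypothetical helpers, and how the resulting errors reach the ZIndex is left out.

```haskell
import Data.Either (partitionEithers)

-- The format ADT as proposed earlier in the discussion.
data ZettelFormat = ZettelFormat_Markdown | ZettelFormat_Org
  deriving (Eq, Show)

-- Recognise a configured format name, or hand the unknown name back.
parseFormat :: String -> Either String ZettelFormat
parseFormat "markdown" = Right ZettelFormat_Markdown
parseFormat "org" = Right ZettelFormat_Org
parseFormat other = Left other

-- Split the configured names into (unsupported names, recognised formats),
-- so the unsupported ones can be reported instead of aborting.
parseFormats :: [String] -> ([String], [ZettelFormat])
parseFormats = partitionEithers . map parseFormat
```

A tool that supports only a subset of formats could apply the same pattern with a narrower `parseFormat`, which matches the empty-zindex-with-error outcome described for cerveau.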

@felko Do you have anything left to do in the PR?

And I take it that [[z:/foo?cf]] is the best possible linking syntax for orgmode ...

i didn't have the time to work on it today but I can write tests maybe
I was also waiting for you to see how it will play with cerveau
other than that and documentation I think it's ready, what do you think?

cerveau part is done; what sorts of tests were you thinking of?

I don't know how exactly but it would be nice to test duplicate ID detection and parser edge cases (for both markdown and org, using real files in a sample zettelkasten)

okay, i'll merge it after writing docs. if you get to writing those tests, we can merge it in a different PR

alright :thumbs_up:

Select sensible defaults for the pandoc extensions

not sure if we should pick "hard line breaks" or whatever it's called

i saw that pandoc has something to parse user-defined extension lists, it would be nice to be able to set pandoc extensions from neuron.dhall (or even, per zettel extensions, enabled in the metadata block)

Is there a way to customize how org mode is rendered? I see that by default, headings are numbered, but I'd like to change that. Ideally, I'd set a different default that could be overridden if needed.

If Pandoc allows such customization, neuron can too (theoretically).

Where would I specify them?

In the def argument to readOrg here: https://github.com/srid/neuron/blob/master/neuron/src/app/Neuron/Reader/Org.hs#L30

Hmmm. For clarity, is there a user-level way to customize this, or would I need to customize neuron source to effect this? (I'm quite new to this.)

@porcuquine No user-level way to customize; it will require changing neuron, by someone willing to do the investigation and hack on it.

Got it, thanks. I'll look into it as time allows.