Hi, I want to extract information out of a document parsed by pandoc and I am questioning my methods. At the moment I am using partial pattern matches in the Maybe monad to extract info, but this is quite tedious. For example, I know the first thing will be a level-1 header with the content `My last n months`, and I want to extract the `n` as an `Int`.
At the moment, I am doing:
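```haskell
parseDoc :: [Block] -> Maybe MyData
parseDoc doc = do
  -- Header takes a level, an Attr and the inline content
  (Header 1 _ x1 : xs1) <- pure doc
  -- strip the fixed prefix, then read the leading digits as the number
  n <- readMaybe . takeWhile isDigit . unpack =<< stripPrefix "My last " (stringify x1)
  -- ...
```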
SYB was not really a solution to my problem, because the data is not nested very deep. I finally found something I am reasonably happy with, this disgusting beauty:
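```haskell
{-# LANGUAGE LambdaCase, OverloadedStrings, ViewPatterns #-}

import Control.Monad.State (State, evalState, get, put)
import Data.Char (isDigit)
import Data.Text (stripPrefix, unpack)
import Text.Pandoc.Definition (Block (..))
import Text.Pandoc.Shared (stringify)
import Text.Read (readMaybe)

-- A state computation that may fail
newtype ParseM s a = MkParseM (State s (Maybe a))

instance Functor (ParseM s) where
  fmap f (MkParseM x) = MkParseM (fmap (fmap f) x)

instance Applicative (ParseM s) where
  pure x = MkParseM $ pure $ Just x
  -- Note: unlike (>>=) below, this still runs y when x produced Nothing
  (MkParseM x) <*> (MkParseM y) = MkParseM $ x >>= \f -> y >>= \v -> pure $ f <*> v

instance Monad (ParseM s) where
  (MkParseM x) >>= f = MkParseM $ x >>= \case
    Just y -> case (f y) of
      MkParseM r -> r
    Nothing -> pure Nothing

-- A failed pattern match in a do block becomes Nothing
instance MonadFail (ParseM s) where
  fail _ = MkParseM (pure Nothing)

embed :: Maybe a -> ParseM s a
embed = MkParseM . pure

-- Pop the next block off the remaining document
head' :: ParseM [Block] Block
head' = MkParseM $ do
  doc <- get
  case doc of
    x:xs -> put xs *> pure (Just x)
    _ -> pure Nothing

runParser :: ParseM s a -> s -> Maybe a
runParser (MkParseM x) = evalState x

parseRetro :: [Block] -> Maybe Int
parseRetro = runParser $ do
  Header 1 _ (stringify -> h1) <- head'
  n <- embed $ readMaybe . takeWhile isDigit . unpack =<< stripPrefix "My past " h1
  let past = MkPastMonths n -- MkPastMonths is defined elsewhere (not shown)
  pure $ n
```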
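For comparison, the hand-written `ParseM` wrapper above has the same shape as `MaybeT (State s)` from the transformers package, so the instances could be borrowed instead of written by hand. A minimal sketch, assuming transformers is available: it works over any list so that it compiles without pandoc, and `headP` and `leftThenRight` are hypothetical names used only for illustration.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.State.Strict (State, evalState, get, put)

-- Same representation as the hand-rolled wrapper: State s (Maybe a)
type ParseM s = MaybeT (State s)

-- Lift a plain Maybe into the parser (Nothing means failure)
embed :: Maybe a -> ParseM s a
embed = MaybeT . pure

-- Pop the first element of the remaining input, failing on empty input
headP :: ParseM [b] b
headP = do
  xs <- lift get
  case xs of
    y : ys -> lift (put ys) >> pure y
    []     -> embed Nothing

runParser :: ParseM s a -> s -> Maybe a
runParser p = evalState (runMaybeT p)

-- Partial pattern matches go through the MonadFail instance of MaybeT,
-- which turns a failed match into Nothing
leftThenRight :: ParseM [Either String Int] (String, Int)
leftThenRight = do
  Left s  <- headP
  Right n <- headP
  pure (s, n)
```

With these definitions, `runParser leftThenRight [Left "a", Right 1]` evaluates to `Just ("a", 1)`, and any input that does not start with a `Left` followed by a `Right` gives `Nothing`.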
```haskell
parseRetro :: [Block] -> Maybe Int
parseRetro (Header 1 _ h1' : _)
  | (stringify -> stripPrefix "My past " -> Just h1) <- h1'
  = fmap MkPastMonths $ readMaybe $ takeWhile isDigit $ unpack h1
parseRetro _ = Nothing
```
```haskell
parseRetro :: [Block] -> Maybe [Project]
parseRetro = runParser $ do
  Header 1 _ (stringify -> h1) <- head'
  (n :: Int) <- embed $ readMaybe . takeWhile isDigit . unpack =<< stripPrefix "My past " h1
  projects <- takeWhileM (\case
    Header 2 _ (stringify -> ("Project" `isPrefixOf`) -> True) -> Just parseProject
    _ -> Nothing)
  -- let past = MkPastMonths n projects
  pure projects
```
```haskell
whittle :: Filterable f => Coprism s t a b -> f b -> f t
whittle p b = runJoker $ p $ Joker $ b
```
There's a trivial Filterable instance for any Alternative + Monad (although whether it's lawful depends on the specific monad and alternative instances)
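The instance mentioned above can be sketched without committing to a particular library by writing the class method as a standalone function. This is only an illustration: `mapMaybeAM` is a hypothetical name, not taken from any package, and whether the result behaves lawfully still depends on the monad and alternative instances involved.

```haskell
import Control.Applicative (Alternative (empty))

-- Keep the results for which f returns Just, dropping the rest via empty
mapMaybeAM :: (Alternative f, Monad f) => (a -> Maybe b) -> f a -> f b
mapMaybeAM f fa = fa >>= maybe empty pure . f
```

For lists, `mapMaybeAM (\x -> if even x then Just (x * 2) else Nothing) [1, 2, 3, 4]` evaluates to `[4, 8]`.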
Is there a better way?
SYB
you can use `everywhere (++) $ mkQ [] $ \case Str something -> try to extract your stuff here`
or you know, whatever pattern you want to find
thanks, I will try that
oops sorry
you want `everything`, not `everywhere`
`everywhere` is a cata
Is there a good tutorial about this somewhere? The site that is linked everywhere in the docs is down apparently
this is the paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2003/01/hmap.pdf
it's been a while since i read it
at a high level: a `Data a` constraint says that you have some sort of run-time representation of `a`
run-time generic representation*
(as opposed to GHC.Generics which is at compile time)
`everything` lets you run a monoidal query over some `Data a => a`
you do that with `mkQ`, which takes a default value for if it doesn't match anything, and then you give it a lambda specialized at whatever type you want
`everything` will search through the type for the type your lambda takes, and then run your function, and accumulate the results
for example, i wrote this just today, which finds all of the `UnboundVar` nodes inside of an `Exp`:
you can also use `extQ` to glue multiple `mkQ`s together, eg if you want to simultaneously target different types in your structure
maybe that helps?
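To make the description above concrete, here is a small sketch of how `everything`, `mkQ` and `extQ` fit together. The three combinators are reimplemented on top of `Data.Data` from base so that the snippet is self-contained; they are meant to mirror the ones in the syb package, which you would normally import from `Data.Generics`. The toy `Exp` type is a hypothetical stand-in, not the code referred to in the messages above.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}

import Data.Data (Data, gmapQ)
import Data.Typeable (Typeable, cast)

-- Toy expression type; deriving Data gives the run-time generic representation
data Exp
  = Lit Int
  | UnboundVar String
  | App Exp Exp
  deriving (Show, Data)

-- mkQ: run q if the value has the type q expects, otherwise return the default
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
mkQ def q a = maybe def q (cast a)

-- extQ: extend a generic query with one more type-specific case
extQ :: (Typeable a, Typeable b) => (a -> r) -> (b -> r) -> a -> r
extQ f g a = maybe (f a) g (cast a)

-- everything: query the node itself and all of its descendants, combining with k
everything :: Data a => (r -> r -> r) -> (forall d. Data d => d -> r) -> a -> r
everything k f x = foldl k (f x) (gmapQ (everything k f) x)

unbound :: Exp -> [String]
unbound (UnboundVar v) = [v]
unbound _              = []

-- Collect the names of all UnboundVar nodes
unboundVars :: Exp -> [String]
unboundVars = everything (++) (mkQ [] unbound)

-- Target two types at once: Exp nodes and the Int literals inside them
varsAndLits :: Exp -> [String]
varsAndLits = everything (++) (mkQ [] unbound `extQ` lit)
  where
    lit :: Int -> [String]
    lit n = [show n]
```

For the value `App (UnboundVar "x") (App (Lit 1) (UnboundVar "y"))`, `unboundVars` gives `["x", "y"]` and `varsAndLits` gives `["x", "1", "y"]`.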
I will try, thanks Sandy!
@Jan van Brügge
@TheMatten this is just the first thing I need to parse, and threading through the remainder of the list would get old very quickly
with this State wrapper, I can just continue with partial pattern matches until I've exhausted the list
For example, I think this code reads rather nicely:
With `takeWhileM` being:
I only skimmed this topic, but can't this be solved using `Text.Pandoc.Walk` (see `query` and `walkM`)?
example: https://github.com/srid/rib/blob/e41eae3/src/Rib/Parser/Pandoc.hs#L102-L109
The problem is not finding the stuff, but rather pulling the information out of it. AFAICT query can't help me there
That example actually pulls the image URL using `query`
@Jan van Brügge I can't exactly reconcile your snippet:
with the partial maybe matching you're referring to
@Jan van Brügge is something like this what you're looking for: