I'm messing with megaparsec, trying to create a simple language. Right now I'm trying to lex
into
for later parsing.
The problem that I'm running into is that I'm confused about how to use L.space. I'll need to give L.space a parser which does not skip newlines, so that I can emit NewLine tokens. The docs recommend a setup where you define a lexeme that automatically skips trailing whitespace. I currently have
So, I think I'll need to modify sc to not eat newlines by passing something other than space1 as the first argument to L.space.
The thing I can't figure out is how to deal with the fact that often, you will expect to find a newline immediately after a preceding lexeme, with no intervening whitespace. Since the argument to L.space can't accept empty input, sc can't accept empty input. How do I deal with this? Do I have to throw out the whole model of always consuming whitespace after a lexeme? Is making newline a lexeme the wrong approach? If so, how do I handle parsing the separate bindings later without newline information?
As you have noted, your problem here is that space1 (and hence sc) also consumes newlines, as they are considered just another form of whitespace. Luckily, that’s easy enough to fix: you just need to change space1 to hspace1 (another predefined parser which doesn’t accept newlines). To parse a newline, you can just add Newline <$ eol at the appropriate place in your lexer.
(And also, by the way, the usual approach with megaparsec is to combine lexing and parsing into one step. megaparsec still has support for two stages of lexing+parsing, but it’s usually easier to use one step instead.)
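A minimal sketch of the hspace1 suggestion, assuming megaparsec 9.0.0 or later (where hspace1 is exported from Text.Megaparsec.Char). The Lexeme type, its constructors, and the helper names here are hypothetical, since the thread's actual definitions are not shown. Both comment arguments to L.space are left as empty because the language's comment syntax is not known.

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (alphaNumChar, char, eol, hspace1, letterChar)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical lexeme type; the real one from the thread is not shown.
data Lexeme = Name String | Equals | Literal Integer | NewLine
  deriving (Eq, Show)

-- Skips horizontal whitespace only, so newlines survive to be lexed.
-- L.space wraps its argument in skipMany, so sc itself accepts empty input.
sc :: Parser ()
sc = L.space hspace1 empty empty

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

name, literal, equals, newlineTok :: Parser Lexeme
name       = lexeme (Name <$> ((:) <$> letterChar <*> many alphaNumChar))
literal    = lexeme (Literal <$> L.decimal)
equals     = Equals <$ lexeme (char '=')
newlineTok = NewLine <$ lexeme eol

-- Lex a whole input into a flat list of lexemes, newlines included.
lexer :: Parser [Lexeme]
lexer = sc *> many (choice [name, literal, equals, newlineTok]) <* eof
```

Because L.space skips zero or more occurrences of its first argument, a newline that directly follows a lexeme with no intervening whitespace is still picked up by newlineTok. For example, parseMaybe lexer "x = 1\ny = 2" yields Just [Name "x", Equals, Literal 1, NewLine, Name "y", Equals, Literal 2].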
That's actually kind of what I had. Couple of things:
I was trying to use hspace1, but I'm getting variable not found, even though I'm on megaparsec 8.0.0, and importing Text.Megaparsec.Char unqualified:
Not sure what's going on there...?
As a workaround, I tried defining
But I was getting "unexpected newline".
Let me change it back, and replicate the error...
Oh wait, it works this time. Not sure what I did differently. Thanks!
Still very confused that I can't access the predefined hspace1 from megaparsec
James Sully said:
hspace1 is defined in 9.0.0, I can't find it in 8.0.0 in hackage - https://hackage.haskell.org/package/megaparsec-8.0.0
Ah, I didn't realize I was looking at the 9.0.0 docs. I think there's a typo, the 9.0.0 docs say it existed since 8.0.0:
Thanks!
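For anyone stuck on megaparsec 8.0.0, a local stand-in for hspace1 can be written with takeWhile1P. This is only a sketch and not necessarily the workaround tried above (that code is not shown); the name hspace1' is hypothetical and chosen to avoid clashing with the real hspace1 on newer versions.

```haskell
import Control.Applicative (empty)
import Control.Monad (void)
import Data.Char (isSpace)
import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- One or more whitespace characters, excluding '\n' and '\r'.
-- Hypothetical local replacement for hspace1 on megaparsec 8.0.0.
hspace1' :: Parser ()
hspace1' = void (takeWhile1P (Just "white space") isHSpace)
  where
    isHSpace c = isSpace c && c /= '\n' && c /= '\r'

-- Space consumer that leaves newlines in the input.
sc :: Parser ()
sc = L.space hspace1' empty empty
```

With this sc, leading spaces and tabs are skipped but a following newline is left for the lexer to handle explicitly.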
@bradrn thanks for the advice re parsing and lexing simultaneously, I'll play around with that
You’re welcome for the advice! (And sorry for recommending a function from 9.0.0; I hadn’t realised hspace1 was new).
For parsing and lexing simultaneously, the basic idea is to define your parsers directly in terms of your lexers, rather than indirectly in terms of your lexeme data type. For instance, instead of defining parseEquality = Equality <$> satisfy isName <* single Equals <*> satisfy isLiteral, say, you might instead define parseEquality = Equality <$> parseName <* symbol "=" <*> parseLiteral, with parseName :: Parser String rather than parseName :: Parser Lexeme. (Admittedly I’m not completely sure how that would work with significant newlines; I’d say that manually adding (void eol <|> eof) parsers at the appropriate places might work, so something like parseEquality = Equality <$> satisfy isName <* single Equals <*> satisfy isLiteral <* (void eol <|> eof), but maybe just try it and see what works.)
Here's what I have at the moment, seems to be working ok:
I can’t see the code…
very much a wip, but it's passing my very simple tests
Next thing is to suss out lambdas I think
But for now I need to go to bed
Lotta vestiges in there still, sorry
Yep, that looks good to me as well. (Though I’m not sure what LangToken is for… I assume that’s one of the ‘vestiges’.)
yeah, yep.
I just implemented expr after seeing your advice
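For reference, the one-step approach suggested earlier in the thread might look roughly like this. It is a sketch under assumptions, not the code that was posted: the Equality type, the Integer literal, and the parser names are hypothetical. Note that eol and eof have different result types, so eol is wrapped in void before combining them with <|>. The space consumer is written with takeWhile1P so that it also works on megaparsec 8.0.0.

```haskell
import Control.Applicative (empty, (<|>))
import Control.Monad (void)
import Data.Char (isSpace)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (alphaNumChar, eol, letterChar)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical AST node: a name bound to an integer literal.
data Equality = Equality String Integer
  deriving (Eq, Show)

-- Space consumer that skips spaces and tabs but never newlines.
sc :: Parser ()
sc = L.space hsp empty empty
  where
    hsp = void (takeWhile1P (Just "white space") isHSpace)
    isHSpace c = isSpace c && c /= '\n' && c /= '\r'

symbol :: String -> Parser String
symbol = L.symbol sc

parseName :: Parser String
parseName = L.lexeme sc ((:) <$> letterChar <*> many alphaNumChar)

parseLiteral :: Parser Integer
parseLiteral = L.lexeme sc L.decimal

-- One binding per line, terminated by a newline or end of input.
parseEquality :: Parser Equality
parseEquality =
  Equality <$> parseName <* symbol "=" <*> parseLiteral <* (void eol <|> eof)

program :: Parser [Equality]
program = sc *> many parseEquality <* eof
```

Here parseMaybe program "x = 1\ny = 2" gives Just [Equality "x" 1, Equality "y" 2]. Blank lines between bindings are not handled in this sketch.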