I'm trying to write a tokenizer that has a `lexeme` combinator, which would allow the tokenizer to scan for the next token while ignoring whitespace. This worked fine when "whitespace" was literally space characters and `lexeme` was defined like so:
    lexeme :: ReadP a -> ReadP a
    lexeme p = skipSpaces >> p
But now I want to make my tokenizer treat comments as whitespace as well. Comments take the form of /** block comments */ and // line comments. If I define it like so:
I get a failure if I try to parse anything that has a comment in it.
Probably because `many isChar` reads until the end. You want something there which is able to stop when it finds the end comment marker.

You could also try to take inspiration from how megaparsec handles comments: https://hackage.haskell.org/package/megaparsec-9.2.0/docs/Text-Megaparsec-Char-Lexer.html
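To make the "stop at the end marker" idea concrete, here is a minimal sketch using only `Text.ParserCombinators.ReadP`. The original definition isn't shown in the thread, so the names `blockComment`, `lineComment` and `whitespace` are my own assumptions, not the poster's code. The key piece is `manyTill`, which tries the end marker before consuming each character.

```haskell
import Text.ParserCombinators.ReadP

-- | Block comment: "/**", then any characters up to the first "*/".
-- manyTill tries the end marker before consuming each character,
-- so it stops at the marker instead of running to the end of the input.
blockComment :: ReadP ()
blockComment = string "/**" *> manyTill get (string "*/") *> pure ()

-- | Line comment: "//" up to (but not including) the newline.
lineComment :: ReadP ()
lineComment = string "//" *> munch (/= '\n') *> pure ()

-- | Spaces and comments, repeated (hypothetical helper).
-- (<++) commits to a comment whenever one is present.
whitespace :: ReadP ()
whitespace = skipSpaces *> ((comment *> whitespace) <++ pure ())
  where
    comment = blockComment <++ lineComment

-- | Same shape as the original lexeme, with comments treated as whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = whitespace *> p
```

With these definitions, something like `readP_to_S (lexeme (string "foo")) "  /** c */ // line\n foo bar"` should skip both comments and leave `" bar"` as the unconsumed input.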
I think I've narrowed down the issue to overlapping parsers. :thinking: I have some parser `p` that matches on the char `/` and the `string "/**"` matching in the whitespace parser. :thinking: `readP_to_S` will return multiple results in this case and my code is only inspecting the `head` of the list and setting the stream state with the unconsumed input. :thinking:
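A small sketch of that overlap, assuming a comment skipper built on `skipMany` (which uses the symmetric choice `+++` internally) versus one built on the left-biased `<++`. The names `ambiguousWs`, `committedWs` and `slash` are made up for illustration and are not from the original code.

```haskell
import Text.ParserCombinators.ReadP

-- Hypothetical block-comment parser, as assumed for this sketch.
blockComment :: ReadP ()
blockComment = string "/**" *> manyTill get (string "*/") *> pure ()

-- skipMany is built on many, which uses symmetric choice (+++):
-- the "skip zero comments" branch stays alive next to "skip one comment".
ambiguousWs :: ReadP ()
ambiguousWs = skipMany blockComment

-- Left-biased choice (<++): if a comment parses, the "skip nothing"
-- alternative is discarded.
committedWs :: ReadP ()
committedWs = (blockComment *> committedWs) <++ pure ()

-- A token parser that overlaps with the start of a comment.
slash :: ReadP Char
slash = char '/'
```

On the input `"/** c *// rest"`, `readP_to_S (ambiguousWs *> slash)` yields two parses, and the first one is the parse where the comment was not skipped and `slash` matched the comment's opening `/`. Taking the `head` therefore picks the wrong one. `readP_to_S (committedWs *> slash)` yields only the parse that skips the comment first, so inspecting the `head` is safe there.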