ReadP lexeme combinator - Haskell

Welcome to the Functional Programming Zulip Chat Archive.

James King

I'm trying to write a tokenizer with a lexeme combinator that lets the tokenizer scan for the next token while ignoring whitespace. This worked fine when "whitespace" meant only space characters and lexeme was defined like so:

lexeme :: ReadP a -> ReadP a
lexeme p = skipSpaces >> p
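For reference, here is a way to exercise the space-only version. This is a minimal sketch. It assumes a hypothetical `word` token (a run of letters) and a hypothetical `words'` helper that requires the whole input to be consumed. Neither comes from the original thread.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

-- The original space-only lexeme: skip leading spaces, then run p.
lexeme :: ReadP a -> ReadP a
lexeme p = skipSpaces >> p

-- Hypothetical token used only for illustration: one or more letters.
-- munch1 is greedy, so it takes the whole run of letters.
word :: ReadP String
word = lexeme (munch1 isAlpha)

-- Hypothetical helper: any number of words, then trailing spaces,
-- then end of input. eof discards the shorter alternatives that
-- ReadP's nondeterministic `many` also produces.
words' :: ReadP [String]
words' = many word <* skipSpaces <* eof
```

Here `readP_to_S words' "  foo bar  "` should yield the single complete parse `[(["foo","bar"],"")]`.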

But now I want to make my tokenizer treat comments as whitespace as well. Comments take the form of /** block comments */ and // line comments. If I define it like so:

whitespace' :: ReadP ()
whitespace' = do
  skipSpaces
  skipMany $ choice [ blockComment, lineComment ]
  where
    blockComment = string "/**" *> many isChar <* string "*/"
    lineComment = string "//" *> manyTill isChar eol
    isChar = satisfy isAscii
    eol = choice [string "\r\n", string "\n"]

lexeme :: ReadP a -> ReadP a
lexeme p = whitespace' >> p

I get a failure if I try to parse anything that has a comment in it.

Paolo Capriotti

Probably because many isChar reads until the end. You want something there which is able to stop when it finds the end comment marker.
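One way to follow this suggestion is `manyTill` from `Text.ParserCombinators.ReadP`. It runs the body parser until the terminator succeeds. Because it is built on the left-biased `<++`, it stops at the first `*/`. The sketch below keeps the original helper names (`isChar`, `eol`) and moves the comment parsers to top level so they can be tested on their own.

```haskell
import Data.Char (isAscii)
import Text.ParserCombinators.ReadP

isChar :: ReadP Char
isChar = satisfy isAscii

eol :: ReadP String
eol = choice [string "\r\n", string "\n"]

-- Block comment: "/**" ... "*/". manyTill stops at the first "*/"
-- instead of letting the body run past it.
blockComment :: ReadP String
blockComment = string "/**" *> manyTill isChar (string "*/")

-- Line comment: "//" up to and including the end of the line.
lineComment :: ReadP String
lineComment = string "//" *> manyTill isChar eol
```

For example, `readP_to_S blockComment "/** a */ b */"` should give `[(" a ", " b */")]`. The comment ends at the first `*/`, and everything after it is left for the next token.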

James King

I think I've narrowed the issue down to overlapping parsers. :thinking: I have some parser p that matches the char /, and it overlaps with the string "/**" matched by the whitespace parser. :thinking:
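The overlap can be reproduced in isolation. This sketch uses a hypothetical token parser `slash = lexeme (char '/')` as a stand-in for the parser `p` mentioned above. Its `whitespace` uses the `manyTill` comment parsers but still relies on `skipMany`. In ReadP, `skipMany` (like `many`) is a symmetric choice between skipping zero, one, two, ... items, so both readings of `/**` survive.

```haskell
import Data.Char (isAscii)
import Text.ParserCombinators.ReadP

isChar :: ReadP Char
isChar = satisfy isAscii

comment :: ReadP String
comment = blockComment +++ lineComment
  where
    blockComment = string "/**" *> manyTill isChar (string "*/")
    lineComment  = string "//" *> manyTill isChar eol
    eol = choice [string "\r\n", string "\n"]

-- Still ambiguous: skipMany may skip zero or more comments,
-- and ReadP keeps every alternative that succeeds.
whitespace :: ReadP ()
whitespace = skipSpaces *> skipMany (comment *> skipSpaces)

lexeme :: ReadP a -> ReadP a
lexeme p = whitespace *> p

-- Hypothetical token that overlaps with the comment syntax.
slash :: ReadP Char
slash = lexeme (char '/')
```

On the input `"/** c */ /"`, this produces two results. One treats the leading `/` as a token and leaves `"** c */ /"`. The other skips the comment and consumes everything. ReadP generally lists results in order of how much input they consumed, so taking the head of the list tends to pick the shorter, unintended parse.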

James King

readP_to_S returns multiple results in this case, and my code only inspects the head of the list and sets the stream state from its unconsumed input. :thinking:
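There are two common ways out. Both are sketched here with illustrative parsers that do not come from the thread (`slash`, `parseAll`).

1. Make whitespace skipping greedy with the left-biased `<++`. Once a comment parses, it always wins over treating `/` as a token, and only one result remains.
2. Keep the ambiguity, but accept only parses that consume the entire input, using `eof`.

```haskell
import Data.Char (isAscii)
import Text.ParserCombinators.ReadP

isChar :: ReadP Char
isChar = satisfy isAscii

comment :: ReadP String
comment = blockComment +++ lineComment
  where
    blockComment = string "/**" *> manyTill isChar (string "*/")
    lineComment  = string "//" *> manyTill isChar eol
    eol = choice [string "\r\n", string "\n"]

-- Greedy whitespace: if a comment parses here, commit to it and keep
-- going. `<++` drops the `pure ()` alternative whenever the left side
-- produces a result, so only the longest skip survives.
whitespace :: ReadP ()
whitespace = skipSpaces *> ((comment *> whitespace) <++ pure ())

lexeme :: ReadP a -> ReadP a
lexeme p = whitespace *> p

-- Hypothetical overlapping token, for illustration only.
slash :: ReadP Char
slash = lexeme (char '/')

-- Alternative fix (hypothetical helper): accept only parses that
-- consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]
```

With the greedy `whitespace`, `readP_to_S slash "/** c */ /"` should return the single result `[('/', "")]`. A bare `/` is still a token: on `"/ x"` you get `[('/', " x")]`. The `<++` version is what makes taking the head of the `readP_to_S` list safe. The `eof` filter is simpler, but it only helps when the whole input is parsed in one go, which may not suit a tokenizer that stores leftover input as stream state.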