Haskell Alex - regex matches wrong string?


I'm trying to write a lexer for an indentation-based grammar, and I'm having trouble matching indentation.

Here's my code:

{
module Lexer ( main ) where

import System.IO.Unsafe
}

%wrapper "monadUserState"

$whitespace = [\ \t\b]
$digit      = 0-9                                            -- digits
$alpha      = [a-zA-Z]
$letter     = [a-zA-Z]                                       -- alphabetic characters
$ident      = [$letter $digit _]                             -- identifier character
$indent     = [\ \t]

@number     = [$digit]+
@identifier = $alpha($alpha|_|$digit)*

error:-

@identifier { mkL LVarId }

\n $whitespace* \n { skip }
\n $whitespace*    { setIndent }
$whitespace+       { skip }

{

data Lexeme = Lexeme AlexPosn LexemeClass (Maybe String)

instance Show Lexeme where
    show (Lexeme _ LEOF _)   = "  Lexeme EOF"
    show (Lexeme p cl  mbs) = " Lexeme class=" ++ show cl ++ showap p ++ showst mbs
      where
        showap pp = " posn=" ++ showPosn pp
        showst Nothing  = ""
        showst (Just s) = " string=" ++ show s

instance Eq Lexeme where
    (Lexeme _ cls1 _) == (Lexeme _ cls2 _) = cls1 == cls2

showPosn :: AlexPosn -> String
showPosn (AlexPn _ line col) = show line ++ ':': show col

tokPosn :: Lexeme -> AlexPosn
tokPosn (Lexeme p _ _) = p

data LexemeClass
    = LVarId
    | LTIndent Int
    | LTDedent Int
    | LIndent
    | LDedent
    | LEOF
    deriving (Show, Eq)

mkL :: LexemeClass -> AlexInput -> Int -> Alex Lexeme
mkL c (p, _, _, str) len = return (Lexeme p c (Just (take len str)))

data AlexUserState = AlexUserState { indent :: Int }

alexInitUserState :: AlexUserState
alexInitUserState = AlexUserState 0

type Action = AlexInput -> Int -> Alex Lexeme

getLexerIndentLevel :: Alex Int
getLexerIndentLevel = Alex $ \s@AlexState{alex_ust=ust} -> Right (s, indent ust)

setLexerIndentLevel :: Int -> Alex ()
setLexerIndentLevel i = Alex $ \s@AlexState{alex_ust=ust} -> Right (s{alex_ust=(AlexUserState i)}, ())

setIndent :: Action
setIndent input@(p, _, _, str) i = do
    --let !x = unsafePerformIO $ putStrLn $ "|matched string: " ++ str ++ "|"
    lastIndent <- getLexerIndentLevel
    currIndent <- countIndent (drop 1 str) 0 -- first char is always \n
    if (lastIndent < currIndent) then
        do setLexerIndentLevel currIndent
           mkL (LTIndent (currIndent - lastIndent)) input i
    else if (lastIndent > currIndent) then
        do setLexerIndentLevel currIndent
           mkL (LTDedent (lastIndent - currIndent)) input i
    else alexMonadScan
  where
    countIndent str total
        | take 1 str == "\t" = do skip input 1
                                  countIndent (drop 1 str) (total+1)
        | take 4 str == "    " = do skip input 4
                                    countIndent (drop 4 str) (total+1)
        | otherwise = return total

alexEOF :: Alex Lexeme
alexEOF = return (Lexeme undefined LEOF Nothing)

scanner :: String -> Either String [Lexeme]
scanner str =
    let loop = do
            tok@(Lexeme _ cl _) <- alexMonadScan
            if (cl == LEOF)
                then return [tok]
                else do toks <- loop
                        return (tok:toks)
    in runAlex str loop

addIndentations :: [Lexeme] -> [Lexeme]
addIndentations (lex@(Lexeme pos (LTIndent c) _):ls) =
    concat [iter lex c, addIndentations ls]
  where iter lex c = if c == 0 then []
                     else (Lexeme pos LIndent Nothing):(iter lex (c-1))
addIndentations (lex@(Lexeme pos (LTDedent c) _):ls) =
    concat [iter lex c, addIndentations ls]
  where iter lex c = if c == 0 then []
                     else (Lexeme pos LDedent Nothing):(iter lex (c-1))
addIndentations (l:ls) = l:(addIndentations ls)
addIndentations [] = []

main = do
    s <- getContents
    return ()
    print $ fmap addIndentations (scanner s)

}

The problem is in the line \n $whitespace* { setIndent }: the regex matches the wrong string and calls setIndent with that wrong string. For debugging purposes, I added unsafePerformIO to the setIndent function, and here's an example run of the program:

begin                first indent
|matched string:          first indent                 second indent                 second indent dedent dedent |
|matched string:                  second indent dedent |
|matched string:  dedent |
|matched string:  |
Right [ Lexeme class=LVarId posn=1:1 string="begin",
        Lexeme class=LIndent posn=1:6,
        Lexeme class=LVarId posn=2:15 string="indent",
        Lexeme class=LIndent posn=2:21,
        Lexeme class=LDedent posn=3:30,
        Lexeme class=LDedent posn=3:30,
        Lexeme class=LVarId posn=4:1 string="dedent",
         Lexeme EOF]

So setIndent is called with more than just whitespace. And after it returns the lexeme for the indentation, the other part of the string is omitted.

Is this a bug in Alex? Or what am I doing wrong?

So I haven't analysed your code in detail, but I did notice this:

setIndent :: Action
setIndent input@(p, _, _, str) i = do
    --let !x = unsafePerformIO $ putStrLn $ "|matched string: " ++ str ++ "|"

Note that str is the rest of the input, not just the current token. To get the current token, you want take i str. Perhaps this is giving you the impression that the token is matching more of the input than it actually is.

We handle indentation in GHC's own lexer of course, so you might want to look there for ideas (although, as you might expect, it's rather large and complicated).
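To make the difference concrete, here is a minimal sketch in plain Haskell that does not need Alex. It assumes the 4-tuple input shape that the question's code pattern-matches on; the names Input, restOfInput, matchedText and sampleInput are made up for illustration and are not part of Alex.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for AlexInput, mirroring the 4-tuple the question
-- pattern-matches on: (position, previous char, pending bytes, remaining input).
type Input = ((Int, Int, Int), Char, [Word8], String)

-- What the debug line printed: everything that has not been consumed yet.
restOfInput :: Input -> String
restOfInput (_, _, _, str) = str

-- What the rule actually matched: only the first len characters,
-- where len is the second argument passed to the action.
matchedText :: Input -> Int -> String
matchedText (_, _, _, str) len = take len str

-- Example state at the moment a rule like \n $whitespace* fires.
sampleInput :: Input
sampleInput = ((5, 1, 6), 'n', [], "\n    first indent\n")
```

With a match length of 5, matchedText sampleInput 5 yields only the newline and the four spaces, while restOfInput sampleInput still contains "first indent" and everything after it. In the question's action, the equivalent change would be to print take i str rather than str.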

