Implementing a Maximal-Munch Lexical Analyzer
Extending the NFA generator Previously we implemented a regular expression matching algorithm that generated an NFA from an input regular expression, and simulated a string over the NFA to check for acceptance. The regular expression was required to be in postfix form and the algorithm only supported union $|$ , concatenation $.$ , Kleene star $*$ , one-or-more $+$ , and zero-or-one $?$ operations. While this was sufficient for simple applications, building a lexer would require support for character ranges (for example $[0\text{-}9]$, $[a\text{-}z]$) and escape characters (for example one may wish to use $*$ for the multiply operator rather than Kleene star)....