[Python-Dev] Re: Automatic flex interface for Python?

Tim Peters tim.one@comcast.net
Wed, 21 Aug 2002 21:21:19 -0400


[Gordon McMillan]
> mxTextTools lets (encourages?) you to break all
> the rules about lex -> parse. If you can (& want to)
> put a good deal of the "parse" stuff into the scanning
> rules, you can get a speed advantage. You're also
> not constrained by the rules of BNF, if you choose
> to see that as an advantage :-).
>
> My one successful use of mxTextTools came after
> using SPARK to figure out what I actually needed
> in my AST, and realizing that the ambiguities in the
> grammar didn't matter in practice, so I could produce
> an almost-AST directly.

I don't expect anyone will have much luck writing a fast lexer using
mxTextTools *or* Python's regexp package unless they know quite a bit about
how each works under the covers, and about how fast lexing is accomplished
by DFAs.  If you know both, you can build a DFA by hand and painfully
instruct mxTextTools in the details of its construction, and get a very fast
tokenizer (compared to what's possible with re), regardless of the number of
token classes or the complexity of their definitions.  Writing to
mxTextTools directly is a lot like writing in an assembly language for a
character-matching machine, with all the pains and potential joys that
implies.  If I were Eric, I'd use Flex <wink>.
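
For readers who haven't built one, here is a rough sketch of what a hand-built DFA tokenizer looks like, written in plain Python rather than as an mxTextTools tag table. Everything in it is an assumption made for illustration: the token classes, the state names, and the helpers `char_class` and `tokenize` are hypothetical. It shows only the shape of the transition table, not the speed; a pure-Python loop like this will not be fast.

```python
import string

# Hypothetical DFA states for a toy language (illustration only).
START, IN_NAME, IN_INT, IN_SPACE, IN_OP = range(5)

# Every non-start state is accepting here, so the scanner never has to
# back up to an earlier accepting state. A realistic token set usually
# needs that extra bookkeeping.
ACCEPTING = {IN_NAME: "NAME", IN_INT: "INT", IN_SPACE: "SPACE", IN_OP: "OP"}

LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)
SPACES = set(" \t\n")
OPS = set("+-*/=()")


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch in LETTERS:
        return "alpha"
    if ch in DIGITS:
        return "digit"
    if ch in SPACES:
        return "space"
    if ch in OPS:
        return "op"
    return "other"


# The DFA itself: (state, input class) -> next state.
# A missing entry means "no transition", which ends the current token.
TRANSITIONS = {
    (START, "alpha"): IN_NAME,
    (START, "digit"): IN_INT,
    (START, "space"): IN_SPACE,
    (START, "op"): IN_OP,
    (IN_NAME, "alpha"): IN_NAME,
    (IN_NAME, "digit"): IN_NAME,
    (IN_INT, "digit"): IN_INT,
    (IN_SPACE, "space"): IN_SPACE,
}


def tokenize(text):
    """Return (kind, lexeme) pairs, taking the longest match each time."""
    tokens = []
    pos = 0
    n = len(text)
    while pos < n:
        state = START
        end = pos
        # Run the DFA until it has no transition for the next character.
        while end < n:
            nxt = TRANSITIONS.get((state, char_class(text[end])))
            if nxt is None:
                break
            state = nxt
            end += 1
        if state not in ACCEPTING:
            raise ValueError("unexpected character %r at %d" % (text[pos], pos))
        kind = ACCEPTING[state]
        if kind != "SPACE":  # whitespace is recognized but not reported
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens


print(tokenize("x1 = 42+y"))
```

Each character costs one table lookup no matter how many token classes there are, which is the property that makes DFA-based lexing fast when the inner loop runs in C. Adding a token class means adding states and transitions, not adding another alternative for the scanner to try.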