
[Gordon McMillan]
mxTextTools lets you (encourages you to?) break all the rules about lex -> parse. If you can (& want to) put a good deal of the "parse" stuff into the scanning rules, you can get a speed advantage. You're also not constrained by the rules of BNF, if you choose to see that as an advantage :-).
My one successful use of mxTextTools came after using SPARK to figure out what I actually needed in my AST, and realizing that the ambiguities in the grammar didn't matter in practice, so I could produce an almost-AST directly.
I don't expect anyone will have much luck writing a fast lexer using mxTextTools *or* Python's regexp package unless they know quite a bit about how each works under the covers, and about how fast lexing is accomplished by DFAs. If you know both, you can build a DFA by hand and painfully instruct mxTextTools in the details of its construction, and get a very fast tokenizer (compared to what's possible with re), regardless of the number of token classes or the complexity of their definitions. Writing to mxTextTools directly is a lot like writing in an assembly language for a character-matching machine, with all the pains and potential joys that implies. If I were Eric, I'd use Flex <wink>.
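For anyone who hasn't seen a hand-built DFA tokenizer, here is a minimal sketch of the shape of the thing in plain Python. It deliberately does not use the mxTextTools API, and being an interpreted loop it won't be fast; it only shows the structure you'd have to spell out by hand: character classes, a transition table, accepting states, and longest-match with backup. The token kinds, state names, and the `tokenize` function are all made up for illustration.

```python
# Hypothetical hand-built DFA for a toy token set (NAME, INT, FLOAT).
# Illustration only: not mxTextTools, and not tuned for speed.

def char_class(ch):
    """Collapse the alphabet into a few classes so the table stays small."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "alpha"
    if ch in " \t\n":
        return "space"
    if ch == ".":
        return "dot"
    return "other"

START, IDENT, INT, INT_DOT, FLOAT, SPACE = range(6)

# (state, character class) -> next state; a missing entry means "stuck".
TRANSITIONS = {
    (START, "alpha"): IDENT,
    (START, "digit"): INT,
    (START, "space"): SPACE,
    (IDENT, "alpha"): IDENT,
    (IDENT, "digit"): IDENT,
    (INT, "digit"): INT,
    (INT, "dot"): INT_DOT,      # seen "3." so far; not a token yet
    (INT_DOT, "digit"): FLOAT,
    (FLOAT, "digit"): FLOAT,
    (SPACE, "space"): SPACE,
}

# Accepting states and the token kind each yields (None means "skip it").
ACCEPTING = {IDENT: "NAME", INT: "INT", FLOAT: "FLOAT", SPACE: None}

def tokenize(text):
    tokens = []
    pos, n = 0, len(text)
    while pos < n:
        state, i = START, pos
        last_accept = None  # (end offset, state) of the longest match so far
        while i < n:
            nxt = TRANSITIONS.get((state, char_class(text[i])))
            if nxt is None:
                break
            state = nxt
            i += 1
            if state in ACCEPTING:
                last_accept = (i, state)
        if last_accept is None:
            # No rule matched: emit the character as a one-character OP token.
            tokens.append(("OP", text[pos]))
            pos += 1
            continue
        end, accept_state = last_accept
        kind = ACCEPTING[accept_state]
        if kind is not None:
            tokens.append((kind, text[pos:end]))
        pos = end  # back up to the end of the longest accepted match
    return tokens
```

Note that the inner loop does one table lookup per character no matter how many token classes there are; that is the property that makes the DFA approach fast once the same table is driven by something quicker than a Python loop. Adding a token class means adding states and table entries, not another alternative to try in turn.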