[Python-Dev] Re: Automatic flex interface for Python?

M.-A. Lemburg mal@lemburg.com
Mon, 16 Sep 2002 11:10:05 +0200

Tim Peters wrote:
> [Gordon McMillan]
>>mxTextTools lets (encourages?) you to break all
>>the rules about lex -> parse. If you can (& want to)
>>put a good deal of the "parse" stuff into the scanning
>>rules, you can get a speed advantage. You're also
>>not constrained by the rules of BNF, if you choose
>>to see that as an advantage :-).
>>My one successful use of mxTextTools came after
>>using SPARK to figure out what I actually needed
>>in my AST, and realizing that the ambiguities in the
>>grammar didn't matter in practice, so I could produce
>>an almost-AST directly.
> I don't expect anyone will have much luck writing a fast lexer using
> mxTextTools *or* Python's regexp package unless they know quite a bit about
> how each works under the covers, and about how fast lexing is accomplished
> by DFAs.  If you know both, you can build a DFA by hand and painfully
> instruct mxTextTools in the details of its construction, and get a very fast
> tokenizer (compared to what's possible with re), regardless of the number of
> token classes or the complexity of their definitions.  Writing to
> mxTextTools directly is a lot like writing in an assembly language for a
> character-matching machine, with all the pains and potential joys that
> implies.  If I were Eric, I'd use Flex <wink>.
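For readers who have not built one, here is a minimal pure-Python sketch of the hand-built, table-driven DFA tokenizer Tim describes: an explicit transition table over character classes, plus longest-match ("maximal munch") scanning. The token classes, state names and the tokenize() function are hypothetical illustrations, not mxTextTools' tag-table API. Interpreted Python like this will not be fast; the sketch only shows the structure.

```python
import string
from typing import Dict, List, Optional, Tuple

# Hypothetical character classes: each input char maps to one column of
# the transition table, which keeps the table small.
def char_class(ch: str) -> str:
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch in " \t\n":
        return "space"
    if ch in "+-*/=()":
        return "op"
    return "other"

# Hand-built DFA: state -> {char class -> next state}.
# A missing entry means "no transition", i.e. the current token ends here.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "start": {"letter": "ident", "digit": "int", "space": "ws", "op": "op"},
    "ident": {"letter": "ident", "digit": "ident"},
    "int":   {"digit": "int"},
    "ws":    {"space": "ws"},
    "op":    {},  # every operator is a single-character token
}

# Accepting states and the token kind they produce (None = skip, e.g. whitespace).
ACCEPTING: Dict[str, Optional[str]] = {
    "ident": "NAME", "int": "NUMBER", "ws": None, "op": "OP",
}

def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos, n = 0, len(text)
    while pos < n:
        state, i = "start", pos
        last_accept: Optional[Tuple[int, Optional[str]]] = None
        # Run the DFA as far as it goes, remembering the last accepting
        # position so the longest possible token wins (maximal munch).
        while i < n:
            nxt = TRANSITIONS[state].get(char_class(text[i]))
            if nxt is None:
                break
            state, i = nxt, i + 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        end, kind = last_accept
        if kind is not None:
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

For example, tokenize("x1 = 42+y") yields NAME, OP, NUMBER, OP, NAME tokens. Adding token classes only grows the table; the scanning loop stays the same, which is the property Tim points to.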

FYI, there are a few meta-languages that make life with mxTextTools
easier, e.g. Mike Fletcher's SimpleParse.

The upcoming version 2.1 will also support Unicode and allow text
jump targets, which greatly improve the readability of the tag
tables and make writing the tables by hand much easier.
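As a rough illustration of why symbolic jump targets help, here is a toy matcher loosely modeled on the idea of a tag table whose entries say where to jump on failure and on success. The table layout, the chars()/literal() helpers and run_table() are hypothetical and are not mxTextTools' actual format or API. With purely numeric relative offsets, the "comma" entry below would have to encode its loop back as a jump of -1; with labels it simply names "number".

```python
import string
from typing import Callable, List, Optional, Tuple

# A matcher takes (text, pos) and returns the new position, or None on failure.
Matcher = Callable[[str, int], Optional[int]]
# Hypothetical entry layout: (label, matcher, on_fail target, on_match target).
Entry = Tuple[str, Matcher, str, str]

def chars(allowed: str) -> Matcher:
    """Match one or more characters from `allowed`, greedily."""
    def match(text: str, pos: int) -> Optional[int]:
        end = pos
        while end < len(text) and text[end] in allowed:
            end += 1
        return end if end > pos else None
    return match

def literal(s: str) -> Matcher:
    """Match the exact string `s` at `pos`."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return match

def run_table(table: List[Entry], text: str, pos: int = 0):
    # Resolve labels to entry indices once, up front, and reject typos.
    index = {entry[0]: i for i, entry in enumerate(table)}
    for _, _, on_fail, on_match in table:
        for target in (on_fail, on_match):
            if target not in index and target not in ("DONE", "FAIL"):
                raise ValueError(f"unknown jump target {target!r}")
    taglist = []  # (label, start, end) for every successful match
    i = 0
    while True:
        label, matcher, on_fail, on_match = table[i]
        end = matcher(text, pos)
        if end is not None:
            taglist.append((label, pos, end))
            pos = end
        target = on_match if end is not None else on_fail
        if target == "DONE":
            return True, taglist, pos
        if target == "FAIL":
            return False, taglist, pos
        i = index[target]

# A comma-separated list of integers: "comma" loops back to "number" by name.
LIST_TABLE: List[Entry] = [
    ("number", chars(string.digits), "FAIL", "comma"),
    ("comma",  literal(","),         "DONE", "number"),
]
```

Resolving labels to indices once keeps the inner loop as cheap as numeric jumps, while misspelled targets are caught before any matching starts.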

The beta of 2.1 is available to subscribers of the egenix-users
mailing list.

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/