Lexing in Python 2

Sun Jan 23 22:55:26 EST 2000

[Paul Prescod]
> At the last XML conference I told someone that the reason
> that re doesn't take a stream instead of string parameter
> was because anyone sane working on a large file would use
> a proper tokenizer. Shouldn't such a tokenizer come with
> Python? With all due respect, what the hell is shlex and
> how did it get into the standard distribution?

I've wondered that myself <0.9 wink>.

> I mean the standard distribution alone must contain half a
> dozen hand-coded lexers and in a few places, the weirdness
> you need to apply regular expressions to streams. Surely we
> can do better for Python 2?

If nobody was motivated enough to write the code for Python 1, I don't know
why that would change for Python 3000 (that's what Guido insists on calling
it now <wink>).  If you want a *fast* Python lexer today, mxTextTools is
your best hope.
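
For what it's worth, the usual stopgap for the regexps-on-streams weirdness
is to read a line at a time and match one big alternation against it.  The
following is only a minimal sketch in plain `re`, not mxTextTools and not
anything in the standard distribution; the token names, `TOKEN_SPEC`,
`MASTER` and `tokenize` are made up for illustration, and it assumes no
token ever spans a line break.

```python
import io
import re

# Hypothetical token set, for illustration only; not from any stdlib module.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),
]
# One alternation of named groups; m.lastgroup says which branch matched.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(stream):
    """Yield (kind, text, lineno) from a file-like object, line by line.

    Assumption: no token spans a line break.  That is exactly the hard
    part of applying regexps to streams, and this sketch dodges it.
    """
    for lineno, line in enumerate(stream, 1):
        pos = 0
        while pos < len(line):
            m = MASTER.match(line, pos)
            if m is None:
                raise SyntaxError("bad character %r on line %d"
                                  % (line[pos], lineno))
            if m.lastgroup != "SKIP":      # drop whitespace, keep the rest
                yield m.lastgroup, m.group(), lineno
            pos = m.end()

tokens = list(tokenize(io.StringIO("x = 42\ny = x + 1\n")))
```

Because it only ever holds one line in memory, this works on large files,
but anything like a multi-line string or comment needs buffering logic on
top, which is where the hand-coded lexers tend to come from.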

> It is my unconsidered, uneducated opinion that lexers do not
> vary as widely as parsers (LL(1), LR(1), LR(N) etc.) so we
> could just choose one at random and start building modules
> around it.

Curiously, mxTextTools is nothing like lex/flex.  Flex does such a good job
it's hard to get motivated to duplicate all that effort (it's not easy)
solely to get something releasable under a more Python-like license.  I
don't know how Marc-Andre would feel about folding mxTextTools into the
distribution.

> All in favor? Opposed? Carried.

I'm almost never opposed to someone else doing work <wink>.

BTW, there are several interesting parsing projects going on in the Java
world; at least JPython should be able to exploit them.

IDLE-adds-at-least-two-more-hand-crafted-lexers-ly y'rs  - tim