
Lexers are painful in Python. They hit the language in a weak spot created by the immutability of strings. I've found this an obstacle more than once, but then I'm a battle-scarred old compiler jock who attacks *everything* with lexers and parsers.
I think you're exaggerating the problem, or at least underestimating the re module. The re module is pretty fast! Reading a file line-by-line is very fast in Python 2.3 with the new "for line in open(filename)" idiom. I just scanned nearly a megabyte of ugly data (a Linux kernel) in 0.6 seconds using the regex '\w+', finding 177,000 words. The regex '(?:\d+|[a-zA-Z_]+)' took 1 second, finding 190,000 words. I expect that the list creation (one hit at a time) took more time than the matching.

--Guido van Rossum (home page: http://www.python.org/~guido/)
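Guido doesn't show his test script, but a minimal sketch of the kind of timing he describes might look like the following. The filename 'kernel.txt' and the harness itself are my assumptions, not his code:

    import re
    import time

    def count_matches(filename, pattern):
        # Read the whole file up front so only the scan is timed.
        data = open(filename).read()
        start = time.time()
        # findall builds the full result list, one hit at a time,
        # which is the list-creation cost Guido refers to.
        words = re.findall(pattern, data)
        return len(words), time.time() - start

    for pattern in (r'\w+', r'(?:\d+|[a-zA-Z_]+)'):
        count, seconds = count_matches('kernel.txt', pattern)  # hypothetical input file
        print('%-22s %d matches in %.2f seconds' % (pattern, count, seconds))

If the list creation really dominates, re.finditer (available since Python 2.2) yields one match object at a time without building a list, and should narrow the gap between the two patterns.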