[Python-Dev] Re: Automatic flex interface for Python?
Guido van Rossum
guido@python.org
Tue, 20 Aug 2002 23:57:51 -0400
> Lexers are painful in Python. They hit the language in a weak spot
> created by the immutability of strings. I've found this an obstacle
> more than once, but then I'm a battle-scarred old compiler jock who
> attacks *everything* with lexers and parsers.
I think you're exaggerating the problem, or at least underestimating
the re module. The re module is pretty fast! Reading a file
line-by-line is very fast in Python 2.3 with the new "for line in
open(filename)" idiom. I just scanned nearly a megabyte of ugly data
(a Linux kernel) in 0.6 seconds using the regex '\w+', finding 177,000
words. The regex (?:\d+|[a-zA-Z_]+) took 1 second,
finding 190,000 words. I expect that the list creation (one hit at a
time) took more time than the matching.
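
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the approach described above: iterate over lines, collect regex hits into a list, and time the whole thing. It is not the script used for the numbers quoted here; the helper names scan_words and timed_scan and the tiny in-memory sample are assumptions for illustration, and it uses time.perf_counter from modern Python rather than anything available in 2.3.

```python
import re
import time


def scan_words(lines, pattern):
    """Collect every non-overlapping match of pattern, one line at a time."""
    regex = re.compile(pattern)
    words = []
    for line in lines:
        # findall returns the list of matched substrings for this line
        words.extend(regex.findall(line))
    return words


def timed_scan(lines, pattern):
    """Return (number of matches, elapsed seconds) for one pass over lines."""
    start = time.perf_counter()
    words = scan_words(lines, pattern)
    return len(words), time.perf_counter() - start


# Hypothetical stand-in for real input; any iterable of strings works,
# including an open file object.
sample = ["static int foo_42(void)\n", "{ return x1 + 7; }\n"]

for pattern in (r"\w+", r"(?:\d+|[a-zA-Z_]+)"):
    count, elapsed = timed_scan(sample, pattern)
    print("%-22s %d words in %.6f s" % (pattern, count, elapsed))
```

To run it on real data, pass an open file in place of sample, for example timed_scan(open(filename), r"\w+"). The second pattern reports more words than the first because it splits identifiers such as foo_42 into a letter run and a digit run, which is consistent with the higher count reported above. Timings will of course depend on the machine and the Python version.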
--Guido van Rossum (home page: http://www.python.org/~guido/)