[Python-Dev] Yet more SRE

Gustavo Niemeyer niemeyer@conectiva.com
Thu, 26 Jun 2003 14:04:56 -0300


I've just submitted to SF #757624 the third implementation of the
SRE changes.

Unlike the other implementations, this one doesn't recurse at all, and
doesn't change the meaning of the opcodes in the engine. Indeed, the
original logic is mostly unchanged, and the recursion has been removed
by a code reorganization. Most of the magic is done using "jumps", and
"context saving" inside SRE_MATCH. It's even possible to reenable the
recursive scheme by enabling the USE_RECURSION macro.

Besides the regression tests, I have tested this using Fredrik's
suggestion, with a 4MB XML file and xmllib/tokenize modules. There
was a very small slow down (i.e. about 37.4s in old implementation
vs. 38.0s in new implementation).

Bugs previously mentioned are also fixed in this implementation.

This implementation also includes the protection against zero-width
match in greedy expressions ("(?=a)*", "a"). Even if this implementation
doesn't get into 2.3, I was thinking about backporting this specific fix
to 2.3. Unfortunately, this would need an additional local variable in
SRE_MATCH of the current implementation, and I'm afraid to reduce the
recursion limit even more by introducing it.

Additionally, after a quick look over perl's regex engine, I also have
some ideas for matching optimization in SRE, but I'll wait a little bit
before fiddling with it.

-- 
Gustavo Niemeyer
http://niemeyer.net