Regular expression performance

Andrew Kuchling akuchlin at mems-exchange.org
Fri Dec 1 15:53:57 EST 2000


p.g at figu.no (Per Gummedal) writes:
> Now I see that Fredrik was right that sre has problems with IGNORECASE.
> and that my conclusion, that sre has problems with long strings was wrong.
> Instead alternation (p|l) at the beginning of the pattern makes sre very 
> slow. (a bug ?)

I looked at the SRE code too quickly; it seems to do a faster search
when the pattern starts with a literal character, but doesn't actually
determine a set of possible first characters.  So mos95 is fast, but
(p|l) is not because SRE ends up checking every single character.  pre
would determine that any match must start with 'p' or 'l' and then
search for those two characters.
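
To illustrate the idea (this is only a rough sketch, not the actual code in either pre or SRE), a first-character check can be written in plain Python. The function name search_with_first_chars and the hand-supplied first_chars set are made up for this example; a real engine would derive that set from the compiled pattern itself.

```python
import re

def search_with_first_chars(pattern, first_chars, text):
    """Attempt a match only at positions holding a possible first character.

    first_chars is supplied by hand here (an assumption for illustration);
    an engine would compute it by analysing the pattern.
    """
    compiled = re.compile(pattern)
    for pos, ch in enumerate(text):
        if ch in first_chars:
            # Anchored match attempt starting exactly at this position.
            m = compiled.match(text, pos)
            if m:
                return m
    return None

# Any match of (p|l)ist must start with 'p' or 'l'.
text = "x" * 1000 + "list"
m = search_with_first_chars(r"(p|l)ist", {"p", "l"}, text)
```

Here the thousand leading 'x' characters are skipped with a cheap membership test, and the full matcher only runs at the single position where an 'l' appears.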

Fixing this would therefore require adding a significant optimization
to SRE's engine.  It would make a fun 2.1 project...

--amk



