Python RegExp

Jeff Epler jepler at unpythonic.net
Wed Mar 23 00:52:22 CET 2005


On my machine (1.5GHz), the program finishes in 30 seconds.  If the
'parm' group is removed, or if the buffer is shortened, the time is
reduced considerably.

There are "pathological cases" for regular expressions which can take
quite a long time to match.  In your expression, this is happening in
the group 'parm'.  I think, but don't know, that each time a candidate
for 'parm' is found, the following '#' (or maybe the second '<'?) is not
found, so the engine backtracks to try to match 'parm' in a different
way.  That involves considering many different combinations: basically,
each 'name=' could be the start of a new instance of the first
parenthesized subgroup of <parm>, or it could be part of the character
class that includes [a-zA-Z=].
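
To see this kind of backtracking in isolation, here is a minimal sketch.
The pattern below is NOT your expression; it is a made-up stand-in with
the same shape I suspect is at fault: a repeated group whose body is
itself repeated, followed by a '#' that never appears in the input.  The
names PATHOLOGICAL and time_failed_match are mine, for illustration only.

```python
import re
import time

# Hypothetical stand-in pattern (an assumption, not the original expression):
# a repeated group containing a repeat, followed by a literal '#'.
PATHOLOGICAL = re.compile(r'(a+)+#')


def time_failed_match(n):
    """Return the seconds spent failing to match n 'a' characters.

    The input never contains '#', so the engine has to try every way of
    splitting the run of 'a's between the inner and outer repeats before
    it can give up.
    """
    text = 'a' * n
    start = time.perf_counter()
    result = PATHOLOGICAL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # no '#' in the text, so there is no match
    return elapsed


if __name__ == '__main__':
    # Kept small on purpose: each extra character adds many more
    # combinations for the engine to try.
    for n in (12, 14, 16, 18):
        print(n, time_failed_match(n))
```

If you run it, the times should climb steeply as n grows, even though
the input is only a handful of characters longer each time.  That would
also fit with the run time dropping considerably when the buffer is
shortened.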

You may wish to consider approaches other than regular expressions for
parsing this text.
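
As a rough sketch of what I mean, and assuming (I have not checked this
against your data) that the parameters are whitespace-separated
'name=value' tokens, plain string methods can do the job in a single
pass with no backtracking.  The function name parse_params and the
input format are my assumptions, not something taken from your program.

```python
def parse_params(text):
    """Split 'name=value name=value ...' text into a dict.

    Assumes tokens are separated by whitespace and that the first '='
    in a token divides the name from the value.  Tokens without '='
    are skipped.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition('=')
        if sep:  # only keep tokens that actually contain '='
            params[name] = value
    return params


if __name__ == '__main__':
    print(parse_params('name=foo size=3 junk'))
```

Each token is looked at once, so the work grows in proportion to the
length of the text.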

Jeff