Wanted: slow regexes
Tim Chase
python.list at tim.thechases.com
Mon Dec 6 07:53:46 EST 2010
On 12/05/2010 10:08 PM, MRAB wrote:
> I'm looking for examples of regexes which are slow (especially those
> which seem never to finish) but whose results are known. I already have
> those reported in the bug tracker, but further ones will be welcome.
>
> This is for testing additional modifications to the new regex
> implementation (available on PyPI).
There was a DoS security issue in Django about a year back (fixed
the day it came to light, in changeset 11603), triggered by a
regexp that backtracks heavily on certain non-matching input:
http://code.djangoproject.com/changeset/11603
which tried to match
    email_re = re.compile(
        r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"  # dot-atom
        r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-011\013\014\016-\177])*"'  # quoted-string
        r')@(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$',  # domain
        re.IGNORECASE)
against
    'viewx3dtextx26qx3d@yahoo.comx26latlngx3d15854521645943074058'
(should return None rather than a MatchObject).
Folks were reporting that it was taking >20min to run.
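
If you want to watch the blow-up without waiting 20 minutes, here is a rough sketch that times only the domain part of the pattern against short non-matching inputs. The names domain_re and time_failed_search, and the 'a@b.' test strings, are made up for illustration and are not from the Django source; the tail lengths are kept small so it finishes quickly.

```python
import re
import time

# Domain portion of the pattern quoted above, isolated for illustration.
# (Hypothetical name; the full Django pattern also has the local-part branches.)
domain_re = re.compile(r'@(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$',
                       re.IGNORECASE)


def time_failed_search(tail_length):
    """Search a string whose final label is all digits, so it cannot match.

    Returns (result, seconds). The digits can be split between
    [A-Z0-9]+ and (?:-*[A-Z0-9]+)* in many ways, and a backtracking
    engine may try them all before giving up.
    """
    subject = 'a@b.' + '1' * tail_length
    start = time.perf_counter()
    result = domain_re.search(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Small sizes only; each extra character can roughly double the work.
timings = [time_failed_search(n) for n in (8, 12, 16)]
for n, (result, elapsed) in zip((8, 12, 16), timings):
    print(n, result, '%.4f s' % elapsed)
```

Every result should be None. On a backtracking engine the times should climb steeply as the tail grows, which is why the 35-character tail in the string above takes so long; raise the lengths a couple of characters at a time if you try larger ones.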
-tkc