[issue8064] Large regex handling very slow on Linux
Ezio Melotti
report at bugs.python.org
Fri Mar 5 03:14:32 CET 2010
Ezio Melotti <ezio.melotti at gmail.com> added the comment:
Here is proof that an equivalent regex can be written without listing all the 'letter chars' explicitly (tested on both narrow and wide builds):
>>> import re, sys
>>> s = u''.join(unichr(c) for c in range(sys.maxunicode))
>>> diff = set(re.findall(u'[^\W\d]', s, re.U)) ^ set(re.findall(u'[%s_-]' % makew(), s, re.U))
>>> diff.remove('-')
>>> re.findall(u'(?:[^\W\d%s]|-)' % ''.join(diff), s, re.U) == re.findall(u'[%s_-]' % makew(), s, re.U)
True
(I don't like the way I included the '-', but I couldn't find anything better.)
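For illustration, here is a minimal Python 3 sketch of the same symmetric-difference check. In Python 3, `unichr` is `chr` and `re.U` is already the default for str patterns. `makew()` is not shown in this comment, so a hypothetical stand-in that lists every alphabetic character explicitly is used instead. The helper names `matched`, `class_of` and `equivalent_short_regex` are also made up, and only the first 0x3000 code points are scanned to keep it quick. The '-' handling is my own assumed generalization of the approach above: characters that `[^\W\d]` matches but the long class does not are excluded from the negated class, and characters it misses are added back through an alternation.

```python
import re

def matched(pattern, text):
    """Return the set of characters in `text` that a one-character `pattern` matches."""
    return set(re.findall(pattern, text))

def class_of(chars):
    """Return a character-class body listing `chars` explicitly, each escaped."""
    return ''.join(re.escape(c) for c in sorted(chars))

def equivalent_short_regex(long_pattern, text):
    """Build a negated-class regex matching the same characters of `text` as long_pattern."""
    target = matched(long_pattern, text)
    base = matched(r'[^\W\d]', text)  # word characters that are not decimal digits
    extra = base - target    # matched by [^\W\d] but not by long_pattern: exclude them
    missing = target - base  # matched by long_pattern but not by [^\W\d]: add them back
    pattern = r'[^\W\d%s]' % class_of(extra)
    if missing:
        pattern = r'(?:%s|[%s])' % (pattern, class_of(missing))
    return pattern

# Demo text: only the first 0x3000 code points, to keep the example fast.
text = ''.join(map(chr, range(0x3000)))

# Hypothetical stand-in for makew(): every alphabetic character listed explicitly.
letters = ''.join(c for c in text if c.isalpha())
long_pattern = '[%s_-]' % class_of(letters)

short = equivalent_short_regex(long_pattern, text)
print(len(long_pattern), len(short))
print(re.findall(short, text) == re.findall(long_pattern, text))
```

Because `target` is computed from the long pattern over the same text, the short regex matches a character exactly when the long one does. This is the property the `True` in the session above confirms.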
However, it looks like most of the time is spent in findall(), and a quick benchmark suggests that my regex is slower, even though it is shorter and compiles faster.
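To see where the time goes, compile time and matching time can be measured separately. Below is a hedged Python 3 sketch. The patterns and the 0x3000 code-point range are illustrative assumptions, not the ones used in this issue, and no timings are claimed because they vary by build and platform. The long pattern lists, one by one, exactly the characters the short negated class matches, so the two are equivalent over the demo text. `re.purge()` clears the `re` module's pattern cache so that each compile is really measured.

```python
import re
import timeit

# Demo text: only the first 0x3000 code points, to keep the timing quick.
text = ''.join(map(chr, range(0x3000)))

# Short form: a negated class ("word characters that are not decimal digits").
short_pattern = r'[^\W\d]'
# Hypothetical long form: the same characters listed explicitly, in the style
# of an explicit class such as the one built by makew().
chars = sorted(set(re.findall(short_pattern, text)))
long_pattern = '[%s]' % ''.join(re.escape(c) for c in chars)

def compile_time(pattern, number=5):
    # re.purge() empties the re module's cache, so every call recompiles.
    return timeit.timeit(lambda: (re.purge(), re.compile(pattern)), number=number)

def findall_time(pattern, number=5):
    # Compile once, then time only the matching step.
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.findall(text), number=number)

for name, p in (('long', long_pattern), ('short', short_pattern)):
    print('%-5s len=%6d compile=%.4fs findall=%.4fs'
          % (name, len(p), compile_time(p), findall_time(p)))
```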
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8064>
_______________________________________
More information about the Python-bugs-list mailing list