[Python-Dev] Should we move to replace re with regex?

Guido van Rossum guido at python.org
Fri Aug 26 23:45:17 CEST 2011


I just made a pass of all the Unicode-related bugs filed by Tom
Christiansen, and found that in several, the response was "this is
fixed in the regex module [by Matthew Barnett]". I started replying
that I thought that we should fix the bugs in the re module (i.e.,
really in _sre.c) but on second thought I wonder if maybe regex is
mature enough to replace re in Python 3.3. It would mean that we won't
fix any of these bugs in earlier Python versions, but I could live
with that.

However, I don't know much about regex -- how compatible is it, how
fast is it (including extreme cases where the backtracking goes
crazy), how bug-free is it, and so on. Plus, how much work would it be
to actually incorporate it into CPython as a complete drop-in
replacement for the re module (such that nobody needs to change their
imports or the flags they pass to it)?
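
A rough way to start probing the compatibility and backtracking
questions is to run the same patterns through both modules and compare
the results. The sketch below is illustrative only: compare_modules,
time_search and CASES are hypothetical names, the patterns are
arbitrary, and the candidate is bound to re itself as a stand-in, on
the assumption that the third-party regex package may not be installed
(swap in "import regex" to compare for real).

```python
import re
import time


def compare_modules(reference, candidate, cases):
    """Return the cases where the two modules give different findall() results."""
    mismatches = []
    for pattern, text in cases:
        ref = reference.findall(pattern, text)
        cand = candidate.findall(pattern, text)
        if ref != cand:
            mismatches.append((pattern, text, ref, cand))
    return mismatches


def time_search(module, pattern, text):
    """Run module.search() once and return (match_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    result = module.search(pattern, text)
    return result, time.perf_counter() - start


# Hypothetical sample cases; a real check would use a much larger corpus,
# ideally the existing re test suite.
CASES = [
    (r"\w+", "naive caf\u00e9"),
    (r"(a|b)+c", "ababc abd"),
    (r"\d{2,3}", "7 42 1234"),
]

# Assumption: stand-in candidate. Replace with `import regex as candidate`
# if that package is available.
candidate = re

mismatches = compare_modules(re, candidate, CASES)

# Nested quantifiers with no possible match force a backtracking engine to
# try many ways of splitting the run of "a" characters before giving up.
slow_pattern = r"(a+)+b"
result, elapsed = time_search(candidate, slow_pattern, "a" * 18)

print("mismatches:", mismatches)
print("pathological search took %.4f seconds" % elapsed)
```

Growing the input to the timing call a few characters at a time gives
a quick feel for whether the running time blows up.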

We'd also probably have to train some core developers to be familiar
enough with the code to maintain and evolve it -- I assume we can't
just volunteer Matthew to do so forever... :-)

What's the alternative? Is adding the requested bug fixes and new
features to _sre.c really that hard?

-- 
--Guido van Rossum (python.org/~guido)

