[Python-Dev] Should we move to replace re with regex?

Guido van Rossum guido at python.org
Sat Aug 27 00:18:35 CEST 2011


On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Guido van Rossum wrote:
>> I just made a pass of all the Unicode-related bugs filed by Tom
>> Christiansen, and found that in several, the response was "this is
>> fixed in the regex module [by Matthew Barnett]". I started replying
>> that I thought that we should fix the bugs in the re module (i.e.,
>> really in _sre.c) but on second thought I wonder if maybe regex is
>> mature enough to replace re in Python 3.3. It would mean that we won't
>> fix any of these bugs in earlier Python versions, but I could live
>> with that.
>>
>> However, I don't know much about regex -- how compatible is it, how
>> fast is it (including extreme cases where the backtracking goes
>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>> to actually incorporate it into CPython as a complete drop-in
>> replacement of the re package (such that nobody needs to change their
>> imports or the flags they pass to the re module)?
>>
>> We'd also probably have to train some core developers to be familiar
>> enough with the code to maintain and evolve it -- I assume we can't
>> just volunteer Matthew to do so forever... :-)
>>
>> What's the alternative? Is adding the requested bug fixes and new
>> features to _sre.c really that hard?
>
> Why not simply add the new lib, see whether it works out and
> then decide which path to follow?
>
> We've done that with the old regex lib. It took a few years
> and releases to have people port their applications to the
> then new re module and syntax, but in the end it worked.
>
> With a new regex library there are likely going to be quite
> a few subtle differences between re and regex - even if it's
> just doing things in a more Unicode compatible way.
>
> I don't think anyone can actually list all the differences given
> the complex nature of regular expressions, so people will
> likely need a few years and releases to get used to it before
> a switch can be made.

I can't say I liked how that transition was handled last time around.
I really don't want to have to tell people "Oh, that bug is fixed but
you have to use regex instead of re" and then a few years later have
to tell them "Oh, we're deprecating regex, you should just use re".

I'm really hoping someone has more actual technical understanding of
re vs. regex and can give us some facts about the differences, rather
than, frankly, FUD.

-- 
--Guido van Rossum (python.org/~guido)

