[Python-Dev] Should we move to replace re with regex?

Nick Coghlan ncoghlan at gmail.com
Sat Aug 27 10:02:49 CEST 2011


On Sat, Aug 27, 2011 at 4:01 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> You're talking technically, which is important, but wasn't what I was
> suggesting would be helped.
>
> Politically, and from a marketing standpoint, it's easier to withdraw a
> feature you've given with a "Play with this, see if it works for you"
> warning.

The standard library isn't for playing. "pip install regex" is for
playing. If we aren't sure we want to make the transition, then it
doesn't go in.

However, to my mind, reviewing and incorporating regex is a far more
feasible model than trying to enhance the existing re module with a
comparable feature set. At the moment, there's already an obvious way
to get enhanced regex support in Python: install regex and use it
instead of the standard library's re module. That's enough to pretty
much kill any motivation anyone might have to make major changes to re
itself.
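
As a minimal sketch (assuming regex has been installed from PyPI), switching
over can be as simple as an aliased import, since regex aims to be backwards
compatible with the re API:

    # pip install regex
    import regex as re  # intended as a drop-in replacement for the stdlib re

    m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released on 2011-08-27")
    if m:
        print(m.groups())  # ('2011', '08', '27')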

We're at least getting one thing right this time that we got wrong
with multiprocessing, though - we're much, much further out from the
3.3 release than we were from the 2.6 release when multiprocessing was
added to the standard library :)

The next step needed is for someone to volunteer to write and champion
a PEP that:
- articulates the deficiencies in the current re module (the regex
docs already cover some of this, as do Tom Christiansen's notes on the
issue tracker)
- explains why upgrading re in place is not feasible (e.g. noting that
the availability of regex really limits the desire for anyone to
reinvent that particular wheel, so even things that are theoretically
possible may be highly unlikely in practice)
- proposes a transition plan (personally, I'd be fine with an optparse
-> argparse style transition where re remains around indefinitely to
support legacy code, but new users are pointed towards regex. But
depending on compatibility details, merging the two APIs in the
existing re namespace may also be feasible)
- proposes a maintenance strategy (I don't know how much Matthew has
written regarding internal design details, but that kind of thing
could really help. Matthew agreeing to continue maintenance as part of
the standard library would also help a great deal, but wouldn't be
enough on its own - while it's good for modules to have active
maintainers to make the final call on associated design decisions, it's
potentially problematic when other core developers don't understand
what the code is doing well enough to fix bugs in it)
- confirms that the regex test suite can be incorporated cleanly into
the standard library regression test suite (the difficulty of this was
something that was underestimated for the inclusion of
multiprocessing. Test suite integration is also the final sticking
point holding up the PEP 380 'yield from' patch, although that's close
to being resolved following the PyConAU sprints)
- documents the tests conducted (e.g. micro-benchmark results, fusil
results; a rough sketch of such a micro-benchmark appears below)
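
To give a rough idea of the kind of micro-benchmark meant in that last point
(a sketch only, using an arbitrary pattern and input, and assuming both
modules are importable):

    import timeit

    setup = ("import {mod}; "
             "pat = {mod}.compile(r'\\b\\w+\\b'); "
             "data = 'the quick brown fox jumps over the lazy dog ' * 100")

    for mod in ("re", "regex"):
        elapsed = timeit.timeit("pat.findall(data)",
                                setup=setup.format(mod=mod),
                                number=1000)
        print("%s: %.4f seconds" % (mod, elapsed))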

PEP 371 (addition of multiprocessing), PEP 389 (addition of argparse)
and Jesse's reflections on the way multiprocessing was added
(http://jessenoller.com/2009/01/28/multiprocessing-in-hindsight/) are
well worth reading for anyone considering stepping up to write a PEP.
That last also highlights why even Matthew's support, however capably
he has handled maintenance of regex as an independent project,
wouldn't be enough - we had Richard Oudkerk's support and agreement to
continue maintenance as the original author of multiprocessing, but he
became unavailable early in the integration process. If Jesse hadn't
been able to take up most of that slack, the likely result would have
been reversion of the changes and removal of multiprocessing from the
2.6 release.

Writing PEPs can be quite a frustrating experience (since a lot of
feedback will be negative as people try to poke holes in the idea to
see if it stands up to close scrutiny), but it's also really
satisfying and rewarding if they end up getting accepted and
incorporated :)

>> Have there been any __future__ features that were added provisionally?
>
> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent.  Irrespective, it's hard to import something from
> "future" without at least suspecting that you're on the bleeding edge.

No, we make an explicit guarantee that future imports will never go
away once they've been added. They may become redundant, but they
won't break. There's no provision in the future mechanism for changes
that are added and then later removed (see
http://docs.python.org/dev/library/__future__).

They're strictly for cases where backwards incompatibilities (usually,
but not always, new keywords) may break existing code.
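
As a quick illustration, each entry in the __future__ module records the
release where the feature became available and the release where it becomes
(or became) mandatory, and those entries stay importable indefinitely:

    import __future__

    for name in __future__.all_feature_names:
        feature = getattr(__future__, name)
        print("%s: optional since %s, mandatory in %s" % (
            name, feature.getOptionalRelease(), feature.getMandatoryRelease()))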

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

