[Python-Dev] what is happening with the regex module going into Python 3.3?

Sun Jun 3 23:38:49 CEST 2012

On Mon, Jun 4, 2012 at 6:25 AM, Gregory P. Smith <greg at krypto.org> wrote:
>
> On Fri, Jun 1, 2012 at 5:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> ipaddress really made it in because I personally ran into the limitations
>> of not having IP address support in the stdlib. I ended up doing quite a bit
>> of prompting to ensure the process of cleaning up the API to modern stdlib
>> standards didn't stall (even now, generating a module reference from the
>> docstrings is still a pending task)
>>
>> With regex, the pain isn't there, since re already covers such a large
>> subset of what regex provides.
>
> That last statement basically suggests that something like regex would never
> be accepted until a CPython core developer was actually running into pain
> with the many flaws in the re module (especially when it comes to Unicode).
>  I disagree with that.

No, that's not really what I meant. Driving integration of a module
takes *time* and *effort*. The decision to commit that effort has to
be driven by something, and personal annoyance is a great motivator.
In the case of PEP 3144, I happened to be in a position to do
something about a gap in the standard library after the omission was
made glaringly obvious [1].

Getting this done was a combined effort: Peter got the module API
updated, others (esp. Antoine) and I reviewed the reference
implementation's API and requested changes, and more recently Sandro
Tosi has been doing most of the heavy lifting in getting the docs up to
scratch.

> Per the language summit, I think we need to just do it.  Put it in as re and
> rename the existing re module to sre.

No. We almost burned Jesse out dropping multiprocessing into 2.6 at
the last minute, and many longstanding issues with that module are
only being addressed now that Richard has the time to be involved
again. SRE already suffers from a lack of maintenance, and we've had
zero indication that regex will make that situation better (and
several indications that it will actually make it worse). Matthew's
silence on the topic is *not* encouraging, and nobody else has even
volunteered to write a PEP, let alone agreed to maintain the module.

> We could pull the plug on it and leave it out if substantial as yet unknown
> problems that can't be fixed in time for release crop up during the beta 1
> or 2 (release manager's decision).

Unwinding changes to the build process is yet more work that may not
be needed. We need to remember the purpose of the standard library:
most of the time, it is *not* intended to be all things to all people.
The status quo is that, if you're doing basic, primarily ASCII,
regular expression processing, then "import re" will serve you just
fine. If you're doing more than that, then you'll probably need to do
"pip install regex" (or platform specific equivalent) and change your
import to "import regex as re".
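
A minimal sketch of that import swap, assuming the third-party regex
package may or may not be installed. HAVE_REGEX and find_words are
hypothetical names used only for illustration:

```python
# Prefer the third-party "regex" package (pip install regex) when it is
# available; otherwise fall back to the standard library's "re" module.
try:
    import regex as re
    HAVE_REGEX = True   # hypothetical flag, for illustration only
except ImportError:
    import re
    HAVE_REGEX = False


def find_words(text):
    """Return runs of word characters; works with either backend."""
    return re.findall(r"\w+", text)


print(find_words("basic ASCII text"))
```

Code that sticks to the subset both modules share runs unchanged either
way; anything relying on regex-only features would need to check the
flag first.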

That's not *great* (as the number of open Unicode bugs against SRE can
attest), but it's far from unworkable. I consider it preferable to
adding yet another big ball of C code to the stdlib in the absence of
a PEP addressing the concerns already raised.

>> My perspective is that it's now too late to make a change that big for
>> 3.3, but the in-principle approval holds for anyone who wants to work with
>> MRAB and get the idea written up as a PEP for 3.4.
>
> Nonsense, as long as it's in before 3.3 Beta 1 (scheduled for June 23rd
> according to PEP 398) it can go in.
>
> I don't like to claim that a PEP for this one is strictly necessary

Why not? Requiring a PEP is the norm, not the exception. Even when
there's agreement that something *should* be done, there are plenty of
details to be thrashed out in turning in-principle agreement into a
concrete plan of action.

> but Nick
> raises good questions to be answered and has good suggestions for what to
> write up in the PEP in his earlier response that I certainly would prefer to
> have gathered up and documented so that is the route I suggest.
>
> The issue seems to be primarily one of "who is volunteering to do it?"

Correct, both in figuring out the integration details and in agreeing
to maintain it in the future.

Remember, now is better than never, but never is often better than
*right* now :)

Cheers,
Nick.

[1] http://git.fedorahosted.org/git/?p=pulpdist.git;a=blob;f=src/pulpdist/core/validation.py;h=ebccf354c5bbec376258681a345fb73129eeeb95;hb=736250d85b758a11e1d09f70ec3877d1c022aa9a#l77

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia