[Python-Dev] New regex module for 3.2?

Nick Coghlan ncoghlan at gmail.com
Sun Jul 11 14:19:10 CEST 2010


On Sun, Jul 11, 2010 at 7:19 PM, anatoly techtonik <techtonik at gmail.com> wrote:
> On Fri, Jul 9, 2010 at 6:59 PM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
>>
>> While the re2 comparison might be interesting from an abstract
>> standpoint, it intentionally supports a different regex language from
>> Python so that it can run faster and use less memory. Since re2 can
>> never replace Python's re module, it doesn't make sense to hold MRAB's
>> new module to that standard.
>
> The re2 comparison is interesting from the point of view of whether
> it should be included in the stdlib.

No, it isn't. re2 is a wrapper for a third-party engine that surrenders
some functionality in the pursuit of better bounds on memory and CPU
usage. It is not a drop-in replacement for re and cannot be by design:
"The one significant exception is that RE2 drops support for
backreferences and generalized zero-width assertions, because they
cannot be implemented efficiently." There is no reason to have two
distinct regex engines in the standard library - if someone knows
enough to realise they need the performance assurances of re2, they're
also likely to be able to find the Python wrappers for it.
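
As a rough illustration of the kind of functionality at stake, here is a
minimal sketch using only the current re module. The patterns and the
helper names (find_doubled, find_amounts) are made up for this example;
they show a backreference and a lookbehind assertion, which are the sort
of features the RE2 statement quoted above says it drops.

```python
import re

# Backreference: \1 must match exactly the text captured by group 1,
# so this finds a word that is immediately repeated.
doubled_word = re.compile(r"\b(\w+) \1\b")

# Lookbehind (a zero-width assertion): match digits only when they are
# directly preceded by a dollar sign, without consuming the sign itself.
dollar_amount = re.compile(r"(?<=\$)\d+")


def find_doubled(text):
    """Return the first immediately repeated word, or None (hypothetical helper)."""
    m = doubled_word.search(text)
    return m.group(1) if m else None


def find_amounts(text):
    """Return all digit runs that follow a '$' (hypothetical helper)."""
    return dollar_amount.findall(text)
```

Code that relies on patterns like these could not simply be pointed at an
engine that lacks them, which is why re2 is not a drop-in replacement.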

regex is potentially interesting for the standard library as it *is*
intended to be a drop-in replacement for re that trades longer
compilation times (typically once per application) for faster match
times (potentially many times per application). The performance of re2
has nothing to do with the comparison between the current re module
and MRAB's regex module.
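
To make the "compile once, match many times" usage pattern concrete, here
is a small sketch with the current re module. The pattern and the names
LOG_LINE and parse_levels are assumptions invented for this example, and
nothing here measures or demonstrates the speed of either module.

```python
import re

# Compiled a single time at import: this is the "once per application"
# cost mentioned above.
LOG_LINE = re.compile(r"(?P<level>[A-Z]+): (?P<msg>.*)")


def parse_levels(lines):
    """Reuse the one compiled pattern for every line (hypothetical helper)."""
    levels = []
    for line in lines:
        m = LOG_LINE.match(line)  # the "many times per application" cost
        if m:
            levels.append(m.group("level"))
    return levels
```

If regex really is a drop-in replacement as intended, the only change in
code like this would be the import line; that is an assumption, not
something tested here.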

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

