[Python-Dev] New regex module for 3.2?

MRAB python at mrabarnett.plus.com
Fri Jul 9 21:35:16 CEST 2010


Collin Winter wrote:
> On Fri, Jul 9, 2010 at 10:28 AM, MRAB <python at mrabarnett.plus.com> wrote:
>> anatoly techtonik wrote:
>>> On Thu, Jul 8, 2010 at 10:52 PM, MRAB <python at mrabarnett.plus.com> wrote:
>>>> Hi all,
>>>>
>>>> I re-implemented the re module, adding new features and speed
>>>> improvements. It's available at:
>>>>
>>>>   http://pypi.python.org/pypi/regex
>>>>
>>>> under the name "regex" so that it can be tried alongside "re".
>>>>
>>>> I'd be interested in any comments or feedback. How does it compare with
>>>> "re" in terms of speed on real-world data? The benchmarks suggest it
>>>> should be faster, or at worst comparable.
>>> And where are the benchmarks?
>>> In particular it would be interesting to see it compared both to re
>>> from stdlib and re2 from http://code.google.com/p/re2/
>>>
>> The benchmarks bm_regex_effbot.py and bm_regex_v8.py both perform
>> multiple runs, each executing the tests multiple times, and report only
>> the total time for each run. Here are the averages:
>>
>> Python 2.6
>> BENCHMARK        re (s)  regex (s)  ratio (re/regex)
>> bm_regex_effbot  0.135   0.083      1.63
>> bm_regex_v8      0.153   0.085      1.80
>>
>> Python 3.1
>> BENCHMARK        re (s)  regex (s)  ratio (re/regex)
>> bm_regex_effbot  0.138   0.083      1.66
>> bm_regex_v8      0.170   0.091      1.87
> 
> Out of curiosity, what are the results for the bm_regex_compile benchmark?
> 
I concentrated my efforts on matching speed because regexes tend to be
compiled only once, and compiled patterns are cached anyway, so I don't
think compile speed is as important. The results are:

Python 2.6
BENCHMARK         re (s)  regex (s)  ratio (re/regex)
bm_regex_compile  0.897   2.792      0.32

Python 3.1
BENCHMARK         re (s)  regex (s)  ratio (re/regex)
bm_regex_compile  0.902   2.731      0.33
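The ratio column works out to re's time divided by regex's time (for example 0.897 / 2.792 ≈ 0.32), so values above 1 mean regex was faster and values below 1 mean it was slower. A minimal sketch of producing such a figure with `timeit`, assuming the third-party `regex` package from PyPI is installed (if it is not, the sketch falls back to timing `re` against itself). The pattern, text and helper names are hypothetical and are not taken from the benchmarks:

```python
import re
import timeit
from statistics import mean

try:
    import regex  # third-party package from PyPI: pip install regex
except ImportError:  # fall back so the sketch still runs without it
    regex = re

# Hypothetical workload for illustration only, not the benchmark data.
PATTERN = r"\b(\w+)@(\w+)\.org\b"
TEXT = "mail alice@python.org or bob@example.org today " * 50

def average_secs(func, runs=5, number=200):
    """Mean time in seconds for `number` calls of func, averaged over `runs` repeats."""
    return mean(timeit.repeat(func, repeat=runs, number=number))

def speed_ratio(re_secs, regex_secs):
    """re time / regex time, as in the tables: > 1 means regex was faster."""
    return re_secs / regex_secs

re_pat = re.compile(PATTERN)
regex_pat = regex.compile(PATTERN)

re_secs = average_secs(lambda: re_pat.findall(TEXT))
regex_secs = average_secs(lambda: regex_pat.findall(TEXT))
print(f"re {re_secs:.4f}s  regex {regex_secs:.4f}s  "
      f"ratio {speed_ratio(re_secs, regex_secs):.2f}")
```

Compiling both patterns outside the timed lambdas keeps the measurement focused on matching speed, which is what the effbot and v8 figures compare.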

If anyone can demonstrate that compile speed has a significant impact in
practice, then I will, of course, look into it further.
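One way to check whether compile speed matters in practice is to compare the one-off compile cost with the matching work it is amortised over. A minimal sketch using only the standard `re` module: `re.purge()` clears `re`'s internal pattern cache so that each timed call really recompiles, while a plain `re.compile()` of an already-seen pattern measures only the cache lookup. The pattern, text, match counts and helper names are hypothetical:

```python
import re
import timeit

# Hypothetical workload for illustration only.
PATTERN = r"(?P<key>\w+)\s*=\s*(?P<value>[^;]+);"
TEXT = "host = example.org; port = 8080; user = guido; " * 20

def compile_uncached():
    """Compile PATTERN from scratch by clearing re's cache first."""
    re.purge()
    return re.compile(PATTERN)

def compile_share(compile_secs, match_secs, n_matches):
    """Fraction of total time spent compiling when one compile serves n_matches searches."""
    total = compile_secs + n_matches * match_secs
    return compile_secs / total

number = 1000
compile_secs = timeit.timeit(compile_uncached, number=number) / number
cached_secs = timeit.timeit(lambda: re.compile(PATTERN), number=number) / number
pattern = re.compile(PATTERN)
match_secs = timeit.timeit(lambda: pattern.findall(TEXT), number=number) / number

for n in (1, 100, 10_000):
    share = compile_share(compile_secs, match_secs, n)
    print(f"{n:>6} matches: compile is {share:.1%} of total")
print(f"uncached compile {compile_secs * 1e6:.1f} us, "
      f"cached lookup {cached_secs * 1e6:.2f} us")
```

As the number of matches per compile grows, the compile share shrinks towards zero, which is the reasoning behind prioritising matching speed; a workload that compiles many distinct patterns (and so misses the cache) would be the case where a slower compiler shows up.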

