[Python-Dev] re performance

Wang, Peter Xihong peter.xihong.wang at intel.com
Tue Jan 31 14:40:48 EST 2017


Regarding the performance difference between "re" and "regex" and the related packaging options: we ran a performance comparison using Python 3.6.0 on some micro-benchmarks from the Python Benchmark Suite (https://github.com/python/performance):

Results in ms, lower is better (running on Ubuntu 15.10). The "regex" column was obtained via "pip install regex" and a replacement of "import re" with "import regex as re".

Benchmark               re      regex
bm_regex_compile.py     229     298
bm_regex_dna.py         171     267
bm_regex_effbot.py      2.77    3.04
bm_regex_v8.py          24.8    14.1

These data show that "re" outperforms "regex" in 3 of the 4 micro-benchmarks above; "regex" is faster only on bm_regex_v8.py.
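
For anyone who wants to try a similar comparison on their own patterns, here is a minimal sketch. It is not the Python Benchmark Suite harness used for the numbers above; the helper names (load_engine, time_engine, compare) and the sample pattern are made up for illustration, and it assumes "regex" may or may not be installed, skipping it if the import fails.

```python
import importlib
import timeit


def load_engine(name):
    """Return the named regex module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def time_engine(engine, pattern, text, repeat=5, number=200):
    """Best-of-`repeat` time in ms for `number` findall() calls."""
    compiled = engine.compile(pattern)
    timings = timeit.repeat(
        lambda: compiled.findall(text), repeat=repeat, number=number
    )
    return min(timings) * 1000.0


def compare(pattern, text):
    """Time the same pattern under every engine that can be imported."""
    results = {}
    for name in ("re", "regex"):
        engine = load_engine(name)
        if engine is not None:
            results[name] = time_engine(engine, pattern, text)
    return results


if __name__ == "__main__":
    # Hypothetical workload; substitute patterns from your own code.
    sample = "agggtaaa|tttaccct " * 2000
    for name, ms in compare(r"agggtaaa|tttaccct", sample).items():
        print(f"{name}: {ms:.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes, but a single pattern like this says far less than the full benchmark suite does.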

Anyone searching for "regular expression python" will get the Python documentation on "re" as the first hit.  Naturally, a new developer would start with "re" from day one and might never bother to look elsewhere for alternatives.

We ran a query for "import re" against the large cloud computing software platform OpenStack (3.7 million lines of source code, the majority of it written in Python), and got ~1000 hits.
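
A query like this can be roughly reproduced against any source tree. The sketch below is an assumption about how such a count could be done, not the exact query we ran: count_re_imports and IMPORT_RE are hypothetical names, and the pattern is stricter than a plain text search for "import re" (it skips "import requests" and also misses forms such as "import os, re").

```python
import os
import re

# Matches "import re" and "from re import ..." at the start of a line.
# The \b keeps "import requests" and similar names from being counted.
IMPORT_RE = re.compile(
    r"^[ \t]*(?:import[ \t]+re\b|from[ \t]+re[ \t]+import\b)", re.MULTILINE
)


def count_re_imports(root):
    """Count lines that import the "re" module in .py files under root."""
    hits = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    hits += len(IMPORT_RE.findall(handle.read()))
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return hits


if __name__ == "__main__":
    print(count_re_imports("."))
```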

That said, IMHO, it would be nice to capture ("borrow") the performance benefits of "regex" and merge them into "re", so that users get them without knowing or worrying about packaging/installation.

Cheers,

Peter

 

-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+peter.xihong.wang=intel.com at python.org] On Behalf Of Nick Coghlan
Sent: Tuesday, January 31, 2017 1:54 AM
To: Barry Warsaw <barry at python.org>
Cc: python-dev at python.org
Subject: Re: [Python-Dev] re performance

On 30 January 2017 at 15:26, Barry Warsaw <barry at python.org> wrote:
> On Jan 30, 2017, at 12:38 PM, Nick Coghlan wrote:
>
>>I think there are 3 main candidates that could fit that bill:
>>
>>- requests
>>- setuptools
>>- regex
>
> Actually, I think pkg_resources would make an excellent candidate.  
> The setuptools crew is working on a branch that would allow for 
> setuptools and pkg_resources to be split, which would be great for 
> other reasons.  Splitting them may mean that pkg_resources could 
> eventually be added to the stdlib, but as an intermediate step, it 
> could also test out this idea.  It probably has a lot less of the baggage that you outline.

Yep, if/when pkg_resources is successfully split out from the rest of setuptools, I agree it would also be a good candidate for stdlib bundling - version independent runtime access to the database of installed packages is a key capability for many use cases, and not currently something we support especially well.

It's also far more analogous to the existing pip bundling, since setuptools/pkg_resources are also maintained under the PyPA structure.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia