[Python-Dev] re performance
Wang, Peter Xihong
peter.xihong.wang at intel.com
Tue Jan 31 14:40:48 EST 2017
Regarding the performance difference between "re" and "regex" and the related packaging options, we ran a performance comparison using Python 3.6.0 on some micro-benchmarks from the Python Benchmark Suite (https://github.com/python/performance):
Results in ms, lower is better (running on Ubuntu 15.10). The "regex" column was obtained via "pip install regex" and replacing "import re" with "import regex as re".

benchmark               re      regex
bm_regex_compile.py     229     298
bm_regex_dna.py         171     267
bm_regex_effbot.py      2.77    3.04
bm_regex_v8.py          24.8    14.1
These data show that "re" outperforms "regex" in 3 of the 4 micro-benchmarks above; "regex" is faster only on bm_regex_v8.py.
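For anyone who wants to try a comparison of this kind outside the benchmark suite, here is a minimal sketch. It is not the harness behind the numbers above. It assumes the third-party "regex" package may or may not be installed and times only "re" if it is missing; the pattern, the sample text, and the time_findall helper are made up for illustration.

```python
import re
import timeit

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # not installed: only the stdlib module gets timed

# Hypothetical pattern and input, chosen only for illustration.
PATTERN = r"\b[a-z]+@[a-z]+\.(?:com|org)\b"
TEXT = "contact alice@example.com or bob@example.org today " * 200


def time_findall(module, repeat=5, number=20):
    """Return the best time in ms for `number` findall() calls on TEXT."""
    compiled = module.compile(PATTERN)
    timings = timeit.repeat(
        lambda: compiled.findall(TEXT), repeat=repeat, number=number
    )
    return min(timings) * 1000.0


results = {"re": time_findall(re)}
if regex is not None:
    results["regex"] = time_findall(regex)

for name, ms in results.items():
    print(f"{name:6s} {ms:8.2f} ms")
```

Timings from a toy pattern like this say little about real workloads; the benchmark suite linked above is the better reference.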
Anyone searching for "regular expression python" will get the Python documentation on "re" as the first hit. Naturally, a new developer is likely to start with "re" from day one and never look elsewhere for alternatives.
We ran a query for "import re" against OpenStack, a large cloud computing software project (3.7 million lines of source code, the majority written in Python), and got ~1000 hits.
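A count like this can be approximated with a short script. The sketch below is not the query that produced the figure above; the count_import_re helper and its pattern are assumptions for illustration. It only counts statements that start with "import re", so a line such as "import os, re" or "from re import compile" is missed.

```python
import os
import re

# Matches a statement beginning with "import re" (optionally indented).
# The \b keeps "import regex" and "import requests" from matching.
IMPORT_RE = re.compile(r"^[ \t]*import[ \t]+re\b", re.MULTILINE)


def count_import_re(root):
    """Count 'import re' statements in all .py files under `root`."""
    hits = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" avoids crashing on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                hits += len(IMPORT_RE.findall(handle.read()))
    return hits
```

Calling count_import_re on the root of a checked-out source tree returns the number of matching statements.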
That said, IMHO it would be nice to capture ("borrow") the performance benefits of "regex" and merge them into "re", so that users get them without knowing or worrying about packaging and installation.
Cheers,
Peter
-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+peter.xihong.wang=intel.com at python.org] On Behalf Of Nick Coghlan
Sent: Tuesday, January 31, 2017 1:54 AM
To: Barry Warsaw <barry at python.org>
Cc: python-dev at python.org
Subject: Re: [Python-Dev] re performance
On 30 January 2017 at 15:26, Barry Warsaw <barry at python.org> wrote:
> On Jan 30, 2017, at 12:38 PM, Nick Coghlan wrote:
>
>>I think there are 3 main candidates that could fit that bill:
>>
>>- requests
>>- setuptools
>>- regex
>
> Actually, I think pkg_resources would make an excellent candidate.
> The setuptools crew is working on a branch that would allow for
> setuptools and pkg_resources to be split, which would be great for
> other reasons. Splitting them may mean that pkg_resources could
> eventually be added to the stdlib, but as an intermediate step, it
> could also test out this idea. It probably has a lot less of the baggage that you outline.
Yep, if/when pkg_resources is successfully split out from the rest of setuptools, I agree it would also be a good candidate for stdlib bundling - version independent runtime access to the database of installed packages is a key capability for many use cases, and not currently something we support especially well.
It's also far more analogous to the existing pip bundling, since setuptools/pkg_resources are also maintained under the PyPA structure.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia