[Python-Dev] Make re.compile faster

Stefan Behnel stefan_ml at behnel.de
Tue Oct 3 11:13:35 EDT 2017


INADA Naoki wrote on 03.10.2017 at 05:29:
> Before deferring re.compile, can we make it faster?

I tried cythonizing both sre_compile.py and sre_parse.py, which gave me a
speedup of a bit more than 2x. There is definitely room left for further
improvements since I didn't know much about the code, and also didn't dig
very deeply. I used this benchmark to get uncached patterns:

    [re_compile("[a-z]{%d}[0-9]+[0-9a-z]*[%d-9]" % (i, i%8))
     for i in range(20000)]

Time for Python master version:
2.14 seconds
Time for Cython compiled version:
1.05 seconds
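
For anyone who wants to repeat the measurement, here is a minimal
self-contained sketch of such a timing run. It assumes re_compile above is
simply re.compile; the bench_compile() name and the re.purge() call are
additions for illustration, not part of the original benchmark. Every
pattern is distinct, so the pattern cache in the re module never gets a hit.

```python
import re
import time


def bench_compile(n=20000):
    """Compile n distinct patterns; return (elapsed seconds, compiled list)."""
    re.purge()  # start from an empty pattern cache
    # Each value of i yields a different repeat count, so no two patterns match.
    patterns = ["[a-z]{%d}[0-9]+[0-9a-z]*[%d-9]" % (i, i % 8)
                for i in range(n)]
    start = time.perf_counter()
    compiled = [re.compile(p) for p in patterns]
    return time.perf_counter() - start, compiled


if __name__ == "__main__":
    elapsed, _ = bench_compile()
    print("%.2f seconds" % elapsed)
```

Absolute numbers will differ by machine and Python version, so only the
ratio between two builds on the same machine is meaningful.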

I used the latest Cython master for it, as I had to make a couple of type
inference improvements for bytearray objects along the way.

Cython's master branch is here:
https://github.com/cython/cython

My CPython changes are here:
https://github.com/scoder/cpython/compare/master...scoder:cythonized_sre_compile

They are mostly just external type declarations and a tiny type inference
helper fix. I could have used the more maintainable PEP-484 annotations for
local variables right in the .py files, but AFAIK, those are still not
wanted in the standard library. And they also won't suffice for switching
to extension types in sre_parse.
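
To illustrate what annotations for local variables in a .py file look like,
here is a small sketch. The count_digits() helper is hypothetical and not
taken from sre_parse or from my branch; it only shows the syntax, and it
runs unchanged as plain Python (3.6 or later).

```python
def count_digits(source: str) -> int:
    """Hypothetical helper: count the ASCII digits in a pattern string."""
    total: int = 0  # annotated local variable
    ch: str         # annotation without an assignment
    for ch in source:
        if "0" <= ch <= "9":
            total += 1
    return total


print(count_digits("[a-z]{3}[0-9]+"))
```

The interpreter ignores these annotations at runtime; a compiler can read
them as type hints instead of needing a separate declaration file.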

Together with the integer flag changes, that could give a pretty noticeable
improvement overall.

Stefan


