[Python-ideas] Complicate str methods

Franklin? Lee leewangzhong+python at gmail.com
Thu Feb 8 12:13:26 EST 2018


On Thu, Feb 8, 2018 at 11:24 AM, Steve Dower <steve.dower at python.org> wrote:
> Easily fixed by installing one of the alternate regex libraries.

MRAB's regex library, the most prominent alternative, does not use the
linear-time search algorithm. The only libraries I know that do are
the ones with re2, though I haven't looked deeply.

Let it be known that I tried to install pyre2 (re2 on PyPI) on Ubuntu
For Windows for the tests, and after hours of no success, I decided
that the alternative libraries were not the point. I eventually got it
working (https://github.com/axiak/pyre2/issues/51), and here are the
results:

    # Same setup as before, with findsub_re2 using `find=re2.search,
esc=re2.escape`.

    pattern2 = re2.compile('|'.join(map(re.escape, needles)))

    %timeit findsub(haystack, needles)
    #=> 1.26 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit findsub_re(haystack, needles)
    #=> 745 ms ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit findsub_re_cached(haystack, pattern)
    #=> 733 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit findsub_regex(haystack, needles)
    #=> 639 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit findsub_re2(haystack, needles)
    #=> 34.1 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %timeit findsub_re_cached(haystack, pattern2)
    #=> 9.56 µs ± 268 ns per loop (mean ± std. dev. of 7 runs, 100000
loops each)

In any case, installing re2 is even more of an advanced solution for a
basic problem than using re.

> re performance and its edge cases have been discussed endlessly. Please look
> it up before restarting that discussion.

I'm not trying to restart the discussion. I'm trying to say that the
assumptions being made about its superior performance are unfounded.
Two members have suggested that re is the performant option, and
that's just not true.


More information about the Python-ideas mailing list