Re: [Python-ideas] Complicate str methods

On Thu, Feb 8, 2018 at 11:24 AM, Steve Dower <steve.dower@python.org> wrote:
Easily fixed by installing one of the alternate regex libraries.
MRAB's regex library, the most prominent alternative, does not use the linear-time search algorithm. The only libraries I know that do are the ones with re2, though I haven't looked deeply. Let it be known that I tried to install pyre2 (re2 on PyPI) on Ubuntu For Windows for the tests, and after hours of no success, I decided that the alternative libraries were not the point. I eventually got it working (https://github.com/axiak/pyre2/issues/51), and here are the results: # Same setup as before, with findsub_re2 using `find=re2.search, esc=re2.escape`. pattern2 = re2.compile('|'.join(map(re.escape, needles))) %timeit findsub(haystack, needles) #=> 1.26 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) %timeit findsub_re(haystack, needles) #=> 745 ms ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit findsub_re_cached(haystack, pattern) #=> 733 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit findsub_regex(haystack, needles) #=> 639 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit findsub_re2(haystack, needles) #=> 34.1 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit findsub_re_cached(haystack, pattern2) #=> 9.56 µs ± 268 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In any case, installing re2 is even more of an advanced solution for a basic problem than using re.
re performance and its edge cases have been discussed endlessly. Please look it up before restarting that discussion.
I'm not trying to restart the discussion. I'm trying to say that the assumptions being made about its superior performance are unfounded. Two members have suggested that re is the performant option, and that's just not true.
participants (1)
-
Franklin? Lee