On Thu, Feb 8, 2018 at 5:45 AM, Franklin? Lee
On Feb 7, 2018 17:28, "Serhiy Storchaka"
wrote: The name of complicated str methods is regular expressions. For doing these operations efficiently you need to convert arguments in special optimized form. This is what re.compile() does. If make a compilation on every invocation of a str method, this will add too large overhead and kill performance.
Even for simple string search a regular expression can be more efficient than a str method.
$ ./python -m timeit -s 'import re; p = re.compile("spam"); s = "spa"*100+"m"' -- 'p.search(s)' 500000 loops, best of 5: 680 nsec per loop
$ ./python -m timeit -s 's = "spa"*100+"m"' -- 's.find("spam")' 200000 loops, best of 5: 1.09 usec per loop
I ran Serhiy's tests (3.5.2) and got different results. # Setup: __builtins__.basestring = str #hack for re2 import in py3 import re, re2, regex n = 10000 s = "spa"*n+"m" p = re.compile("spam") pgex = regex.compile("spam") p2 = re2.compile("spam") # Tests: %timeit s.find("spam") %timeit p.search(s) %timeit pgex.search(s) %timeit p2.search(s) n = 100 350 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 554 ns ± 16.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 633 ns ± 8.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 1.62 µs ± 68.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) n = 1000 2.17 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 3.57 µs ± 27.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 3.46 µs ± 66.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 7.8 µs ± 72 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) n = 10000 17.3 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 33.5 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 31.7 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 67.5 µs ± 400 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Conclusions: - `.find` is fastest. On 3.6.1 (Windows), it's about the same speed as re: 638 ns vs 662 ns; 41.3 µs vs 43.8 µs. - re and regex have similar performance, probably due to a similar backend. - re2 is slowest. I suspect it's due to the wrapper. It may be copying the strings to a format suitable for the backend. P.S.: I also tested `"spam" in s`, which was linearly slower than `.find`. However, `in` is consistently faster than `.find` in my 3.6, so the discrepancy has likely been fixed. More curious is that, on `.find`, my MSVC-compiled 3.6.1 and 3.5.2 are twice as slow as my 3.5.2 for Ubuntu For Windows, but the re performance is similar. It's probably a compiler thing.