[Python-ideas] Complicate str methods

Serhiy Storchaka storchaka at gmail.com
Thu Feb 8 13:04:48 EST 2018


08.02.18 12:45, Franklin? Lee пише:
> On Feb 7, 2018 17:28, "Serhiy Storchaka" 
> <storchaka at gmail.com 
> <mailto:storchaka at gmail.com>> wrote:
>     Even for simple string search a regular expression can be more
>     efficient than a str method.
> 
>     $ ./python -m timeit -s 'import re; p = re.compile("spam"); s =
>     "spa"*100+"m"' -- 'p.search(s)'
>     500000 loops, best of 5: 680 nsec per loop
> 
>     $ ./python -m timeit -s 's = "spa"*100+"m"' -- 's.find("spam")'
>     200000 loops, best of 5: 1.09 usec per loop
> 
> 
> That's an odd result. Python regexes use backtracking, not a DFA. I gave 
> a timing test earlier in the thread:
> https://mail.python.org/pipermail/python-ideas/2018-February/048879.html
> I compared using repeated .find()s against a precompiled regex, then 
> against a pure Python and unoptimized tree-based algorithm.
> 
> Could it be that re uses an optimization that can also be used in str? 
> CPython uses a modified Boyer-Moore for str.find:
> https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h
> http://effbot.org/zone/stringlib.htm
> Maybe there's a minimum length after which it's better to precompute a 
> table.

Yes, there is a special optimization in re here. It isn't free, you need 
to spend some time for preparing it. You need a special object that 
keeps an optimized representation for faster search. This makes it very 
unlikely be used in str, because you need either spend the time for 
compilation on every search, or use some kind of caching, which is not 
free too, adds complexity and increases memory consumption. Note also in 
case of re the compiler is implemented in Python. This reduces the 
complexity.

Patches that add optimization for other common cases are welcomed.



More information about the Python-ideas mailing list