[Python-ideas] Complicate str methods
Serhiy Storchaka
storchaka at gmail.com
Thu Feb 8 13:04:48 EST 2018
08.02.18 12:45, Franklin? Lee пише:
> On Feb 7, 2018 17:28, "Serhiy Storchaka"
> <storchaka at gmail.com
> <mailto:storchaka at gmail.com>> wrote:
> Even for simple string search a regular expression can be more
> efficient than a str method.
>
> $ ./python -m timeit -s 'import re; p = re.compile("spam"); s =
> "spa"*100+"m"' -- 'p.search(s)'
> 500000 loops, best of 5: 680 nsec per loop
>
> $ ./python -m timeit -s 's = "spa"*100+"m"' -- 's.find("spam")'
> 200000 loops, best of 5: 1.09 usec per loop
>
>
> That's an odd result. Python regexes use backtracking, not a DFA. I gave
> a timing test earlier in the thread:
> https://mail.python.org/pipermail/python-ideas/2018-February/048879.html
> I compared using repeated .find()s against a precompiled regex, then
> against a pure Python and unoptimized tree-based algorithm.
>
> Could it be that re uses an optimization that can also be used in str?
> CPython uses a modified Boyer-Moore for str.find:
> https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h
> http://effbot.org/zone/stringlib.htm
> Maybe there's a minimum length after which it's better to precompute a
> table.
Yes, there is a special optimization in re here. It isn't free, you need
to spend some time for preparing it. You need a special object that
keeps an optimized representation for faster search. This makes it very
unlikely be used in str, because you need either spend the time for
compilation on every search, or use some kind of caching, which is not
free too, adds complexity and increases memory consumption. Note also in
case of re the compiler is implemented in Python. This reduces the
complexity.
Patches that add optimization for other common cases are welcomed.
More information about the Python-ideas
mailing list