On Feb 8, 2018 13:06, "Serhiy Storchaka" email@example.com wrote:
08.02.18 12:45, Franklin? Lee wrote:
Could it be that re uses an optimization that can also be used in str? CPython uses a modified Boyer-Moore for str.find: https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h http://effbot.org/zone/stringlib.htm Maybe there's a minimum length after which it's better to precompute a table.
Yes, there is a special optimization in re here. It isn't free: you need to spend some time preparing it. You need a special object that keeps an optimized representation for faster search. This makes it very unlikely to be used in str, because you would need either to spend the time on compilation for every search or to use some kind of caching, which is not free either: it adds complexity and increases memory consumption. Note also that in the case of re the compiler is implemented in Python, which reduces the complexity.
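To illustrate the trade-off described above, here is a minimal sketch of the two usage patterns. The needle and haystack values are made up for the example; it only shows that re needs a prepared pattern object that can be reused, while str.find has no separate preparation step. It does not claim which one is faster.

```python
import re

# Hypothetical inputs, chosen only for illustration.
needle = "needle"
haystack = "hay" * 1000 + needle

# re.compile builds the reusable pattern object; this is the up-front cost.
pattern = re.compile(re.escape(needle))

# Reusing the compiled object means that cost is not paid again per search.
match = pattern.search(haystack)
re_index = match.start() if match else -1

# str.find takes the needle directly, with nothing to keep around between calls.
find_index = haystack.find(needle)

print(re_index, find_index)
```

Both calls report the same index; the difference is only in where the preparation work happens and what has to be stored between searches.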
The performance of the one-needle case isn't really relevant, though, is it? This idea is for the multi-needle case, and my tests showed that re performs even worse than a loop of `.find`s. How do re and .find scale with both number and lengths of needles on your machine?
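One way to answer the scaling question is a small timing script along these lines. This is only a sketch: the helper names (`first_match_find`, `first_match_re`, `compare`), the digit-string needles, and the all-`a` haystack are assumptions made for the example, not the tests referred to above, and the numbers will vary by machine and input.

```python
import re
import timeit


def first_match_find(haystack, needles):
    """Earliest index where any needle occurs, or -1, via a loop of str.find."""
    hits = [i for i in (haystack.find(n) for n in needles) if i != -1]
    return min(hits, default=-1)


def first_match_re(haystack, pattern):
    """Earliest index where the precompiled alternation matches, or -1."""
    m = pattern.search(haystack)
    return m.start() if m else -1


def compare(num_needles, needle_len, hay_len=100_000, repeat=5):
    """Return (seconds for the find loop, seconds for re) on synthetic data."""
    # Zero-padded digit strings: distinct needles of at least needle_len chars.
    needles = [f"{i:0{needle_len}d}" for i in range(num_needles)]
    # Only the tail of the haystack contains a needle, so both scan it all.
    haystack = "a" * hay_len + needles[-1]
    # Compilation happens once, outside the timed region.
    pattern = re.compile("|".join(map(re.escape, needles)))

    # Sanity check: both approaches must agree before timing them.
    assert first_match_find(haystack, needles) == first_match_re(haystack, pattern)

    t_find = timeit.timeit(lambda: first_match_find(haystack, needles), number=repeat)
    t_re = timeit.timeit(lambda: first_match_re(haystack, pattern), number=repeat)
    return t_find, t_re


if __name__ == "__main__":
    for n in (1, 10, 100):
        for length in (2, 8, 32):
            t_find, t_re = compare(n, length)
            print(f"needles={n:4d} len={length:3d} find={t_find:.4f}s re={t_re:.4f}s")
```

Varying `num_needles` and `needle_len` separately shows how each approach scales along each axis; timing `re.compile` as well would show the preparation cost mentioned earlier in the thread.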