[Python-Dev] Re: Changing Python's string search algorithms

15 Oct 2020


      On Thu, Oct 15, 2020 at 11:38 AM Tim Peters  wrote:
...
I think this is premature. There is almost never an optimization
that's a pure win in all cases. For example, on some platforms
`timsort` will never be as fast as the old samplesort in cases with a
very large number of equal elements, and on all platforms `timsort`
consumes more memory.  And it's a wash on "random" data on most
platforms. Nevertheless, it was a significant speed win for many -
possibly even most - real-life cases.
And timsort is one of my go-tos for teaching the concept of hybrid
sorting algorithms, because, at its heart, it's simple enough to
explain, and it manages to be so incredibly valuable in real-world
code. :)
...
But it would leave open a different idea for an "opt-in" facility: one
that allowed to give a needle to a function and get back an object
capturing the results of preprocessing.  Then a different wrapper
around the search code that accepted the already-pre-processed info.
There are, e.g., certainly cases where repeated searches for a given
4-character needle could be sped significantly by exploiting the new
code, but where the overhead of preprocessing is too much to bear in
"random" performance testing. It would also open the door to more
aggressive (expensive) kinds of preprocessing.  That's where users may
be able to make useful, informed judgments.
Kinda like the way a compiled regex is used? I like this idea. So it'd
be heuristics in the core language that choose a good default for most
situations, and then a str method that returns a preprocessed needle.
In a Python implementation that doesn't want to use two different
algorithms, that preprocessor could return the string unchanged, but
otherwise it's an opaque object usable only in string searches.

+1, if my interpretation of your description is correct.

ChrisA

[Python-Dev] Re: Changing Python's string search algorithms

Chris Angelico