On Thu, Oct 15, 2020 at 11:38 AM Tim Peters <tim.peters@gmail.com> wrote:
I think this is premature. There is almost never an optimization that's a pure win in all cases. For example, on some platforms `timsort` will never be as fast as the old samplesort in cases with a very large number of equal elements, and on all platforms `timsort` consumes more memory. And it's a wash on "random" data on most platforms. Nevertheless, it was a significant speed win for many - possibly even most - real-life cases.
And timsort is one of my go-tos for teaching the concept of hybrid sorting algorithms, because, at its heart, it's simple enough to explain, and it manages to be so incredibly valuable in real-world code. :)
But it would leave open a different idea for an "opt-in" facility: one that allowed to give a needle to a function and get back an object capturing the results of preprocessing. Then a different wrapper around the search code that accepted the already-pre-processed info. There are, e.g., certainly cases where repeated searches for a given 4-character needle could be sped significantly by exploiting the new code, but where the overhead of preprocessing is too much to bear in "random" performance testing. It would also open the door to more aggressive (expensive) kinds of preprocessing. That's where users may be able to make useful, informed judgments.
Kinda like the way a compiled regex is used? I like this idea. So it'd be heuristics in the core language that choose a good default for most situations, and then a str method that returns a preprocessed needle. In a Python implementation that doesn't want to use two different algorithms, that preprocessor could return the string unchanged, but otherwise it's an opaque object usable only in string searches. +1, if my interpretation of your description is correct. ChrisA