[Tim]

Note that no "extra" storage is needed to exploit this. No character lookups, no extra expenses in time or space of any kind. Just "if we mismatch on the k'th try, we can jump ahead k positions".

[Antoine Pitrou <solipsis@pitrou.net>]

Ok, so that means that on a N-character haystack, it'll always do at least N character comparisons?

IIRC, the current algorithm is able to do much less than that in favorable cases. If the needle is k characters long and they are all distinct, it's able to do N/k comparisons.

It's an excellent question, and from what I said it's a correct inference. But I only described how the algorithm matches the "v" part of the "needle = u + v" splitting. When matching the "u" part, skips materially exceeding the count of comparisons made are possible. The paper claims the minimal number of comparisons needed (ignoring preprocessing) is 2N/k, so same order as the current algorithm. But, for the constant factor, Dennis's code achieves N/k, because he augmented the Crochemre-Perrin algorithm with a variant of Sunday's algorithm (a simplification, but not as extreme as the current code's simplification). Note that best case near N/k is a known lower bound for best cases - impossible to do materially better. In the 'x' * 100 + 'y' example I ended my msg with, recall that v wa just "y", so the part I described was of minimum value - if the last haystack character in the current search window isn't "y", that the mismatch occurred on the first try leads to a shift of just 1. But if the character one beyond the end of the current search window isn't "x" or "y" then, Sunday's algorithm allows to skip 102 positions (len(needle) + 1). His algorithm requires precomputing a skip-count table of O(len(sigma)) space, where sigma is the universe of possible characters ("the alphabet"). That's "a whole lot" for Unicode. Fredrik and Dennis simplified Sunday's algorithm to weaker versions that require relatively small constant space instead, independent of needle, pattern, and alphabet size. Despite the current code's simplification of that nearly to the point of non-existence (it reduces the space needed to 32 bits, and with only one possible skip count), it's nevertheless so effective that it beat Dennis's initially pure Crochemre-Perrin code by dramatic margins in a significant number of exceptionally lucky (for the current code) cases. After adding a variation of Sunday (the current PR uses 256 bytes, and has 64K possible skip counts - but perhaps other tradeoffs would work better), we haven't yet seen a case (contrived or natural) where the current code is dramatically faster. But we have seen non-contrived cases where the current PR is dramatically faster, including in the benchmark code Fredrik gifted us with (Tools/stringbench/stringbench.py in your Python tree). Alas, the higher preprocessing costs leave the current PR slower in "too many" cases too, especially when the needle is short and found early in the haystack. Then any preprocessing cost approaches a pure waste of time.