[Tim]
... Alas, the higher preprocessing costs leave the current PR slower in "too many" cases as well, especially when the needle is short and found early in the haystack. In those cases any preprocessing cost approaches a pure waste of time.
But that was this morning. Since then, Dennis changed the PR to back off to the current code when the needle is "too small". We now know of very few cases where the PR code is slower than the current code at all, none where it's dramatically slower, many where it's significantly faster, and some non-contrived cases where it's dramatically faster (for example, over a factor of 10 in stringbench.py's "late match, 100 characters" forward-search tests, and the gain is especially large for Unicode, as opposed to bytes). Then there are the pathological cases like the one in the original issue report, where it's multiple orders of magnitude faster (from about 3 1/2 hours down to under a tenth of a second in the issue report's case). I'm still waiting for someone who thinks string search speed is critical in their real app to give it a try. In the absence of that, I endorse merging this.
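For anyone who wants to picture the "back off when the needle is too small" idea, here is a rough Python sketch. It is not the PR's code: the cutoff value is made up, the names are hypothetical, and Horspool search is used only as a simple stand-in for "an algorithm that pays a preprocessing cost up front".

```python
# Hypothetical cutoff, chosen for illustration; the PR's real threshold
# is not stated in this message.
SMALL_NEEDLE_CUTOFF = 8


def naive_find(haystack, needle):
    """No preprocessing: try each alignment from left to right."""
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1


def horspool_find(haystack, needle):
    """Horspool search, a stand-in for a search with setup cost."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # Preprocessing: for each character of the needle except the last,
    # record how far its last occurrence is from the needle's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            return i
        # Skip ahead based on the haystack character under the
        # needle's final position.
        i += shift.get(haystack[i + m - 1], m)
    return -1


def find(haystack, needle):
    """Dispatch: short needles skip the preprocessing entirely."""
    if len(needle) < SMALL_NEEDLE_CUTOFF:
        return naive_find(haystack, needle)
    return horspool_find(haystack, needle)
```

The point of the dispatch is only that a short needle found early never pays for a table it will not get to use. Where the cutoff should sit is an empirical question that benchmarks have to answer.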