On 10/13/20 9:54 AM, Tim Peters wrote:
Due to the natures of the old and new algorithms, neither will be faster or slower in all cases, the newer one should never be dramatically slower than the old one, the new one will be spectacularly faster in some cases, and "in general" it seems impossible to guess in advance. It depends on the value the fancier preprocessing can extract from the pattern versus the higher preprocessing cost, and that in turn depends on details of exactly what the patterns and texts to be searched are.
My off-the-top-of-my-head reaction: I've always assumed that most string searching is tiny. Like, searching for a < 10 character string in a < 256 character string. That's what I'm always doing, at least. So while I'd obviously like to see us address the pathological cases cited--and thanks to Dennis Sweeney for the PRs!--I'd hope that it doesn't make these small searches any slower. Your email didn't give me a sense of how much additional preprocessing these new algorithms require; the fact that they're O(1) space suggests they aren't bad. But if you do lots and lots of dinky searches surely it would add up.
Looking at the PR, I see there's a special case for searching for a single character, and then cases for forward-search and reverse-search. I was wondering if I'd find a heuristic like "if the string you're searching in is < 2k, use the old search algorithm". Is that worth pursuing? It doesn't seem like the maintenance cost would be that high; I don't think anybody has touched that code in years. (No surprise, the Effbot did a good job.)
Python-Dev mailing list -- email@example.com
To unsubscribe send an email to firstname.lastname@example.org
Message archived at https://email@example.com/message/D5GYO2FVUW762RGZ5NMYKYM7PPWWRDIS/
Code of Conduct: http://python.org/psf/codeofconduct/