[issue41972] bytes.find consistently hangs in a particular scenario
Tim Peters
report at bugs.python.org
Sat Oct 17 21:04:29 EDT 2020
Tim Peters <tim at python.org> added the comment:
I don't think we have any idea how the OP stumbled into this. Looks like it "just happened".
The case you construted is quadratic-time, but not quite as bad:
BBBBBaaaaaBBBBB
BBBBBBBBBB
Fails at once, because 'a' doesn't match the trailing needle 'B'. The Bloom filter is useless because the next haystack 'B' _is_ in the needle. So shift right 1:
BBBBBaaaaaBBBBB
xBBBBBBBBBB
Last character matches, so goes on to compare 4 Bs and an aB mismatch. 6 in all. Bloom filter useless again, and another shift by 1:
BBBBBaaaaaBBBBB
xxBBBBBBBBBB
Now there's 1 less compare, because only 3 Bs match.
And so on. After the next shift, only 2 Bs match, and after the shift following that, only 1 B. Getting to:
BBBBBaaaaaBBBBB
xxxxxBBBBBBBBBB
Now after matching the trailing Bs, aB mismatches at once. And we're done, because another shift by 1 moves the end of the needle beyond the end of the haystack (although I bet the Bloom filter reads up the trailing NUL byte from the haystack and actually makes a "giant" shift).
You're right that two-way yawns at this ;-) 'B' * (K*2) is split into
u = "" # empty string!
v = 'B' * (K*2)
so only the "match the right half" part of the searching algorithm comes into play.
BBBBBaaaaaBBBBB
BBBBBBBBBB
first mismatches at the 6th character (1-based counting - at index 5), so it shifts right by 6:
BBBBBaaaaaBBBBB
xxxxxxBBBBBBBBBB
And it's done, because the needle has moved beyond the end of the haystack.
The brainpower that went into making this "look trivial" is quite impressive :-)
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
More information about the Python-bugs-list
mailing list