[New-bugs-announce] [issue41972] bytes.find consistently hangs in a particular scenario

Kevin Mills report at bugs.python.org
Wed Oct 7 19:31:59 EDT 2020


New submission from Kevin Mills <kevin.mills226+bugs.python at gmail.com>:

Sorry for the vague title. I'm not sure how to succinctly describe this issue.

The following code:

```
with open("data.bin", "rb") as f:
    data = f.read()

base = 15403807 * b'\xff'
longer = base + b'\xff'

print(data.find(base))
print(data.find(longer))
```

always hangs on the second call to find.

It might complete eventually, but I've left it running and never seen it do so. Because of the structure of data.bin, it should find the same position as the first call to find.

The first call to find completes and prints near instantly, which makes the pathological performance of the second (which is only searching for one b"\xff" more than the first) even more mystifying.
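To put numbers on the difference between the two calls, a small timing helper along these lines may be useful. This is only a sketch: `timed_find` is a hypothetical helper name, and the `data` built here is a tiny synthetic stand-in, not the real data.bin, so it is not expected to reproduce the hang. It only shows how the two searches can be timed side by side.

```python
import time


def timed_find(haystack: bytes, needle: bytes):
    """Return (index, elapsed_seconds) for haystack.find(needle)."""
    start = time.perf_counter()
    index = haystack.find(needle)
    return index, time.perf_counter() - start


# Assumption: a small synthetic buffer standing in for data.bin.
# The real file is 32MiB and has a different layout.
n = 1000
data = b'\x00' * 64 + b'\xff' * (n + 1) + b'\x00' * 64

base = n * b'\xff'
longer = base + b'\xff'

for name, needle in (("base", base), ("longer", longer)):
    index, seconds = timed_find(data, needle)
    print(f"{name}: index={index} time={seconds:.6f}s")
```

With the real file, `data` would instead be read from data.bin as in the snippet above, and `n` would be 15403807.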

I attempted to upload the data.bin file I was working with as an attachment here, but it failed multiple times. I assume it's too large for an attachment; it's a 32MiB file consisting only of 00 bytes and FF bytes.

Since I couldn't attach it, I uploaded it to a gist. I hope that's okay.

https://gist.github.com/Zeturic/7d0480a94352968c1fe92aa62e8adeaf

I wasn't able to trigger the pathological runtime behavior with other sequences of bytes, which is why I uploaded the file in the first place. For example, randomly generated data doesn't trigger it.
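Since the behavior seems to depend on the exact layout of the file, a summary of its runs of 00 and FF bytes might help anyone trying to reproduce it without downloading 32MiB. The following is a rough sketch; `only_00_ff` and `summarize_runs` are hypothetical helper names, and the `sample` bytes are made up for illustration rather than taken from data.bin.

```python
import re


def only_00_ff(data: bytes) -> bool:
    """True if data contains no bytes other than 0x00 and 0xFF."""
    # Deleting every 0x00 and 0xFF byte leaves nothing behind
    # exactly when no other byte values are present.
    return not data.translate(None, b'\x00\xff')


def summarize_runs(data: bytes):
    """Return a list of (byte_value, run_length) for runs of 0x00 or 0xFF."""
    return [
        (data[m.start()], m.end() - m.start())
        for m in re.finditer(rb'\x00+|\xff+', data)
    ]


# Assumption: a made-up sample, not the contents of data.bin.
sample = b'\x00\x00\xff\xff\xff\x00'
print(only_00_ff(sample))
print(summarize_runs(sample))
```

For the real file, `sample` would be replaced by the bytes read from data.bin.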

I've verified that this happens on multiple versions of CPython (as well as PyPy) and on multiple computers / operating systems.

----------
messages: 378197
nosy: Zeturic
priority: normal
severity: normal
status: open
title: bytes.find consistently hangs in a particular scenario
type: performance
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________

