[pypy-issue] Issue #2752: Incorrect results of intensive read() results passing to cpyext (pypy/pypy)

Michał Górny issues-reply at bitbucket.org
Mon Feb 12 15:06:16 EST 2018


New issue 2752: Incorrect results of intensive read() results passing to cpyext
https://bitbucket.org/pypy/pypy/issues/2752/incorrect-results-of-intensive-read

Michał Górny:

I've noticed that my program, which hashes data intensively using the [pyblake2](https://github.com/dchest/pyblake2) extension, starts giving wrong results at some point. I haven't been able to establish the exact cause, but I've been able to create a [test case](https://github.com/mgorny/pypy-blake2-testcase) that reliably reproduces the problem.

The exact code is:


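```python
import io
import pyblake2

bufsize = 4096
# Expected BLAKE2b (64-byte digest) hex checksum of test.txt
sum = 'ccefcd101b08863339602f7fdf2edd1d77ef05a970c36dbd7a560d33f957f81b15cfcac10114f8fca0d7c318b6aaa294220e3fcf4f88e6e3bd7840f121ff3b65'

def sub(i):
    # Hash test.txt in bufsize-sized chunks and compare the digest;
    # on mismatch the AssertionError carries the iteration number.
    cs = pyblake2.blake2b()
    with io.open('test.txt', 'rb') as f:
        for block in iter(lambda: f.read(bufsize), b''):
            cs.update(block)
    assert cs.hexdigest() == sum, i

# Run the check 10000 times. On Python 2, map() calls sub() eagerly;
# the empty loop only drives the iteration on Python 3, where map() is lazy.
for x in map(sub, range(10000)):
    pass
```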

With this test case, PyPy reliably fails (generates an incorrect checksum) at iteration 94.

A few observations based on testing:
1. The issue affects PyPy2 only. PyPy3 and CPython work fine.
2. I can reproduce a similar problem with pyblake2 and pysha3, but not with e.g. pycryptodome (which is also a C extension) or the built-in hash functions.
3. Some unrelated changes to the code (e.g. replacing `io.open()` with `open()`) change the iteration at which it fails.
4. If I do a single `f.read()` instead of the loop, I can't get it to fail (even with an increased iteration count).
5. If I call `f.read()` without an argument in a loop, it fails at iteration 852.
6. Changing `bufsize` and the iteration count also changes the result, with no clear correlation. E.g. with a `bufsize` of 506000 and 10000 iterations, it doesn't fail; with 100000 iterations, it fails at iteration 1488...
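To explore observations 2, 4 and 6 without editing the script each time, the test could be parameterized by hash constructor and `bufsize`. The following is a hypothetical harness, not part of the original test case: `chunked_digest`, `first_mismatch` and `new_hash` are names made up for illustration. It assumes, per observation 4, that a single `f.read()` yields the correct digest and uses that as the reference; a known-good hard-coded checksum, as in the original script, would be a safer reference. `new_hash` is assumed to be any hashlib-style constructor that accepts optional initial data, such as `pyblake2.blake2b` or `hashlib.sha512`.

```python
import io


def chunked_digest(path, new_hash, bufsize):
    """Hash the file using bufsize-sized reads, like the original test case."""
    h = new_hash()
    with io.open(path, 'rb') as f:
        for block in iter(lambda: f.read(bufsize), b''):
            h.update(block)
    return h.hexdigest()


def first_mismatch(path, new_hash, bufsize=4096, iterations=10000):
    """Return the first iteration whose chunked digest differs from a
    reference computed with a single f.read(), or None if all match.

    Assumption: the single-read digest is correct (observation 4)."""
    with io.open(path, 'rb') as f:
        reference = new_hash(f.read()).hexdigest()
    for i in range(iterations):
        if chunked_digest(path, new_hash, bufsize) != reference:
            return i
    return None
```

For instance, calling `first_mismatch('test.txt', pyblake2.blake2b)` and `first_mismatch('test.txt', hashlib.sha512)` would contrast a C extension loaded through cpyext with a built-in hash, as in observation 2. On an unaffected interpreter both should return `None`. Since unrelated code changes shift the failing iteration (observation 3), this restructured harness may not fail at the same iterations as the original script.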

In other words, I really have no clue what might be happening here. I'm attaching the test script and file for completeness.



