[pypy-issue] Issue #2624: Weird performance on pypy3 when reading from a text-mode file (pypy/pypy)

Nathaniel Smith issues-reply at bitbucket.org
Sun Aug 6 15:25:50 EDT 2017


New issue 2624: Weird performance on pypy3 when reading from a text-mode file
https://bitbucket.org/pypy/pypy/issues/2624/weird-performance-on-pypy3-when-reading

Nathaniel Smith:

This little benchmark tries to estimate the speed of `file.read` by comparing `seek(0); read(10)` versus `seek(0)` alone. If the file is opened in binary mode, then things make sense and PyPy is fast: CPython does ~400 ns/(seek+read) and PyPy3 does ~70 ns/(seek+read). OTOH, if the file is opened in text mode, then CPython does ~5000 ns/(seek+read) (which seems a bit silly but not implausible), and PyPy3 requires ~18,000 ns/(seek+read), which suggests something has gone wrong.

Even weirder, I found that PyPy3's speed was stable for any individual file, but if I switched to a different file then sometimes the speed would change dramatically. For example, `/etc/passwd` gives me ~18,000 ns/(seek+read), but `/etc/fstab` gives me ~6,700 ns/(seek+read), consistently. All the files I tried are plain ASCII, but maybe the pattern of newlines matters somehow.
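
One way to probe the newline guess would be to summarize each file before timing it. The sketch below is only an illustration: `describe` is a hypothetical helper, not part of the benchmark, and the properties it reports are assumptions about what might matter, not known causes.

```python
def describe(path, nbytes=10):
    """Summarize properties of a file that might affect text-mode reads."""
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    prefix = data[:nbytes]  # the bytes that a read(nbytes) at offset 0 covers
    return {
        "size": len(data),
        "ascii": all(b < 128 for b in data),
        "lf": data.count(b"\n"),             # every "\n", including those in "\r\n"
        "crlf": crlf,
        "bare_cr": data.count(b"\r") - crlf,  # "\r" not followed by "\n"
        "newline_in_prefix": b"\n" in prefix or b"\r" in prefix,
    }
```

Comparing `describe("/etc/passwd")` against `describe("/etc/fstab")` would show whether the two files differ in any of these respects.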

Possibly this is expected because Python 3's IO stack is just too complicated or something, but I found it surprising that such a small simple loop would be slower than CPython.

```python
import time

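# Binary-mode variant (uncomment these two lines and comment out the next two):
#COUNT = 1000000
#f = open("/etc/passwd", "rb")
# Text-mode variant:
COUNT = 100000
f = open("/etc/passwd", "rt")

# Runs forever, printing one measurement per pass; stop it with Ctrl-C.
while True:
    start = time.monotonic()
    for _ in range(COUNT):
        f.seek(0)
        f.read(10)
    between = time.monotonic()
    for _ in range(COUNT):
        f.seek(0)
    end = time.monotonic()

    # Convert total seconds to microseconds per iteration.
    both = (between - start) / COUNT * 1e6
    seek = (end - between) / COUNT * 1e6
    # Estimate the read cost by subtracting the seek-only loop.
    read = both - seek
    print("{:.2f} µs/(seek+read), {:.2f} µs/seek, estimate ~{:.2f} µs/read"
          .format(both, seek, read))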
```
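
To compare modes and files without editing the script each time, the same measurement can be wrapped in a function. This is a minimal sketch of that idea: the name `bench` and its parameters are made up for illustration, and it performs a single pass instead of looping forever.

```python
import time


def bench(path, mode, count=100000, nchars=10):
    """Return (both, seek, read) in microseconds per iteration.

    `both` times seek(0) + read(nchars), `seek` times seek(0) alone,
    and `read` is the estimate obtained by subtracting the two.
    """
    with open(path, mode) as f:
        start = time.monotonic()
        for _ in range(count):
            f.seek(0)
            f.read(nchars)
        between = time.monotonic()
        for _ in range(count):
            f.seek(0)
        end = time.monotonic()

    both = (between - start) / count * 1e6
    seek = (end - between) / count * 1e6
    return both, seek, both - seek
```

For example, calling `bench("/etc/passwd", "rb")` and `bench("/etc/passwd", "rt")` in turn gives the binary-mode and text-mode numbers for one file, and swapping in `/etc/fstab` repeats the comparison for another.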
