[pypy-issue] Issue #2272: socket._fileobject.read horribly slow (pypy/pypy)

Antonio Cuni issues-reply at bitbucket.org
Wed Apr 13 13:42:22 EDT 2016

New issue 2272: socket._fileobject.read horribly slow

Antonio Cuni:

In theory, _fileobject is supposed to be a buffered layer on top of socket.recv/send.
However, socket.py implements it in a way which completely disables buffering for read(), and it is also full of complicated, half-working code for handling buffering that never happens.
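
To make the difference concrete, here is a minimal sketch of an unbuffered read next to a buffered one. It is not the socket.py code: FakeSocket, unbuffered_read, BufferedReader and count_recv_calls are hypothetical names invented for this illustration, and the fake socket only counts how often recv() is called.

```python
class FakeSocket(object):
    """Hypothetical stand-in for a socket: serves bytes and counts recv() calls."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0
        self.recv_calls = 0

    def recv(self, size):
        self.recv_calls += 1
        chunk = self._payload[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


def unbuffered_read(sock, n):
    """Ask the socket for just the bytes requested: one recv() per small read."""
    return sock.recv(n)


class BufferedReader(object):
    """Reads large chunks from the socket and serves small reads from memory."""

    def __init__(self, sock, bufsize=8192):
        self._sock = sock
        self._bufsize = bufsize
        self._buf = b""
        self._pos = 0

    def read(self, n):
        # Refill only when the buffer cannot satisfy the request.
        while len(self._buf) - self._pos < n:
            chunk = self._sock.recv(self._bufsize)
            if not chunk:
                break  # EOF: return whatever is left
            self._buf = self._buf[self._pos:] + chunk
            self._pos = 0
        data = self._buf[self._pos:self._pos + n]
        self._pos += len(data)
        return data


def count_recv_calls(read_factory, payload, size):
    """Read the whole payload `size` bytes at a time; return (data, recv calls)."""
    sock = FakeSocket(payload)
    read = read_factory(sock)
    parts = []
    while True:
        piece = read(size)
        if not piece:
            break
        parts.append(piece)
    return b"".join(parts), sock.recv_calls


payload = b"x" * 100000
_, unbuffered_calls = count_recv_calls(
    lambda s: (lambda n: unbuffered_read(s, n)), payload, 4)
_, buffered_calls = count_recv_calls(
    lambda s: BufferedReader(s).read, payload, 4)
```

With 4-byte reads, the unbuffered variant calls recv() once per read, while the buffered variant calls it once per 8192-byte refill, which is the behaviour _fileobject was meant to provide.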

After digging through CPython's history, we found that buffering has been enabled, disabled, re-enabled and re-disabled many times, each time because of a different issue; some relevant CPython commits (hg commit IDs) are: 8e062e572ea4, 54606ea9f4c7, 2729e977fdd9
Also, these issues:

Moreover, even if it were buffered (as it was before 2729e977fdd9), the performance would still be bad because the "fast path" copies the StringIO buffer again and again.
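
The cost of copying a buffer again and again can be illustrated with a small sketch. This is not the StringIO-based code from socket.py; it uses plain byte strings and two hypothetical helpers, consume_by_reslicing and consume_by_offset, to count how many bytes get copied while a buffer is drained in small reads.

```python
def consume_by_reslicing(data, size):
    """Serve small reads by rebuilding the remaining buffer after every read."""
    buf = data
    out = []
    copied = 0  # bytes copied just to rebuild the remainder
    while buf:
        out.append(buf[:size])
        buf = buf[size:]  # copies everything that is left
        copied += len(buf)
    return b"".join(out), copied


def consume_by_offset(data, size):
    """Serve small reads by advancing a position; the buffer is never rebuilt."""
    pos = 0
    out = []
    copied = 0  # bytes copied to produce the returned pieces
    while pos < len(data):
        piece = data[pos:pos + size]
        out.append(piece)
        copied += len(piece)
        pos += size
    return b"".join(out), copied


data = b"x" * 4000
_, reslice_copied = consume_by_reslicing(data, 4)  # 1998000 bytes copied
_, offset_copied = consume_by_offset(data, 4)      # 4000 bytes copied
```

Rebuilding the remainder on every read makes the total copying grow quadratically with the buffer size, whereas tracking an offset keeps it linear.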

So, apparently _fileobject was supposed to be buffered, but then buffering was disabled at some point in 2008 (around release 2.5). Now it's possible/likely that there is some code in the wild which incorrectly *relies* on it to behave like it's unbuffered.

The conclusion is: _fileobject.read is horribly slow, but we risk breaking some code by fixing it.
One possible thing to do is:
1. fix _fileobject.read
2. emit a warning if you call sock.recv or sock.makefile *after* you already called _fileobject.read (such code is likely to rely on the currently-unbuffered behaviour)
3. introduce a command-line flag to enable/disable this optimization
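
Steps 2 and 3 could look roughly like the sketch below. Everything in it is an assumption made for illustration: TrackedSocket and TrackedFile are hypothetical wrappers, not the real socket.py classes, and the `buffered` constructor argument merely stands in for the proposed command-line flag.

```python
import warnings


class TrackedFile(object):
    """Hypothetical file object created by TrackedSocket.makefile()."""

    def __init__(self, owner, bufsize, buffered):
        self._owner = owner
        self._bufsize = bufsize
        self._buffered = buffered
        self._buf = b""
        self._pos = 0

    def read(self, n):
        raw = self._owner._sock
        if not self._buffered:
            # Old behaviour: never read ahead, so nothing hides in a buffer.
            return raw.recv(n)
        self._owner._buffered_read_used = True
        while len(self._buf) - self._pos < n:
            chunk = raw.recv(self._bufsize)  # may read past the n bytes asked for
            if not chunk:
                break
            self._buf = self._buf[self._pos:] + chunk
            self._pos = 0
        data = self._buf[self._pos:self._pos + n]
        self._pos += len(data)
        return data


class TrackedSocket(object):
    """Hypothetical wrapper that warns when raw and buffered access are mixed."""

    def __init__(self, sock, buffered=True):
        self._sock = sock
        self._buffered = buffered  # stands in for the proposed command-line flag
        self._buffered_read_used = False

    def _warn_if_mixed(self, what):
        if self._buffered_read_used:
            warnings.warn(
                "%s called after a buffered read(); some data may already "
                "have been consumed into the file object's buffer" % what,
                RuntimeWarning, stacklevel=3)

    def recv(self, size):
        self._warn_if_mixed("recv()")
        return self._sock.recv(size)

    def makefile(self, bufsize=8192):
        self._warn_if_mixed("makefile()")
        return TrackedFile(self, bufsize, self._buffered)
```

The warning fires only once a buffered read() has actually happened, so code that sticks to either recv() or the file object is unaffected, and passing buffered=False restores the current read-exactly-what-was-asked behaviour.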

See also the relevant IRC discussion which started here:

Attached is a small benchmark which shows the problem (both on CPython and PyPy):

$ python try.py 
recv    (    4): 5000000 bytes, 0.83 seconds
read    (    4): 5000000 bytes, 2.70 seconds
buffered(    4): 5000000 bytes, 0.87 seconds
stringio(    4): 5000000 bytes, 0.45 seconds

$ pypy try.py
recv    (    4): 5000000 bytes, 0.44 seconds
read    (    4): 5000000 bytes, 0.44 seconds
buffered(    4): 5000000 bytes, 0.12 seconds
stringio(    4): 5000000 bytes, 0.11 seconds
