[pypy-issue] Issue #2272: socket._fileobject.read horribly slow (pypy/pypy)

Antonio Cuni issues-reply at bitbucket.org
Wed Apr 13 13:42:22 EDT 2016


New issue 2272: socket._fileobject.read horribly slow
https://bitbucket.org/pypy/pypy/issues/2272/socket_fileobjectread-horribly-slow

Antonio Cuni:

In theory, _fileobject is supposed to be a buffered layer on top of socket.recv/send.
However, socket.py implements it in a way which completely disables buffering for read(), and it is also full of complicated, half-working code for handling buffering which never actually happens.
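
For reference, the intended behaviour of a buffered layer over recv can be sketched as follows. This is a minimal illustration and not the socket.py code: `CountingSocket`, `BufferedReader`, `read_all` and the 8192-byte chunk size are hypothetical names and values chosen for the example.

```python
class CountingSocket(object):
    """Hypothetical stand-in for a socket: serves bytes and counts recv() calls."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.recv_calls = 0

    def recv(self, size):
        self.recv_calls += 1
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


class BufferedReader(object):
    """Hypothetical buffered layer: one large recv() serves many small read() calls."""

    def __init__(self, sock, bufsize=8192):
        self._sock = sock
        self._bufsize = bufsize
        self._buf = b""
        self._pos = 0  # offset of the first unread byte in self._buf

    def read(self, size):
        parts = []
        while size > 0:
            if self._pos >= len(self._buf):
                # Buffer exhausted: refill it with a single large recv().
                self._buf = self._sock.recv(self._bufsize)
                self._pos = 0
                if not self._buf:
                    break  # EOF
            piece = self._buf[self._pos:self._pos + size]
            self._pos += len(piece)
            size -= len(piece)
            parts.append(piece)
        return b"".join(parts)


def read_all(read, size):
    """Call read(size) until it returns an empty string; return the byte count."""
    total = 0
    while True:
        chunk = read(size)
        if not chunk:
            return total
        total += len(chunk)
```

Draining 100000 bytes in 4-byte reads through `CountingSocket.recv` directly takes on the order of 25000 recv calls, while going through `BufferedReader.read` takes about a dozen. That difference is what read() currently gives up.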

After digging into CPython's history, we found that buffering has been enabled/disabled/re-enabled/re-disabled many times, each time because of a different issue. Some relevant CPython commits (hg commit ids) are: 8e062e572ea4, 54606ea9f4c7, 2729e977fdd9
See also these issues:
https://mail.python.org/pipermail/python-dev/2008-April/078613.html
https://bugs.python.org/issue2632
https://bugs.python.org/issue2760

Moreover, even if it were buffered (as it was before 2729e977fdd9), the performance would still be bad because the "fast path" copies the StringIO buffer again and again.
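
The cost of re-copying can be illustrated independently of socket.py. The sketch below assumes the general pattern (consume a few bytes, then rebuild the buffer from the unread remainder) and is not a copy of the CPython code; `drain_by_copying` and `drain_by_offset` are hypothetical names.

```python
def drain_by_copying(data, size):
    """Consume `data` in `size`-byte reads, rebuilding the buffer after each read.

    Returns (number of reads, total bytes copied into the rebuilt buffers).
    """
    buf = data
    reads = 0
    copied = 0
    while buf:
        chunk = buf[:size]   # the bytes handed to the caller
        buf = buf[size:]     # copies everything that is still unread
        copied += len(buf)
        reads += 1
    return reads, copied


def drain_by_offset(data, size):
    """Consume `data` in `size`-byte reads by advancing an offset; nothing is re-copied."""
    pos = 0
    reads = 0
    while pos < len(data):
        chunk = data[pos:pos + size]  # the bytes handed to the caller
        pos += len(chunk)
        reads += 1
    return reads, 0
```

For an 8192-byte buffer read 4 bytes at a time, both variants perform 2048 reads, but the copying variant moves roughly 8 MB in total to deliver 8 KB. The work grows quadratically with the buffer size, so small reads are where it hurts most.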

So, apparently _fileobject was supposed to be buffered, but then buffering was disabled at some point in 2008 (around release 2.5). Now it's possible/likely that there is some code in the wild which incorrectly *relies* on it to behave like it's unbuffered.

The conclusion is: _fileobject.read is horribly slow, but we risk breaking some code by fixing it.
One possible thing to do is:
1. fix _fileobject.read
2. emit a warning if you call sock.recv or sock.makefile *after* you have already called _fileobject.read (such code is likely to rely on the currently-unbuffered behaviour)
3. introduce a command-line flag to enable/disable this optimization
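
The second step could look roughly like the sketch below. It only illustrates the idea: `WarnAfterBufferedRead` and `_TrackingFile` are hypothetical wrappers, the choice of RuntimeWarning is an assumption, and the real change would presumably live inside socket.py rather than in a wrapper class.

```python
import warnings


class _TrackingFile(object):
    """Hypothetical file wrapper that records the first read() on its owner."""

    def __init__(self, owner, fileobj):
        self._owner = owner
        self._fileobj = fileobj

    def read(self, *args):
        self._owner._buffered_read_done = True
        return self._fileobj.read(*args)


class WarnAfterBufferedRead(object):
    """Hypothetical socket wrapper: warns on recv()/makefile() after a file read()."""

    def __init__(self, sock):
        self._sock = sock
        self._buffered_read_done = False

    def _check(self, name):
        if self._buffered_read_done:
            warnings.warn(
                "%s() called after a buffered read(); data may already have "
                "been consumed into the file object's buffer" % name,
                RuntimeWarning,
                stacklevel=3,  # point at the caller of recv()/makefile()
            )

    def recv(self, *args):
        self._check("recv")
        return self._sock.recv(*args)

    def makefile(self, *args, **kwargs):
        self._check("makefile")
        return _TrackingFile(self, self._sock.makefile(*args, **kwargs))
```

Code that only uses recv, or only uses the file object, never triggers the warning. Only the mixed pattern that depends on the unbuffered behaviour does.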

See also the relevant IRC discussion which started here:
https://botbot.me/freenode/pypy/2016-04-13/?msg=64058470&page=3

Attached is a small benchmark which shows the problem (both on CPython and PyPy):

```
$ python try.py 
recv    (    4): 5000000 bytes, 0.83 seconds
read    (    4): 5000000 bytes, 2.70 seconds
buffered(    4): 5000000 bytes, 0.87 seconds
stringio(    4): 5000000 bytes, 0.45 seconds

$ pypy try.py
recv    (    4): 5000000 bytes, 0.44 seconds
read    (    4): 5000000 bytes, 0.44 seconds
buffered(    4): 5000000 bytes, 0.12 seconds
stringio(    4): 5000000 bytes, 0.11 seconds
```



