[pypy-issue] Issue #2091: non-blocking socket.send slow (gevent) (pypy/pypy)
issues-reply at bitbucket.org
Tue Jul 21 22:27:12 CEST 2015
New issue 2091: non-blocking socket.send slow (gevent)
gevent implements a blocking `socket.sendall` for non-blocking sockets with a simple loop over `socket.send`, catching EWOULDBLOCK as needed. (This isn't necessarily specific to gevent, of course.) In benchmarks, this is substantially slower under PyPy than it is under CPython, around 5 to 6 times slower.
Here's a small example that reproduces the problem: start the script once with an argument to run the server and put it in the background, then start it again without arguments to run the client. (This is a simplified, non-gevent version of [a benchmark Denis wrote](https://github.com/gevent/gevent/blob/master/greentest/bench_sendall.py); it's the only benchmark in which CPython outperforms PyPy.)
#! /usr/bin/env python
from __future__ import print_function
import errno
import socket
import sys
import time

def serve():
    # Server: accept connections on port 9999 and discard whatever arrives.
    server = socket.socket()
    server.bind(("", 9999))
    server.listen(1)
    while True:
        client, _ = server.accept()
        while client.recv(65536):
            pass

def _sendall(conn, data):
    data_memory = memoryview(data) # if memoryview is left out, CPython gets slow; makes no diff to PyPy
    len_data_memory = len(data_memory)
    data_sent = 0
    while data_sent < len_data_memory:
        try:
            data_sent += conn._sock.send(data_memory[data_sent:])
        except socket.error as ex:
            if ex.args[0] == errno.EWOULDBLOCK: # 35 on OS X/BSD, 11 on Linux
                continue # send buffer is full; retry
            raise

def main():
    length = 50 * 0x100000
    data = b"x" * length
    spent_total = 0
    conn = socket.create_connection(("", 9999))
    conn._sock.setblocking(0) # non-blocking is crucial
    N = 20
    for i in range(N):
        start = time.time()
        _sendall(conn, data)
        spent = time.time() - start
        print("%.2f MB/s" % (length / spent / 0x100000))
        spent_total += spent
    print("~ %.2f MB/s" % (length * N / spent_total / 0x100000))

if __name__ == "__main__":
    if len(sys.argv) > 1:
        serve()
    else:
        main()
On one machine, CPython sends at ~1160 MB/s, while PyPy 2.6/2.7 sends at ~150 MB/s.
The `_sendall` function is a simplified version of what gevent actually uses to implement `socket.sendall`.
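For comparison, here is a minimal sketch of the same loop that waits for the socket to become writable instead of retrying immediately. This is not gevent's actual implementation (gevent waits on its own event loop); `sendall_nonblocking` and its `timeout` parameter are hypothetical names used only for illustration, and `select.select` stands in for the event loop.

```python
import errno
import select
import socket

def sendall_nonblocking(sock, data, timeout=None):
    """Send all of `data` on a non-blocking socket (illustrative sketch)."""
    view = memoryview(data)  # slicing a memoryview does not copy the payload
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except socket.error as ex:
            if ex.args[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            # The kernel send buffer is full: sleep until the socket is
            # writable again rather than spinning on send().
            select.select([], [sock], [], timeout)
    return sent
```

Whether waiting like this changes the PyPy numbers is untested here; it only removes the busy loop from the picture.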
Interestingly, on CPython, if you take out the call to `memoryview` and instead pass the raw string argument to `socket.send`, it performs similarly to PyPy. This leads me to guess that it's something to do with pinning the buffer in memory repeatedly that's slowing PyPy down.
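On the CPython side, the difference is at least partly explained by how slicing works: slicing a byte string copies the remaining bytes on every iteration, while slicing a `memoryview` only creates a new view onto the same buffer. A small sketch (the sizes are arbitrary) that shows the distinction:

```python
data = b"x" * (1 << 20)      # 1 MiB payload, arbitrary size
view = memoryview(data)

tail_view = view[4096:]      # no copy: a new view onto the same buffer
tail_copy = data[4096:]      # allocates and copies len(data) - 4096 bytes

# Both represent the same bytes, but only the view still refers to `data`.
same_contents = (tail_view == tail_copy)
shares_buffer = (tail_view.obj is data)
```

This says nothing about what PyPy does internally with the buffer; it only illustrates why the raw-string variant is slow on CPython.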
I've tried variations on how the data gets sliced, to no avail. I have found that increasing the socket's SO_SNDBUF improves performance: using a very large buffer gets us about halfway to CPython's throughput.
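For reference, enlarging the send buffer looks roughly like this. It is a sketch: `set_send_buffer` is a hypothetical helper, the 4 MiB figure is an arbitrary illustration rather than the size used in the measurements above, and the operating system may cap or adjust the requested value, so the effective size is read back.

```python
import socket

def set_send_buffer(sock, size):
    """Request a send buffer of `size` bytes and return what the OS granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

sock = socket.socket()
try:
    # 4 MiB is an arbitrary example value, not a recommendation.
    effective = set_send_buffer(sock, 4 * 1024 * 1024)
finally:
    sock.close()
```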
Is there anything I can do as a maintainer of gevent to improve the performance of `socket.sendall`? I'm not against using PyPy internal functions, I just couldn't find any to use :) Or should I recommend that users set large write buffers on their sockets? Or is this a "bug" in PyPy that can be improved?