[pypy-issue] Issue #2091: non-blocking socket.send slow (gevent) (pypy/pypy)

Jason Madden issues-reply at bitbucket.org
Tue Jul 21 22:27:12 CEST 2015

New issue 2091: non-blocking socket.send slow (gevent)

Jason Madden:

gevent implements a blocking `socket.sendall` for non-blocking sockets with a simple loop over `socket.send`, catching EWOULDBLOCK as needed. (This isn't necessarily specific to gevent, of course.) In benchmarks, this is substantially slower under PyPy than it is under CPython, around 5 to 6 times slower.
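Here is a minimal sketch of the condition that loop has to handle. It uses Python 3, a hypothetical helper `fill_until_would_block`, and a local `socket.socketpair()` in place of a real connection. A non-blocking `send` keeps accepting data until the kernel's send buffer is full, and then it raises an error whose `errno` is EAGAIN/EWOULDBLOCK. That number is platform-specific (35 on OS X, 11 on Linux), so comparing against the `errno` module is more portable than comparing against a literal.

```python
import errno
import socket

# errno values meaning "send buffer is full, try again later".
# The numbers differ by platform, so use the errno module, not a literal.
WOULD_BLOCK = (errno.EAGAIN, errno.EWOULDBLOCK)

def fill_until_would_block(sock, chunk=b"x" * 4096):
    """Hypothetical helper: send on a non-blocking socket until the kernel
    refuses more data. Returns (bytes_accepted, errno_seen)."""
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except socket.error as ex:  # socket.error is OSError on Python 3
            if ex.errno in WOULD_BLOCK:
                return total, ex.errno
            raise  # any other error is a real failure

a, b = socket.socketpair()
a.setblocking(False)  # same as the script's setblocking(0)
sent, err = fill_until_would_block(a)
print(sent, err)
a.close()
b.close()
```
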

Here's a small example that reproduces the problem. Start the script once with an argument so that it acts as the server, and put it in the background. Then run it again without arguments so that it acts as the client. (This is a simplified, non-gevent version of [a benchmark Denis wrote](https://github.com/gevent/gevent/blob/master/greentest/bench_sendall.py). It's the only benchmark in which PyPy is outperformed by CPython.)

#! /usr/bin/env python
from __future__ import print_function
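import sys
import time
import errno
import socket

def serve():
	server = socket.socket()
	server.bind(("", 9999))
	server.listen(1)
	while True:
		client, _ = server.accept()
		# Read and discard everything the client sends until it disconnects.
		while client.recv(4096):
			pass
		client.close()

def _sendall(conn, data):
	data_memory = memoryview(data) # if memoryview is left out, CPython gets slow; makes no difference to PyPy
	len_data_memory = len(data_memory)
	data_sent = 0
	while data_sent < len_data_memory:
		try:
			data_sent += conn._sock.send(data_memory[data_sent:])
		except socket.error as ex:
			# EWOULDBLOCK is 35 on OS X but 11 on Linux, so compare via errno
			if ex.args[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
				continue # send buffer full: retry immediately
			raise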
def main():
	length = 50 * 0x100000
	data = b"x" * length
	spent_total = 0

	conn = socket.create_connection(("", 9999))
	conn._sock.setblocking(0) # non-blocking is crucial

	N = 20
	for i in range(N):
		start = time.time()
		_sendall(conn, data)
		spent = time.time() - start
		print("%.2f MB/s" % (length / spent / 0x100000))
		spent_total += spent

	print("~ %.2f MB/s" % (length * N / spent_total / 0x100000))

if __name__ == "__main__":
	if len(sys.argv) > 1:
		serve()
	else:
		main()

On one machine, CPython sends at ~1160 MB/s, while PyPy 2.6/2.7 sends at ~150 MB/s.
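For scale, here is a rough conversion of those throughputs into time per 50 MB `_sendall` call. The script's "MB" is 0x100000 bytes, and `length` is 50 of those.

```latex
t_{\text{CPython}} \approx \frac{50\ \text{MB}}{1160\ \text{MB/s}} \approx 0.043\ \text{s},
\qquad
t_{\text{PyPy}} \approx \frac{50\ \text{MB}}{150\ \text{MB/s}} \approx 0.33\ \text{s},
\qquad
\frac{t_{\text{PyPy}}}{t_{\text{CPython}}} = \frac{1160}{150} \approx 7.7
```

On this machine and with this simplified script, the gap is about 7.7x. That is somewhat larger than the 5 to 6 times seen in the gevent benchmarks.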

The _sendall function is a simplified version of what gevent actually uses to implement `socket.sendall`. 
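A fuller version would presumably wait for the socket to become writable rather than retrying immediately, which is what `_sendall` does. The following is a sketch of that idea under stated assumptions:

- It uses Python 3.
- `sendall_waiting` is a hypothetical name.
- `select.select` stands in for whatever event-loop wait a real implementation such as gevent's performs.
- It is not gevent's actual code.

```python
import errno
import select
import socket
import threading

def sendall_waiting(sock, data, timeout=None):
    """Hypothetical sendall for a non-blocking socket: on EWOULDBLOCK, sleep
    in select() until the socket is writable instead of busy-retrying."""
    view = memoryview(data)  # slicing a memoryview does not copy the data
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except socket.error as ex:
            if ex.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
            # Block until the kernel reports room in the send buffer.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise socket.timeout("timed out waiting to send")
    return sent

# Usage: a local socket pair, with a thread draining the receiving end.
a, b = socket.socketpair()
a.setblocking(False)
payload = b"x" * (1 << 20)
received = []

def drain():
    # Blocking reader: collect chunks until the sender closes its end.
    while True:
        chunk = b.recv(65536)
        if not chunk:
            break
        received.append(chunk)

reader = threading.Thread(target=drain)
reader.start()
n = sendall_waiting(a, payload)
a.close()  # signals EOF to the reader
reader.join()
b.close()
```

Waiting in `select` keeps the CPU idle while the buffer drains. It does not change how much work each individual `send` call does.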

Interestingly, on CPython, if you take out the call to `memoryview` and instead pass the raw string to `socket.send`, it performs similarly to PyPy. (Slicing a string copies all of the not-yet-sent data on every `send` call, whereas slicing a `memoryview` does not.) This leads me to guess that PyPy is slowed down by something to do with repeatedly pinning the buffer in memory.
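A small Python 3 sketch illustrates the CPython side of that observation. Slicing `bytes` produces a new object holding a copy of the tail, while slicing a `memoryview` produces another view onto the same buffer. The hypothetical helper `bytes_copied_by_slicing` estimates the total copying under a further assumption: that the kernel accepts a fixed `chunk` bytes per `send`. Real sends accept variable amounts.

```python
data = b"x" * (8 * 1024 * 1024)

# Slicing bytes allocates a new object and copies the remaining bytes.
tail_copy = data[1024:]

# Slicing a memoryview makes a new view onto the same buffer; nothing is copied.
view = memoryview(data)
tail_view = view[1024:]

print(type(tail_copy).__name__, len(tail_copy))  # bytes 8387584
print(tail_view.obj is data)                     # True

def bytes_copied_by_slicing(total, chunk):
    """Hypothetical estimate: bytes copied by data[sent:] over a whole send
    loop if every send() accepts exactly `chunk` bytes."""
    copied = 0
    sent = 0
    while sent < total:
        copied += total - sent           # the slice copies the whole remainder
        sent += min(chunk, total - sent)
    return copied

print(bytes_copied_by_slicing(10, 4))  # 18  (10 + 6 + 2)
```

The total copied grows roughly with the square of the payload size divided by the per-send amount. That is consistent with the string version slowing CPython down, though it says nothing about what PyPy does internally.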

I've tried variations on how the data gets sliced, to no avail. I have found that increasing the socket's SO_SNDBUF improves performance: using a very large buffer gets us about halfway to CPython performance.
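The following is a minimal Python 3 sketch of enlarging the send buffer; `enlarge_send_buffer` is a hypothetical helper. The requested value is only a hint. Linux, for example, doubles it for bookkeeping overhead and caps it at `net.core.wmem_max`, so the sketch reads the granted size back rather than assuming the request was honoured.

```python
import socket

def enlarge_send_buffer(sock, size):
    """Hypothetical helper: request a larger kernel send buffer and return
    the size the OS actually granted (it may be rounded, doubled or capped)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
after = enlarge_send_buffer(s, 4 * 1024 * 1024)  # ask for 4 MiB
print(before, after)
s.close()
```

In the reproduction script, the equivalent call would go on `conn` right after `create_connection`, before the timing loop starts.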

Is there anything I can do as a maintainer of gevent to improve the performance of `socket.sendall`? I'm not against using PyPy internal functions; I just couldn't find any to use :) Or should I recommend that users set large write buffers on their sockets? Or is this a "bug" in PyPy that can be improved?
