http.client Nagle/delayed-ack optimization
The http.client HTTPConnection._send_output method has an optimization for avoiding bad interactions between delayed-ack and the Nagle algorithm:

http://hg.python.org/cpython/file/f32f67d26035/Lib/http/client.py#l884

Unfortunately this interacts rather poorly in the case where the message_body is a bytes instance and is rather large. If the message_body is bytes it is appended to the headers, which causes a copy of the data. When message_body is large this duplication can cause a significant spike in memory usage. (In my particular case I was uploading a 200MB file to 30 hosts at the same time, leading to memory spikes over 6GB.)

I've solved this by subclassing and removing the optimization; however, I'd appreciate thoughts on how this could best be solved in the library itself. Options I have thought of are:

1: Have some size threshold on the copy. A little bit too much magic. Unclear what the size threshold should be.
2: Provide an explicit argument to turn the optimization on/off. This is ugly as it would need to be attached up the call chain to the request method.
3: Provide a property on the HTTPConnection object which enables the optimization or not. Optionally configured as part of __init__.
4: Add a class-level attribute (similar to auto_open, default_port, etc.) which controls the optimization.

I'd be very interested to get some feedback so I can craft the appropriate patch.

Thanks,
Benno
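For illustration, the copy being described can be sketched as follows. This is a simplified stand-in, not the actual _send_output implementation; the function and variable names are made up:

```python
# Simplified sketch of the pattern described above: when the body is
# bytes, it is concatenated onto the headers so everything goes out in
# one send(). The concatenation allocates a second copy of the body.
def coalesced_output(headers: bytes, body: bytes) -> bytes:
    # Peak memory is roughly len(headers) + 2 * len(body): the original
    # body plus the combined buffer that holds another copy of it.
    return headers + body

headers = b"PUT /upload HTTP/1.1\r\nHost: example.com\r\n\r\n"
body = b"x" * (8 * 1024 * 1024)  # stand-in for a large upload
combined = coalesced_output(headers, body)
assert len(combined) == len(headers) + len(body)
```

With a 200MB body per connection, 30 concurrent uploads each holding such a combined buffer accounts for the multi-gigabyte spike.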
On Sat, 15 Dec 2012 06:17:19 +1100, Ben Leslie wrote:
1: Have some size threshold on the copy. A little bit too much magic. Unclear what the size threshold should be.
I think a hardcoded threshold is the right thing to do. It doesn't sound very useful to try doing a single send() call when you have a large chunk of data (say, more than 1 MB).

Regards,

Antoine.
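A hedged sketch of option 1 with such a hardcoded threshold; the 1 MB value follows the suggestion above, and the socket double and function names are illustrative, not the stdlib's actual code:

```python
# Hypothetical threshold: coalesce headers and body into one send()
# only when the body is small, so large bodies are never copied.
THRESHOLD = 1024 * 1024  # 1 MB, per the suggestion above

def send_output(sock, headers, body):
    if body is not None and len(body) <= THRESHOLD:
        # Small body: one send() dodges the Nagle/delayed-ack stall.
        sock.sendall(headers + body)
    else:
        # Large (or absent) body: send separately, avoiding the copy.
        sock.sendall(headers)
        if body is not None:
            sock.sendall(body)

class RecordingSocket:
    """Test double that records each sendall() buffer."""
    def __init__(self):
        self.calls = []
    def sendall(self, data):
        self.calls.append(bytes(data))

small = RecordingSocket()
send_output(small, b"HDR\r\n\r\n", b"tiny")
assert len(small.calls) == 1  # coalesced into a single send

large = RecordingSocket()
send_output(large, b"HDR\r\n\r\n", b"x" * (2 * 1024 * 1024))
assert len(large.calls) == 2  # headers and body sent separately
```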
How serendipitous, I was just reporting a similar problem to Sony in one of their console SDKs yesterday :)

Indeed, the Nagle problem only shows up if you are sending more than one segment that is not full size. It will not occur in a sequence of full segments. Therefore, it is perfectly ok to send the headers + payload as a set of large chunks. The problem only occurs when sending two or more short segments. So, if sending the short headers followed by a large payload, there is no problem. The problem exists only if, in addition to the short headers, you are sending a short payload.

In summary: if the payload is less than the MSS (consider this perhaps 2k), send it along with the headers. Otherwise, you can go ahead and send the headers, and the payload (in large chunks if you want) without fear.

See:
http://en.wikipedia.org/wiki/Nagle%27s_algorithm
http://en.wikipedia.org/wiki/TCP_delayed_acknowledgment

K
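The rule above can be sketched as a simple predicate. The MSS value here is an assumption (a typical Ethernet MSS), since http.client cannot know the real per-connection MSS portably:

```python
# Sketch of the MSS-based rule: Nagle + delayed-ack only stalls when
# two or more sub-MSS segments are outstanding. The headers are always
# short, so coalescing is only needed when the body is also short.
MSS = 1460  # illustrative: typical Ethernet MSS, not queried from the OS

def should_coalesce(body_len: int) -> bool:
    # Merge headers + body into one send() only when the body would
    # itself be a short (sub-MSS) segment.
    return body_len < MSS

assert should_coalesce(512) is True           # two short writes: merge them
assert should_coalesce(200 * 2**20) is False  # large body: send separately
```

This gives the threshold of option 1 a principled value (one segment) rather than an arbitrary one like 1 MB.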
participants (3)

- Antoine Pitrou
- Ben Leslie
- Kristján Valur Jónsson