how to debug httplib slowness
Hi All,

I'd like to work on this issue: http://bugs.python.org/issue2576

Specifically, in my case, while IE can download a 150MB file from a local server in about 3 seconds, httplib takes over 20 minutes!

However, I'm kinda stumped on where to start with debugging the difference. I've tried upping the buffer size as suggested in the issue, but it's had no effect...

Any ideas?

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Chris Withers <chris <at> simplistix.co.uk> writes:
However, I'm kinda stumped on where to start with debugging the difference. I've tried upping the buffer size as suggested in the issue, but it's had no effect...
Then perhaps it's not the same bug. Please take a look at CPU utilization during the download. If Python takes close to 100% CPU, it might be due to the lack of buffering or some other inefficiency in the implementation. If Python takes close to 0%, then it's just waiting for data to arrive from the network...
Antoine Pitrou wrote:
Chris Withers <chris <at> simplistix.co.uk> writes:
However, I'm kinda stumped on where to start with debugging the difference. I've tried upping the buffer size as suggested in the issue, but it's had no effect...
Then perhaps it's not the same bug. Please take a look at CPU utilization during the download. If Python takes close to 100% CPU, it might be due to the lack of buffering or any other suboptimal situation in the implementation.
Well, it's locked at 25% on a quad core box (i.e. one core maxed out), so yeah, I'd say something is wrong ;-) I guess I could try profiling it and find out where most of the time is being spent?

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Try instrumenting the actual calls to the lowest-level socket methods (recv() and send()) and log for each one the arguments, return time, and how long it took. You might see a pattern.

Is this on Windows? It's embarrassing, we've had problems with socket speed on Windows since 1999 and they're still not gone... :-(

On Wed, Aug 12, 2009 at 4:05 AM, Chris Withers <chris@simplistix.co.uk> wrote:
Hi All,
I'd like to work on this issue:
http://bugs.python.org/issue2576
Specifically, in my case, while IE can download a 150MB file from a local server in about 3 seconds, httplib takes over 20 minutes!
However, I'm kinda stumped on where to start with debugging the difference. I've tried upping the buffer size as suggested in the issue, but it's had no effect...
Any ideas?
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
s/return time/return size/

On Wed, Aug 12, 2009 at 8:07 AM, Guido van Rossum <guido@python.org> wrote:
Try instrumenting the actual calls to the lowest-level socket methods (recv() and send()) and log for each one the arguments, return time, and how long it took. You might see a pattern. Is this on Windows? It's embarrassing, we've had problems with socket speed on Windows since 1999 and they're still not gone... :-(
On Wed, Aug 12, 2009 at 4:05 AM, Chris Withers<chris@simplistix.co.uk> wrote:
Hi All,
I'd like to work on this issue:
http://bugs.python.org/issue2576
Specifically, in my case, while IE can download a 150MB file from a local server in about 3 seconds, httplib takes over 20 minutes!
However, I'm kinda stumped on where to start with debugging the difference. I've tried upping the buffer size as suggested in the issue, but it's had no effect...
Any ideas?
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Try instrumenting the actual calls to the lowest-level socket methods (recv() and send()) and log for each one the arguments, return time, and how long it took.
Can I do that in Python code?
You might see a pattern. Is this on Windows?
Well, yes, but I'm not 100% sure. The problematic machine is a Windows box, but there are no non-Windows boxes on that network, and VPN'ing from one of my non-Windows boxes slows things down enough that I'm not confident what I'd be seeing was indicative of the same problem...
It's embarrassing, we've had problems with socket speed on Windows since 1999 and they're still not gone... :-(
Oh dear :-( Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
On Wed, Aug 12, 2009 at 8:34 AM, Chris Withers<chris@simplistix.co.uk> wrote:
Guido van Rossum wrote:
Try instrumenting the actual calls to the lowest-level socket methods (recv() and send()) and log for each one the arguments, return time, and how long it took.
Can I do that in Python code?
Probably if you hack on the socket.py file long enough.
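Something along these lines might do it without editing socket.py itself -- a rough, untested sketch that wraps the connection's socket object (LoggingSocket is a made-up name, and the socket._fileobject call assumes Python 2's socket internals):

import socket
import time

class LoggingSocket(object):
    """Delegating wrapper that logs each recv()/send() call."""
    def __init__(self, sock):
        self._sock = sock
    def recv(self, bufsize, *flags):
        start = time.time()
        data = self._sock.recv(bufsize, *flags)
        print 'recv(%d) -> %d bytes in %.6fs' % (
            bufsize, len(data), time.time() - start)
        return data
    def send(self, data, *flags):
        start = time.time()
        sent = self._sock.send(data, *flags)
        print 'send(%d bytes) -> %d in %.6fs' % (
            len(data), sent, time.time() - start)
        return sent
    def makefile(self, mode='rb', bufsize=-1):
        # route buffered reads through this wrapper so recv() gets logged
        return socket._fileobject(self, mode, bufsize)
    def __getattr__(self, name):
        # delegate everything else (sendall, close, settimeout, ...)
        return getattr(self._sock, name)

# usage: connect explicitly, then swap the socket in before the request:
#     conn.connect()
#     conn.sock = LoggingSocket(conn.sock)
#     conn.request(...)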
You might see a pattern. Is this on Windows?
Well, yes, but I'm not 100% sure. The problematic machine is a Windows box, but there are no non-Windows boxes on that network, and VPN'ing from one of my non-Windows boxes slows things down enough that I'm not confident what I'd be seeing was indicative of the same problem...
Time to set up a more conclusive test. Do you have something like curl or wget available on the same box?
It's embarrassing, we've had problems with socket speed on Windows since 1999 and they're still not gone... :-(
Oh dear :-(
Well, it may be that it's really just your box. Or proxy settings; look into proxy settings.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
You might see a pattern. Is this on Windows?

Well, yes, but I'm not 100% sure. The problematic machine is a Windows box, but there are no non-Windows boxes on that network, and VPN'ing from one of my non-Windows boxes slows things down enough that I'm not confident what I'd be seeing was indicative of the same problem...
Time to set up a more conclusive test. Do you have something like curl or wget available on the same box?
Time taken with IE: ~2 seconds
Time taken with wget: 2.2 seconds
Time taken with Python [1]: 20-30 minutes

I did a run of the script through cProfile and got the following:

pstats.Stats('download.profile').strip_dirs().sort_stats('time').print_stats(10)

1604545 function calls in 1956.057 CPU seconds

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        1 1950.767 1950.767 1955.952 1955.952  httplib.py:544(_read_chunked)
    85125    1.235    0.000    1.235    0.000  {method 'recv' of '_socket.socket' objects}
    85838    1.031    0.000    2.246    0.000  socket.py:313(read)
    85838    0.787    0.000    3.386    0.000  httplib.py:601(_safe_read)
    42928    0.614    0.000    1.779    0.000  socket.py:373(readline)
   128775    0.344    0.000    0.344    0.000  {method 'write' of 'cStringIO.StringO' objects}
   200796    0.206    0.000    0.206    0.000  {method 'seek' of 'cStringIO.StringO' objects}
    85838    0.179    0.000    0.179    0.000  {min}
   128767    0.135    0.000    0.135    0.000  {cStringIO.StringIO}
    72735    0.116    0.000    0.116    0.000  {method 'read' of 'cStringIO.StringO' objects}

...which isn't what I was expecting! Am I right in reading this as most of the time being spent in httplib's HTTPResponse._read_chunked and none of the methods it calls?

If so, is there a better way than a bunch of print statements to find where in that method the time is being spent?

cheers,

Chris

[1] Python 2.6.2 on Windows Server 2003 R2 running this script:

from base64 import encodestring
from httplib import HTTPConnection
from datetime import datetime

conn = HTTPConnection('servername')
headers = {}
a = 'Basic ' + encodestring('username:password').strip()
headers['Authorization'] = a
t = datetime.now()
print t
conn.request('GET', '/some/big/file', None, headers)
print 'request:', datetime.now() - t
response = conn.getresponse()
print 'response:', datetime.now() - t
data = response.read()
if len(data) < 2000:
    print data
print 'read:', datetime.now() - t

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
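(An aside: pstats itself can answer part of that question without print statements. print_callees() shows how much of a function's time is attributed to each thing it calls; when tottime dominates cumtime, as it does for _read_chunked above, the cost is in the function body rather than its calls. A sketch against the same profile file:)

import pstats

p = pstats.Stats('download.profile').strip_dirs()
# break down where _read_chunked's time goes among its callees
p.print_callees('_read_chunked')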
On Fri, Sep 4, 2009 at 1:11 PM, Chris Withers<chris@simplistix.co.uk> wrote:
Am I right in reading this as most of the time is being spent in httplib's HTTPResponse._read_chunked and none of the methods it calls?
If so, is there a better way that a bunch of print statements to find where in that method the time is being spent?
Well, since the source for _read_chunked includes the comment

# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.

you might gain some speed improvement with minimal effort by gathering the read data chunks into a list and then returning "".join(chunks) at the end.

Schiavo
Simon
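In outline, the difference is this (a sketch; read_next_chunk() is a hypothetical stand-in for the chunk-parsing logic inside _read_chunked, returning '' at the end of the body):

# quadratic in the worst case: each += may copy everything
# accumulated so far
value = ''
while True:
    chunk = read_next_chunk()
    if not chunk:
        break
    value += chunk

# linear: collect references, copy once at the end
chunks = []
while True:
    chunk = read_next_chunk()
    if not chunk:
        break
    chunks.append(chunk)
value = ''.join(chunks)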
Simon Cross <hodgestar+pythondev <at> gmail.com> writes:
Well, since the source for _read_chunked includes the comment
# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.
you might gain some speed improvement with minimal effort by gathering the read data chunks into a list and then returning "".join(chunks) at the end.
+1 for trying this. Given the differences between platforms in realloc() performance, that might be why it goes unnoticed under Linux but degenerates under Windows. As a side note, it is interesting that even a stdlib module makes this mistake and acknowledges it without trying to fix it.
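A micro-benchmark makes the platform difference easy to check (a sketch; the chunk size is in the ballpark the profile above implies, since 116245504 bytes over 85125 recv() calls is about 1365 bytes each, and timings will vary wildly by OS and by whether the interpreter's in-place concatenation optimization kicks in, which is exactly why the cost can go unnoticed on one platform and explode on another):

import time

def concat(n, size=1365):
    buf = ''
    for _ in xrange(n):
        buf += 'x' * size  # repeated concatenation
    return buf

def join(n, size=1365):
    # build the same string by joining once at the end
    return ''.join('x' * size for _ in xrange(n))

for fn in (concat, join):
    start = time.time()
    fn(20000)
    print fn.__name__, time.time() - start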
Antoine Pitrou wrote:
Simon Cross <hodgestar+pythondev <at> gmail.com> writes:
Well, since the source for _read_chunked includes the comment
# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.
you might gain some speed improvement with minimal effort by gathering the read data chunks into a list and then returning "".join(chunks) at the end.
+1 for trying this. Given the differences between platforms in realloc() performance, that might be why it goes unnoticed under Linux but degenerates under Windows.
And how! The following change dropped the download time using httplib to 2.3 seconds: http://svn.python.org/view/python/trunk/Lib/httplib.py?r1=74523&r2=74655
As a side note, it is interesting that even a stdlib module makes this mistake and acknowledges it without trying to fix it.
No longer in this case ;-)

The fix is applied on the trunk, but the problem still exists on the 2.6 branch, 3.1 branch and 3.2 branch. Which of these should I merge to? I assume all of them?

Do I need to update any changelog files or similar to indicate this bug has been fixed?

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Chris Withers <chris <at> simplistix.co.uk> writes:
The fix is applied on the trunk, but the problem still exists on the 2.6 branch, 3.1 branch and 3.2 branch.
Which of these should I merge to? I assume all of them?
The performance problem is sufficiently serious that it should be considered a bug, so yes, you should merge to all remaining branches (3.2, 2.6 and 3.1).
Do I need to update any changelog files or similar to indicate this bug has been fixed?
Yes, add an entry to Misc/NEWS under the "Library" section using the appropriate conventions.

Thanks Antoine.
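A hypothetical example of such an entry (the wording here is invented; copy the style of the neighbouring entries):

- Issue #6838: httplib now accumulates the body of a chunked response in a
  list and joins it once at the end, instead of building it by repeated
  string concatenation, which was quadratic on some platforms.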
Simon Cross wrote:
Well, since the source for _read_chunked includes the comment
# XXX This accumulates chunks by repeated string concatenation, # which is not efficient as the number or size of chunks gets big.
you might gain some speed improvement with minimal effort by gathering the read data chunks into a list and then returning "".join(chunks) at the end.
True, I'll be trying that and reporting back but, more interestingly, I did some analysis with wireshark (only 200MB-odd of .pcap logs, that was fun ;-) ) to see the differences in the http conversation, and noticed some more interesting things...

So, httplib does this:

GET /<blah> HTTP/1.1
Host: <blah>
Accept-Encoding: identity
Authorization: Basic <blah>

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 11:44:22 GMT
Server: Apache-Coyote/1.1
Content-Length: 116245504
Content-Type: application/vnd.excel
Transfer-Encoding: chunked

While wget does this:

<snip 401 conversation>

GET /<blah> HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: <blah>
Connection: Keep-Alive
Authorization: Basic <blah>

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 11:35:19 GMT
Server: Apache-Coyote/1.1
Content-Length: 116245504
Content-Type: application/vnd.excel
Connection: close

Interesting points:

- Apache in this instance responds with HTTP/1.1, even though the wget request was 1.0; is that allowed?
- Apache responds with a chunked response only to httplib. Why is that?

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
On Fri, Sep 04, 2009 at 04:02:39PM +0100, Chris Withers wrote:
So, httplib does this:
GET /<blah> HTTP/1.1 [skip] While wget does this:
GET /<blah> HTTP/1.0 [skip] - Apache responds with a chunked response only to httplib. Why is that?
Probably because wget uses HTTP/1.0?

Oleg.

--
Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd@phd.pp.ru> wrote:
On Fri, Sep 04, 2009 at 04:02:39PM +0100, Chris Withers wrote:
So, httplib does this:
GET /<blah> HTTP/1.1 [skip] While wget does this:
GET /<blah> HTTP/1.0 [skip] - Apache responds with a chunked response only to httplib. Why is that?
Probably because wget uses HTTP/1.0?
Yep. A server is not allowed to send a chunked response to an HTTP/1.0 request.

Bill
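You can confirm that from Python by forcing httplib down to HTTP/1.0 (a debugging hack, sketched here; _http_vsn and _http_vsn_str are internal, undocumented attributes of HTTPConnection in Python 2, and the host/path/auth from the earlier script are placeholders):

from httplib import HTTPConnection

conn = HTTPConnection('servername')
conn._http_vsn = 10
conn._http_vsn_str = 'HTTP/1.0'
conn.request('GET', '/some/big/file')  # auth headers omitted for brevity
response = conn.getresponse()
# Tomcat/Coyote should now fall back to an identity (non-chunked) response
print response.getheader('Transfer-Encoding')  # expect None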
Chris Withers wrote:
- Apache in this instance responds with HTTP 1.1, even though the wget request was 1.0, is that allowed?
- Apache responds with a chunked response only to httplib. Why is that?
I find it very confusing that you say "Apache", since you really want to say "Coyote", which is to say "Tomcat". Http11Processor.java@1547:

if (entityBody && http11 && keepAlive) {
    outputBuffer.addActiveFilter
        (outputFilters[Constants.CHUNKED_FILTER]);
    contentDelimitation = true;
    headers.addValue(
        Constants.TRANSFERENCODING).setString(Constants.CHUNKED);
} else {
    outputBuffer.addActiveFilter
        (outputFilters[Constants.IDENTITY_FILTER]);
}

So, as Oleg said, it's because httplib talks HTTP/1.1 and wget talks HTTP/1.0. All HTTP/1.1 clients must support chunked transfer-encoding, and apparently Tomcat/Coyote defaults to it unless the message body is empty, the client is not HTTP/1.1, or the request is not to be kept alive ("Connection: close", or no more keep-alive slots on the server).

As Simon said, changing this to do ''.join(chunks) is really the best first step to take.

-Scott

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
On Fri, Sep 4, 2009 at 4:28 AM, Simon Cross<hodgestar+pythondev@gmail.com> wrote:
On Fri, Sep 4, 2009 at 1:11 PM, Chris Withers<chris@simplistix.co.uk> wrote:
Am I right in reading this as most of the time is being spent in httplib's HTTPResponse._read_chunked and none of the methods it calls?
If so, is there a better way that a bunch of print statements to find where in that method the time is being spent?
Well, since the source for _read_chunked includes the comment
# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.
you might gain some speed improvement with minimal effort by gathering the read data chunks into a list and then returning "".join(chunks) at the end.
+1 on trying this. Constructing a 116MB string by concatenating 1KB buffers surely must take forever. (116MB divided by 85125 recv() calls gives 1365 bytes per chunk, which is awful.) The HTTP/1.0 business looks like a red herring.

Also agreed that this is an embarrassment.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
+1 on trying this. Constructing a 116MB string by concatenating 1KB buffers surely must take forever. (116MB divided by 85125 recv() calls gives 1365 bytes per chunk, which is awful.) The HTTP/1.0 business looks like a red herring.
Also agreed that this is an embarrassment.
Embarrassment eradicated ;-)

http://bugs.python.org/issue6838

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Guido van Rossum <guido <at> python.org> writes:
+1 on trying this. Constructing a 116MB string by concatenating 1KB buffers surely must take forever. (116MB divided by 85125 recv() calls give 1365 byte per chunk, which is awful.)
It certainly is quite small, but perhaps the server tries to stay below the detected MTU? (Not that there is necessarily any point in doing so for most HTTP content, IMO... except perhaps when the client does some progressive decoding.)
participants (7):

- Antoine Pitrou
- Bill Janssen
- Chris Withers
- Guido van Rossum
- Oleg Broytmann
- Scott Dial
- Simon Cross