Simon Cross wrote:
Well, since the source for _read_chunked includes the comment
    # XXX This accumulates chunks by repeated string concatenation,
    # which is not efficient as the number or size of chunks gets big.
you might gain some speed improvement with minimal effort by gathering the read data chunks into a list and then returning "".join(chunks) at the end.
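The suggestion above could be sketched roughly as follows. This is not httplib's actual code: `read_chunked_joined` is a hypothetical name, the sketch is written for Python 3 (so it joins bytes rather than str), and it assumes `fp` is a binary file-like object whose `read(n)` returns all `n` bytes, which a raw socket does not guarantee.

```python
import io


def read_chunked_joined(fp):
    """Decode a chunked HTTP body, collecting chunks in a list.

    Hypothetical helper for illustration only; assumes fp.read(n)
    returns exactly n bytes.
    """
    chunks = []
    while True:
        size_line = fp.readline()
        # Drop any chunk extension after ';' and parse the hex size.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            break
        chunks.append(fp.read(size))
        fp.read(2)  # discard the CRLF that follows the chunk data
    # Skip optional trailer headers up to the terminating blank line.
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    # One join at the end instead of repeated concatenation.
    return b"".join(chunks)
```

For example, feeding it `io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` should give back the two chunks joined into one bytes object.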
True, I'll be trying that and reporting back, but, more interestingly, I did some analysis with wireshark (only 200MB-odd of .pcap logs, that was fun ;-) to see the differences in the http conversation and noticed more interestingness...

So, httplib does this:

    GET /<blah> HTTP/1.1
    Host: <blah>
    Accept-Encoding: identity
    Authorization: Basic <blah>

    HTTP/1.1 200 OK
    Date: Fri, 04 Sep 2009 11:44:22 GMT
    Server: Apache-Coyote/1.1
    ContentLength: 116245504
    Content-Type: application/vnd.excel
    Transfer-Encoding: chunked

While wget does this:

    <snip 401 conversation>

    GET /<blah> HTTP/1.0
    User-Agent: Wget/1.11.4
    Accept: */*
    Host: <blah>
    Connection: Keep-Alive
    Authorization: Basic <blah>

    HTTP/1.1 200 OK
    Date: Fri, 04 Sep 2009 11:35:19 GMT
    Server: Apache-Coyote/1.1
    ContentLength: 116245504
    Content-Type: application/vnd.excel
    Connection: close

Interesting points:

- Apache in this instance responds with HTTP 1.1, even though the wget request was 1.0, is that allowed?
- Apache responds with a chunked response only to httplib. Why is that?

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk