[Python-Dev] httplib and bad response chunking

Greg Ward gward at python.net
Wed Jul 26 04:32:13 CEST 2006


So I accidentally discovered the other day that httplib does not handle
a particular type of mangled HTTP response very well.  In particular, it
tends to blow up with an undocumented ValueError when the server screws
up "chunked" encoding.  I'm not the first to discover this, either: see
http://www.python.org/sf/1486335 .

<digression>
HTTP 1.1 response chunking allows clients to know how many bytes of
response to expect for dynamic content, i.e. when it's not possible to
include a "Content-length" header.  A chunked response might look like
this:

  0005\r\nabcd\n\r\n0004\r\nabc\n\r\n0\r\n\r\n

which means:
  0x0005 bytes in first chunk, which is "abcd\n"
  0x0004 bytes in second chunk, which is "abc\n"

Each chunk size is terminated with "\r\n"; each chunk is terminated with
"\r\n"; the end of the response is indicated by a chunk of 0 bytes, hence
the "0\r\n\r\n" at the end.

Details in RFC 2616:
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
</digression>
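
To make the framing in the digression concrete, here is a minimal sketch
of a decoder for a chunked body that is already fully in memory.  The
name decode_chunked is made up for illustration and is not part of
httplib; the sketch discards chunk extensions and ignores trailers.

```python
def decode_chunked(data):
    """Decode a complete chunked body held in memory (illustrative only).

    decode_chunked is a hypothetical helper, not an httplib function.
    """
    pieces = []
    pos = 0
    while True:
        # The chunk-size line runs up to the next CRLF.
        eol = data.index(b"\r\n", pos)
        # Drop any chunk extension that follows a ';'.
        size_field = data[pos:eol].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            # A zero-size chunk marks the end of the body.
            break
        pieces.append(data[pos:pos + size])
        # Skip the chunk data and the CRLF that terminates it.
        pos += size + 2
    return b"".join(pieces)


body = decode_chunked(b"0005\r\nabcd\n\r\n0004\r\nabc\n\r\n0\r\n\r\n")
```

Fed the example response above, this yields the two chunks joined
together.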

Anyways, what I discovered in the wild the other day was a response like
this:

  0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n

i.e. the chunk-size for the terminating empty chunk was missing.
This caused httplib.py to blow up with ValueError because it tried to
call

  int(line, 16)

assuming that 'line' contained a hex number, when in fact it was the
empty string.  Oops.
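
The failure is easy to reproduce in isolation.  In this small sketch,
parse_chunk_size is a hypothetical stand-in for the int(line, 16) call
quoted above, not the actual httplib code:

```python
def parse_chunk_size(line):
    # Same conversion as the int(line, 16) call quoted above.
    return int(line, 16)


good = parse_chunk_size("0005")

try:
    # The malformed response leaves an empty chunk-size line.
    parse_chunk_size("")
except ValueError as exc:
    error = exc
else:
    error = None
```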

IMHO the minimal fix is to turn ValueError into HTTPException (or a
subclass thereof); httplib should not raise ValueError just because some
server sends a bad response.  (The server in question was Apache 2.0.52
running PHP 4.3.9 sending a big hairy error page because the database
was down.)
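
A rough sketch of what that minimal fix could look like, assuming a new
exception class.  BadChunkSize and read_chunk_size are made-up names for
illustration, not existing httplib API; the import fallback is only
there because httplib was renamed http.client in Python 3.

```python
try:
    from httplib import HTTPException       # Python 2 module name
except ImportError:
    from http.client import HTTPException   # same module in Python 3


class BadChunkSize(HTTPException):
    """Hypothetical exception: the server sent an unparseable chunk size."""

    def __init__(self, line):
        HTTPException.__init__(self, line)
        self.line = line


def read_chunk_size(line):
    """Parse one chunk-size line (hypothetical helper, not httplib code)."""
    # Strip an optional chunk extension after ';'.
    i = line.find(";")
    if i >= 0:
        line = line[:i]
    try:
        return int(line, 16)
    except ValueError:
        # Report a bad response in HTTP terms instead of leaking ValueError.
        raise BadChunkSize(line)
```

Callers that already catch HTTPException would then see the bad
response without needing to know about ValueError.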

Where I'm getting hung up is how far to test this stuff.  I have
discovered other hypothetical cases of bad chunking that cause httplib
to go into an infinite loop or block forever on socket.readline().
Should we worry about those cases as well, despite not having seen them
happen in the wild?  More annoying, I can reproduce the "block forever"
case using a real socket, but not using the StringIO-based FakeSocket
class in test_httplib.
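
One plausible explanation for that difference (an assumption, not
something verified against test_httplib) is that an in-memory file
returns an empty string at end of data, while a socket whose peer keeps
the connection open just waits.  The sketch below contrasts the two
using io.BytesIO and socket.socketpair(); drain_lines and
drain_from_socketpair are hypothetical helpers, and a short timeout
stands in for "block forever" so the example terminates.

```python
import io
import socket

# The malformed response from above: the final chunk size is missing.
MALFORMED = b"0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n"


def drain_lines(fp, limit=10):
    """Call readline() up to `limit` times and report how reading ended."""
    for _ in range(limit):
        try:
            line = fp.readline()
        except socket.timeout:
            # Without a timeout, a real socket would hang here.
            return "timed out"
        if not line:
            return "eof"
    return "limit reached"


def drain_from_socketpair(data, timeout=0.2):
    """Send `data` over a connected socket pair and drain the other end."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(data)
        receiver.settimeout(timeout)
        # The sender stays open, so the receiver never sees EOF.
        return drain_lines(receiver.makefile("rb"))
    finally:
        sender.close()
        receiver.close()


in_memory_result = drain_lines(io.BytesIO(MALFORMED))
socket_result = drain_from_socketpair(MALFORMED)
```

If this explanation is right, a fake socket would need to simulate a
still-open connection in some way to expose the hang in a unit test.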

Anyways, I've cobbled together a crude hack to test_httplib.py that
exposes the problem:

  http://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=186245&aid=1486335

Feedback welcome.  (Fixing the inadvertent ValueError is trivial, so I'm
concentrating on getting the tests right first.)

Oh yeah, my patch is relative to the 2.4 branch.

        Greg
-- 
Greg Ward <gward at python.net>                         http://www.gerg.ca/
I don't believe there really IS a GAS SHORTAGE.. I think it's all just
a BIG HOAX on the part of the plastic sign salesmen -- to sell more numbers!!

