httplib and bad response chunking

So I accidentally discovered the other day that httplib does not handle a particular type of mangled HTTP response very well. In particular, it tends to blow up with an undocumented ValueError when the server screws up "chunked" encoding. I'm not the first to discover this, either: see http://www.python.org/sf/1486335 . <digression> HTTP 1.1 response chunking allows clients to know how many bytes of response to expect for dynamic content, i.e. when it's not possible to include a "Content-length" header. A chunked response might look like this: 0005\r\nabcd\n\r\n0004\r\nabc\n\r\n0\r\n\r\n which means: 0x0005 bytes in first chunk, which is "abcd\n" 0x0004 bytes in second chunk, which is "abc\n" Each chunk size is terminated with "\r\n"; each chunk is terminated with "\r\n"; end of response is indicated by a chunk of 0 bytes, hence the "\r\n\r\n" at the end. Details in RFC 2616: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1 </digression> Anyways, what I discovered in the wild the other day was a response like this: 0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n i.e. the chunk-size for the terminating empty chunk was missing. This cause httplib.py to blow up with ValueError because it tried to call int(line, 16) assuming that 'line' contained a hex number, when in fact it was the empty string. Oops. IMHO the minimal fix is to turn ValueError into HTTPException (or a subclass thereof); httplib should not raise ValueError just because some server sends a bad response. (The server in question was Apache 2.0.52 running PHP 4.3.9 sending a big hairy error page because the database was down.) Where I'm getting hung up is how far to test this stuff. I have discovered other hypothetical cases of bad chunking that cause httplib to go into an infinite loop or block forever on socket.readline(). Should we worry about those cases as well, despite not having seen them happen in the wild? More annoying, I can reproduce the "block forever" case using a real socket, but not using the StringIO-based FakeSocket class in test_httplib. Anyways, I've cobbled together a crude hack to test_httplib.py that exposes the problem: http://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=186245&aid=1486335 Feedback welcome. (Fixing the inadvertent ValueError is trivial, so I'm concentrating on getting the tests right first.) Oh yeah, my patch is relative to the 2.4 branch. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I don't believe there really IS a GAS SHORTAGE.. I think it's all just a BIG HOAX on the part of the plastic sign salesmen -- to sell more numbers!!

On Tue, 25 Jul 2006, Greg Ward wrote: [...]
Where I'm getting hung up is how far to test this stuff.
Stop when you run out of time ;-)
They have been seen in the wild :-) http://python.org/sf/1411097 The IP address referenced isn't under my control, I don't know if it still provokes the error, but the problem is clear. John

[me, on 25 July]
[John J Lee]
They have been seen in the wild :-)
Thanks -- that was really all the encouragement I needed to keep banging away at this bug. Did you look at the crude attempt at testing for this bug that I hacked into test_httplib.py? I posted it to bug #1486335 here: http://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=186245&aid=1486335 The idea is simple: put various chunked responses into strings and then feed those strings to HTTPConnection. The trouble is that StringIO does not behave the same as a real socket: where HTTPResponse fails one way reading from a real socket (eg. infinite loop), it fails differently (or not at all) reading from a StringIO. Makes testing with the FakeSocket class in test_httplib.py problematic. Maybe the right way to test httplib is to spawn a server process (thread?) to listen on some random port, feed various HTTP responses at HTTPConnection/HTTPResponse, and see what happens. I'm not sure how to do that portably, though. Well, I'll see if I can whip up a Unix-y solution and see if anyone knows how to make it portable. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Be careful: sometimes, you're only standing on the shoulders of idiots.

On Sun, 30 Jul 2006, Greg Ward wrote: [...]
There are always ways around unit testing problems, but...
I think adding this kind of functional test is extremely valuable, given that we don't seem to have any for httplib at present (Lib/test/test_httplib.py does not send any network packets). Maybe you could add a basic smoke test that does a simple successful GET, while you're going about adding your chunked encoding regression test(s)? Oh, wait: there is a functional test that uses the network, but it's at the end of httplib.py rather than being part of the test suite, and it follows the "Guru Watches Output" pattern :-) I tried to add a test for urllib2 recently that relied on a python.org URL being set up, but the python.org guys are pretty overworked already and haven't had time to enable that yet -- so I think that simply from that point of view your run-your-own-server approach is better. Why does it need to be unix-y, though? We have SimpleHTTPServer. We're not trying to test the OS, so I don't see a problem with using loopback and a single process -- and that would keep test run times down. Um, except that MS OSes seem a bit odd re the loopback interface -- ISTR that, at least back with NT4, you just didn't get a loopback i/face unless you had an ethernet driver installed (back then, I was on dialup). More unit tests would also be valuable, of course. John

On Tue, Jul 25, 2006 at 10:32:13PM -0400, Greg Ward wrote:
IMNSHO httplib should be fixed and this shouldn't be an error at all as its in the wild and will only show up more and more in the future. Plus file a bug with the apache or php project as appropriate for having a non-RFC compliant response. This is part of the good old network programming addage of being lenient in what you accept. -g

On Tue, 25 Jul 2006, Greg Ward wrote: [...]
Where I'm getting hung up is how far to test this stuff.
Stop when you run out of time ;-)
They have been seen in the wild :-) http://python.org/sf/1411097 The IP address referenced isn't under my control, I don't know if it still provokes the error, but the problem is clear. John

[me, on 25 July]
[John J Lee]
They have been seen in the wild :-)
Thanks -- that was really all the encouragement I needed to keep banging away at this bug. Did you look at the crude attempt at testing for this bug that I hacked into test_httplib.py? I posted it to bug #1486335 here: http://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=186245&aid=1486335 The idea is simple: put various chunked responses into strings and then feed those strings to HTTPConnection. The trouble is that StringIO does not behave the same as a real socket: where HTTPResponse fails one way reading from a real socket (eg. infinite loop), it fails differently (or not at all) reading from a StringIO. Makes testing with the FakeSocket class in test_httplib.py problematic. Maybe the right way to test httplib is to spawn a server process (thread?) to listen on some random port, feed various HTTP responses at HTTPConnection/HTTPResponse, and see what happens. I'm not sure how to do that portably, though. Well, I'll see if I can whip up a Unix-y solution and see if anyone knows how to make it portable. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Be careful: sometimes, you're only standing on the shoulders of idiots.

On Sun, 30 Jul 2006, Greg Ward wrote: [...]
There are always ways around unit testing problems, but...
I think adding this kind of functional test is extremely valuable, given that we don't seem to have any for httplib at present (Lib/test/test_httplib.py does not send any network packets). Maybe you could add a basic smoke test that does a simple successful GET, while you're going about adding your chunked encoding regression test(s)? Oh, wait: there is a functional test that uses the network, but it's at the end of httplib.py rather than being part of the test suite, and it follows the "Guru Watches Output" pattern :-) I tried to add a test for urllib2 recently that relied on a python.org URL being set up, but the python.org guys are pretty overworked already and haven't had time to enable that yet -- so I think that simply from that point of view your run-your-own-server approach is better. Why does it need to be unix-y, though? We have SimpleHTTPServer. We're not trying to test the OS, so I don't see a problem with using loopback and a single process -- and that would keep test run times down. Um, except that MS OSes seem a bit odd re the loopback interface -- ISTR that, at least back with NT4, you just didn't get a loopback i/face unless you had an ethernet driver installed (back then, I was on dialup). More unit tests would also be valuable, of course. John

On Tue, Jul 25, 2006 at 10:32:13PM -0400, Greg Ward wrote:
IMNSHO httplib should be fixed and this shouldn't be an error at all as its in the wild and will only show up more and more in the future. Plus file a bug with the apache or php project as appropriate for having a non-RFC compliant response. This is part of the good old network programming addage of being lenient in what you accept. -g
participants (3)
-
Greg Ward
-
Gregory P. Smith
-
John J Lee