[New-bugs-announce] [issue24363] httplib fails to handle semivalid HTTP headers
Michael Del Monte
report at bugs.python.org
Tue Jun 2 17:06:03 CEST 2015
New submission from Michael Del Monte:
Initially reported at https://github.com/kennethreitz/requests/issues/2622
Closely related to http://bugs.python.org/issue19996
An HTTP response with an invalid header line that contains non-blank characters but *no* colon (contrast http://bugs.python.org/issue19996 in which it contained a colon as the first character) causes the same behavior.
httplib.HTTPMessage.readheaders() does not appear to follow RFC 2616, which requires the header section to end with a blank line. The invalid header line, which admittedly also violates RFC 2616, is at least non-blank and should not terminate the headers. Yet readheaders() treats it as the end of the headers and then fails to process the rest of the response properly.
The problem is exacerbated by chunked transfer encoding: the body will not be received properly if the Transfer-Encoding header is never seen because readheaders() terminated early. An example (why are banks always the miscreants here?) is:
p = requests.get("http://www.merrickbank.com/")
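The failure mode can be illustrated offline with a simplified model of the behavior described above. This is only a sketch, not httplib's actual code: the function name read_headers_strict and the sample response bytes are made up for illustration, and folded continuation lines are not handled.

```python
import io


def read_headers_strict(fp):
    """Hypothetical model: stop at the first line that is not 'Name: value'."""
    headers = {}
    while True:
        line = fp.readline()
        if not line or line in (b"\r\n", b"\n"):
            break  # EOF or blank line: the normal end of the header section
        name, sep, value = line.partition(b":")
        if not sep:
            break  # a line without a colon is mistaken for the end of the headers
        key = name.strip().lower().decode("latin-1")
        headers[key] = value.strip().decode("latin-1")
    return headers


# Made-up response fragment with one colon-less header line in the middle.
RAW = (
    b"Content-Type: text/html\r\n"
    b"X-Broken-Header-Without-Colon\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n0\r\n\r\n"
)

strict_headers = read_headers_strict(io.BytesIO(RAW))
# Transfer-Encoding is never seen, so a caller would not know the body is chunked.
chunked_seen = "transfer-encoding" in strict_headers
```

In this model everything after the colon-less line, including the Transfer-Encoding header, is lost to the header parser.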
My recommended fix would be to insert these lines at httplib:327
# continue reading headers on non-blank lines
elif not len(line.strip()):
# break only on blank lines
This would cause readheaders() to terminate only on a blank line, rather than on the first non-blank, non-header, non-comment line, in accordance with RFC 2616.
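The intended behavior can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not a patch to httplib: read_headers_tolerant and the sample bytes are hypothetical, malformed lines are simply skipped and collected, and folded continuation lines are not handled.

```python
import io


def read_headers_tolerant(fp):
    """Hypothetical sketch: end the header section only on a blank line or EOF."""
    headers = {}
    skipped = []
    while True:
        line = fp.readline()
        if not line:
            break  # EOF inside the headers
        if not line.strip():
            break  # blank line: the only regular terminator
        name, sep, value = line.partition(b":")
        if not sep or not name.strip():
            skipped.append(line)  # malformed line: remember it and keep reading
            continue
        key = name.strip().lower().decode("latin-1")
        headers[key] = value.strip().decode("latin-1")
    return headers, skipped


# Made-up response fragment with one colon-less header line in the middle.
RAW = (
    b"Content-Type: text/html\r\n"
    b"X-Broken-Header-Without-Colon\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n0\r\n\r\n"
)

fp = io.BytesIO(RAW)
headers, skipped = read_headers_tolerant(fp)
body = fp.read()  # the stream is now positioned at the start of the chunked body
```

With this approach the Transfer-Encoding header survives the malformed line, and the stream is left at the first byte of the body.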
components: Library (Lib)
title: httplib fails to handle semivalid HTTP headers
versions: Python 2.7
Python tracker <report at bugs.python.org>