Parsing HTTP messages

Chris Gray cpgray at library.uwaterloo.ca
Wed Nov 19 16:23:20 EST 2003


My initial question is, what Python library do I use to parse HTTP
messages?

Trying to use the "email" module has led me to another question and I'm
not even sure where to ask this second question.

When I parse an HTTP request using the email module, I get a field name of
"GET http:", which isn't a field name or part of a header at all but part
of the "start-line" (request line) of the HTTP request.

Checking the HTTP/1.1 spec (RFC 2616), I find that HTTP messages use the
generic message format of RFC 822 (obsoleted by RFC 2822) and that:

"Both types of message consist of a start-line, zero or more header fields
(also known as 'headers'), an empty line (i.e., a line with nothing
preceding the CRLF) indicating the end of the header fields, and
possibly a message-body."

But my understanding of RFC (2)822 is that the format has no such thing as
a "start-line". If so, the "email" module is right to treat the HTTP
"start-line" as a header, and the start-line should be stripped out before
feeding the module the remainder of the message, which _is_ in (2)822
format.
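
A minimal sketch of the stripping step described above, assuming the whole
request is already held in one string with CRLF line endings. The function
name parse_http_request and the sample request are hypothetical, made up
for illustration only; the one library call used is
email.message_from_string.

```python
from email import message_from_string


def parse_http_request(raw):
    """Hypothetical helper: split off the start-line, parse the rest."""
    # The start-line (request line) is everything before the first CRLF.
    start_line, _, rest = raw.partition("\r\n")
    # A request line has three space-separated parts.
    method, target, version = start_line.split(" ", 2)
    # The remainder (header fields, empty line, optional body) is in
    # RFC (2)822 form, so the email module can parse it directly.
    message = message_from_string(rest)
    return method, target, version, message


# Made-up sample request, for illustration only.
raw_request = (
    "GET http://www.example.com/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
method, target, version, message = parse_http_request(raw_request)
```

With the start-line removed first, the header fields can be looked up by
name on the returned message object, e.g. message["Host"].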

Am I (don't laugh) missing something here?

Chris Gray



