[Python-3000] should rfc822 accept text io or binary io?
jeremy at alum.mit.edu
Mon Aug 6 15:30:13 CEST 2007
This is a fairly specific question, but it gets at a more general
issue I don't fully understand.
I recently updated httplib and urllib so that they work on the struni
branch. A recurring problem with these libraries is that they call
methods like strip() and split(). On a string object, calling these
methods with no arguments means strip/split whitespace. The bytes
object has no corresponding default arguments; whitespace may not be
well-defined for bytes. (Or is it?)
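To make the difference concrete, here is a minimal sketch that passes the whitespace set explicitly when working on bytes, so nothing depends on a default argument. The name ASCII_WHITESPACE and the sample header line are assumptions made up for illustration; they are not taken from httplib or urllib.

```python
# Hypothetical constant: the ASCII whitespace bytes, spelled out explicitly.
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"

text_line = "  Content-Type: text/html \r\n"
raw_line = b"  Content-Type: text/html \r\n"

# str: calling strip() with no argument strips whitespace.
text_stripped = text_line.strip()

# bytes: name the bytes to strip, so the call does not rely on any default.
raw_stripped = raw_line.strip(ASCII_WHITESPACE)

# Splitting on an explicit separator works the same way for both types.
name, value = raw_stripped.split(b":", 1)
```

Spelling out the separator or strip set avoids the question of what "whitespace" means for bytes, at the cost of repeating it at every call site.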
In general, the approach was to read data as bytes off the socket and
convert header lines to iso-8859-1 before processing them.
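Roughly, that approach looks like the sketch below. It is not the actual httplib code: read_header_line is a hypothetical helper, and io.BytesIO stands in for the file object one would get from a socket.

```python
import io

def read_header_line(fp):
    """Read one line of bytes from fp and return it decoded as str."""
    raw = fp.readline()
    # iso-8859-1 maps every byte value to a code point, so decoding cannot fail.
    return raw.decode("iso-8859-1")

# Stand-in for a binary file object wrapped around a socket.
sock_file = io.BytesIO(b"HTTP/1.1 200 OK\r\nHost: example.org\r\n\r\n")

status_line = read_header_line(sock_file)
# Once the line is a str, the no-argument strip() and split() work as before.
version, status, reason = status_line.strip().split(None, 2)
```

After the decode step, the existing string-based parsing code can stay unchanged.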
test_urllib2_localnet still fails. One of the problems is that
BaseHTTPServer doesn't process HTTP responses correctly. Like
httplib, it converts the HTTP status line to iso-8859-1. But it
parses the rest of the headers by calling mimetools.Message, which is
really rfc822.Message. The header lines of an RFC 822 message
(really, RFC 2822) are ascii, so it should be easy to do the
conversion. rfc822.Message assumes it is reading from a text file and
that readline() returns a string.
So the short question is: Should rfc822.Message require a text io
object or a binary io object? Or should it accept either (via some
new constructor or extra arguments to the existing constructor)? I'm
not sure how to design an API for bytes vs strings. The API used to
be equally well suited for reading from a file or a socket, but they
don't behave the same way anymore.
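One possible shape for the "accept either" option, purely as a sketch: check what readline() returns and decode only when it is bytes. The function iter_header_lines and its encoding parameter are hypothetical and are not part of rfc822 or any existing module.

```python
import io

def iter_header_lines(fp, encoding="iso-8859-1"):
    """Yield header lines as str from a text or a binary file object.

    Stops at end of input or at the blank line that ends the headers.
    """
    while True:
        line = fp.readline()
        if not line:
            break  # end of input ("" or b"")
        if isinstance(line, bytes):
            line = line.decode(encoding)
        if line in ("\r\n", "\n"):
            break  # blank line separates headers from the body
        yield line

message = "Subject: hi\r\nTo: a@example.org\r\n\r\nbody\r\n"
from_text = list(iter_header_lines(io.StringIO(message)))
from_bytes = list(iter_header_lines(io.BytesIO(message.encode("ascii"))))
```

The trade-off is that the type check happens on every line read; an extra constructor argument that states the mode up front would avoid that, but it puts the burden on the caller.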