[Python-3000] should rfc822 accept text io or binary io?

Guido van Rossum guido at python.org
Tue Aug 7 19:52:26 CEST 2007


On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> > On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> > > Hmmm.  Should we be using the email package to parse HTTP headers?
> > > RFC 2616 says that HTTP headers follow the "same generic format" as
> > > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> > > are arbitrary 8-bit values.  You'd need to parse them differently.
> >
> > I'm confused (and too lazy to read the RFCs). How can you have case
> > insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit
> > values? Assuming they mean it's an ASCII superset, does that mean that
> > HTTP doesn't have case insensitivity for bytes with values > 127?
>
> For HTTP, the header names need to be ASCII, but the values can contain
> bytes > 127.  I haven't read enough of the spec to know which header
> values might include binary data and how you are supposed to interpret
> them.  Assuming that the spec allows OCTET instead of token (which is
> ASCII) for a reason, it suggests that the header values need to be
> bytes.

Bizarre. I'm not aware of any HTTP header that requires *binary*
values. I can imagine though that they may contain *encoded* text and
that they are leaving the encoding up to separate negotiations between
client and server, or another header, or specified explicitly by the
header, etc. It can't be pure binary because it's still subject to the
\r\n line terminator.
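
A minimal sketch of the "encoded text, not binary" reading, in Python 3
syntax. The helper name split_header and the value_encoding parameter
are hypothetical; the encoding is assumed to have been agreed on
elsewhere (negotiation, another header, etc.), and nothing here is taken
from rfc822 or the email package:

```python
def split_header(raw_line, value_encoding):
    """Split one raw header line (bytes) into a (name, value) pair of strings.

    Assumption: names are ASCII and case-insensitive, so they are decoded
    as ASCII and lower-cased; values are text in `value_encoding`.
    """
    # The line terminator is framing, not part of the value.
    name, _, value = raw_line.rstrip(b"\r\n").partition(b":")
    return name.decode("ascii").lower(), value.strip().decode(value_encoding)


# A value with bytes > 127 that is still text, just encoded as UTF-8.
raw = "X-Title: Gr\u00fc\u00dfe".encode("utf-8") + b"\r\n"
parsed = split_header(raw, "utf-8")
```

Only the name is case-folded here; the value keeps whatever the agreed
encoding yields.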

> > In general I'm against writing polymorphic code that tries to work for
> > strings as well as bytes, except very small algorithms. For larger
> > amounts of code, you almost always run into the need for literals or
> > hashing or case conversion or other differences (e.g. \n vs. \r\n when
> > doing I/O).
> >
> > I think it's conceptually cleaner to pick a particular type for an API
> > and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm
> > files read/write bytes; text files (io.TextIOBase) read/write strings.
>
> It certainly makes rfc822 tricky to update.  Is it intended to work
> with files or sockets?  In Python 2.x, it works with either.  If we
> have some future email/rfc822/httpheaders library that parses the
> "generic format," will it work with sockets or files or will we have
> two versions?

It never worked with socket objects, did it? If it worked with the
objects returned by makefile(), why not use text mode ("r" or "w") as
the mode arg? (Then you can even specify an encoding.) IMO it makes
more sense to treat rfc822 headers as text, since they are for all
intents and purposes meant to be human-readable, and there's case
insensitivity implied.
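
Roughly what the makefile() suggestion could look like, as a sketch
only. It assumes the socket.makefile(mode, encoding=...) signature found
in released Python 3; read_headers is a hypothetical helper, and
ISO-8859-1 is just an illustrative choice of encoding:

```python
import socket


def read_headers(sock, encoding="iso-8859-1"):
    """Read RFC 822 style headers from a socket as text.

    Header names are lower-cased so lookups are case-insensitive.
    """
    headers = {}
    # Text mode: makefile() returns a file object that yields str, not bytes.
    with sock.makefile("r", encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:  # a blank line ends the header block
                break
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return headers


# Demonstration over a local socket pair instead of a real connection.
a, b = socket.socketpair()
a.sendall(b"Content-Type: text/plain\r\nX-Name: caf\xe9\r\n\r\n")
a.close()
result = read_headers(b)
b.close()
```

The parsing code only ever sees strings, so case folding and literals
work without any bytes/str polymorphism.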

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

