On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote:
At 22:38 -0400 04/09/2009, Barry Warsaw wrote: ...
So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return:
The raw bytes or the decoded unicode?
That's an easy one: Subject: is an unstructured header, so it must be
text, thus Unicode. We're looking at a high-level representation of
an email message, with parsed header fields and a MIME message tree.
I'm liking Glyph's suggestion here. We'll probably have to support
the message['Subject'] API for backward compatibility, but in that
case it really should be a bytes API.
(or better names... it's late and I'm tired ;). One of those maps to message['Subject'] but which is the more obvious choice?
Structured header fields are more of a problem. Any header with
addresses should return a list of addresses. I think the default return type
should depend on the data type. To get an explicit bytes or string or list
of addresses, be explicit; otherwise, for convenience, return the
appropriate type for the particular header field name.
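To make the "return the appropriate type for the header field name" idea concrete, here is a rough sketch using only calls that already exist in the stdlib email package. The function name natural_value and the ADDRESS_HEADERS set are made up for illustration; nothing in this thread settles on them.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses

# Hypothetical: which header names are treated as address lists.
ADDRESS_HEADERS = {"to", "cc", "bcc", "from", "reply-to"}


def natural_value(name, raw):
    """Return a header value in a type chosen by the header name.

    Address headers come back as a list of (realname, address) pairs;
    everything else is treated as unstructured text and RFC 2047 decoded.
    """
    if name.lower() in ADDRESS_HEADERS:
        return getaddresses([raw])
    return str(make_header(decode_header(raw)))
```

Note that getaddresses() does not itself decode RFC 2047 encoded display names, so a real implementation would need another pass over the name part.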
Yes, structured headers are trickier. In a separate message, James
Knight makes some excellent points, which I agree with. However the
email package obviously cannot support every type of structured header
possible. It must support this through extensibility.
The obvious way is through inheritance (i.e. subclasses of Header),
but in my experience, using inheritance of the Message class really
doesn't work very well. You need to pass around factories to parsing
functions and your application tends to have its own hierarchy of
subclasses for whatever extra things it needs. ISTM that subclassing
is simply not the right pattern to support extensibility in the
Message objects or Header objects. Yes, this leads me to think that
all the MIME* subclasses are essentially /wrong/.
Having said all that, the email package must support structured
headers. Look at the insanity that is the current folding-whitespace
splitting, and the inability of the current code to do the right
thing for, say, Subject headers and Received headers, and you begin to
see why it must be possible to extend this stuff.
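One way to get extensibility without subclassing Message or Header is a registry of per-header parsers that applications can add to. This is only a sketch of that pattern, not a proposal for the actual API; register_header_parser and parse_header are hypothetical names.

```python
from email.utils import getaddresses

# Hypothetical registry: lowercased header name -> callable taking the raw value.
_parsers = {}


def register_header_parser(name, parser):
    """Register a callable that turns a raw header value into a richer object."""
    _parsers[name.lower()] = parser


def parse_header(name, raw):
    """Parse raw with the registered parser, or return it unchanged if none exists."""
    parser = _parsers.get(name.lower())
    return parser(raw) if parser is not None else raw


# An application (or the package itself) plugs in structured headers by name.
register_header_parser("To", lambda raw: getaddresses([raw]))
```

Unknown headers fall through untouched, so an application only pays for the structured headers it cares about and never has to pass factories into the parser.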
headers. Sometimes you have some unicode thing and
sometimes you have some bytes. You need to end up with bytes in the
ASCII range and you'd like to leave the header value unencoded if so.
But in both cases, you might have bytes or characters outside that
range, so you need an explicit encoding, defaulting to utf-8.
Never for header fields. The default is always RFC 2047, unless it
isn't, say for params.
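The "leave it alone if it is ASCII, otherwise RFC 2047 encode it" behaviour can be sketched with the existing email.header.Header class. The helper name encode_header_value is hypothetical, and treating the encoding argument as both the charset of incoming bytes and the RFC 2047 charset is an assumption made to keep the example short.

```python
from email.header import Header


def encode_header_value(value, encoding="utf-8"):
    """Return an ASCII-only str suitable for a header field body.

    Accepts str or bytes. Pure-ASCII values are returned unencoded;
    anything else becomes an RFC 2047 encoded word in the given charset.
    """
    if isinstance(value, bytes):
        # Assumption: incoming bytes are in the same charset used for output.
        value = value.decode(encoding)
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return Header(value, encoding).encode()
    return value
```

This covers unstructured fields only; parameters and other structured values have their own encoding rules, as noted above.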
The Message class should create an object of the appropriate
subclass of Header based on the name (or use the existing object, see other discussion), and that should inspect its argument and DTRT or
Message.set_header('Subject', 'Some text', encoding='utf-8')
Message.set_header('Subject', b'Some bytes')
One of those maps to
message['Subject'] = ???
The expected data type should depend on the header field.
For Subject:, it should be bytes to be parsed or verbatim text. For To:, it should
be a list of addresses or bytes or text to be parsed.
At a higher level, yes. At the low level, it has to be bytes.
The email package should be pythonic, and not require
understanding of dozens of RFCs to use properly. Users don't need to know about the
raw bytes; that's the whole point of MIME and any email package. It
should be easy to set header fields with their natural data types, and doing
it with bad data should produce an error. This may require a bit more care
in the message parser, to always produce a parsed message with defects.
I agree that we should have some higher level APIs that make it easy
to compose email messages, and probably easy-ish to parse a byte
stream into an email message tree. But we can't build those without
the lower level raw support. I'm also convinced that this lower level
will be the domain of those crazy enough to have the RFCs tattooed to
the back of their eyelids.