[Email-SIG] email package status in 3.X?
Matthew Dixon Cowles
matt at mondoinfo.com
Mon May 10 21:51:46 CEST 2010
Mark,
> I realize everybody on this list probably knows this already,
> but email in 3.X not only doesn't support the Unicode/bytes
> dichotomy, it was also broken by it.
Yes, it's a shame that it has worked out that way. I think it's
because email is an almost uniquely hard problem when you try to make
a sharp distinction between text and bytes.
When you receive an email, what have you got? It's supposed to be
ASCII, but of course it often isn't. Which character set should you
assume those eight-bit characters are in? The program that's using
the module probably does want to guess, since it wants to make as
much sense as possible out of an incorrectly formed email. The same
goes for mis-specified encodings, in both headers and MIME parts.
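To make the guessing concrete, here is a minimal sketch of one possible strategy. It is not what the email package does. The helper name guess_decode and the order of candidates are assumptions for illustration only. It tries the declared charset first, in case that is right. Then it tries ASCII and UTF-8. Finally it falls back to Latin-1, which maps every byte to a character and so can never fail, though it may produce gibberish.

```python
from typing import Optional, Tuple


def guess_decode(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode bytes whose charset is uncertain; return (text, charset_used).

    Hypothetical helper: the candidate order below is an assumption,
    not the behaviour of any standard library function.
    """
    candidates = []
    if declared:
        # The sender's declared charset may be right, unknown, or simply wrong.
        candidates.append(declared)
    candidates += ["ascii", "utf-8", "latin-1"]
    for charset in candidates:
        try:
            return raw.decode(charset), charset
        except (LookupError, UnicodeDecodeError):
            # LookupError: unknown charset name.
            # UnicodeDecodeError: bytes are not valid in this charset.
            continue
    # Latin-1 decodes any byte string, so this line is never reached.
    raise AssertionError("unreachable")
```

For example, guess_decode(b"caf\xe9", "x-bogus") skips the unknown charset name, fails on ASCII and UTF-8, and ends up with Latin-1. The Latin-1 fallback guarantees that some text comes out, but it is still only a guess.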
So you probably need to provide multiple ways of getting at headers
and the MIME parts that claim to be text. You'll want to be able to
get at the original data (probably as bytes for safety) and the text
version if one can be created.
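The following is a minimal sketch of the "two views" idea, applied to MIME parts only. Headers would need similar treatment. It uses the standard library's message_from_bytes, Message.get_payload(decode=True), get_content_charset() and get_content_maintype(). The helper part_views is hypothetical. It always returns the original bytes. It returns text only when the part is text/* and its declared charset actually decodes those bytes.

```python
from email import message_from_bytes
from email.message import Message
from typing import Optional, Tuple


def part_views(part: Message) -> Tuple[bytes, Optional[str]]:
    """Return (original_bytes, text_or_None) for a non-multipart part.

    Hypothetical helper: the bytes view is always available; the text
    view is None when no trustworthy decoding exists.
    """
    # decode=True undoes base64/quoted-printable and always yields bytes
    # (it returns None for multipart containers, hence the fallback).
    raw = part.get_payload(decode=True) or b""
    if part.get_content_maintype() != "text":
        return raw, None
    charset = part.get_content_charset()  # lower-cased, or None if absent
    try:
        text = raw.decode(charset or "ascii")
    except (LookupError, UnicodeDecodeError):
        text = None  # charset unknown, or wrong for these bytes
    return raw, text


if __name__ == "__main__":
    msg = message_from_bytes(
        b"Content-Type: text/plain; charset=us-ascii\n\ncaf\xc3\xa9")
    # The declared charset is wrong, so the text view comes back as None.
    print(part_views(msg))
```

A caller that gets None for the text view could still pass the bytes to a guessing routine. Keeping the original bytes around means that nothing is lost when the guess turns out to be wrong.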
And so forth.
The old approach of happily passing eight-bit strings around, on the
assumption that the user would make the correct sense of them, mapped
onto email really well. Trying to make a strict distinction between
bytes and text turns out to be a bit of a mess in this context.
But you probably already knew all that as well.
Regards,
Matt