[Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)
glyph at divmod.com
Fri Apr 10 05:11:51 CEST 2009
On 02:26 am, barry at python.org wrote:
>There are really two ways to look at an email message. It's either an
>unstructured blob of bytes, or it's a structured tree of objects.
>Those objects have headers and payload. The payload can be of any
>type, though I think it generally breaks down into "strings" for
>text/* types and bytes for anything else (not counting multiparts).
I think this is a problematic way to model bytes vs. text; it gives text
a special relationship to bytes which should be avoided.
IMHO the right way to think about domains like this is a multi-level
representation. The "low level" representation is always bytes, whether
your MIME type is text/whatever or application/x-i-dont-know.
The thing that's "special" about text is that it's a "high level"
representation that the standard library can know about. But the
'email' package ought to support being extended to support other types
just as well. For example, I want to ask for image/png content as
PIL.Image objects, not bags of bytes. Of course this presupposes some
way for PIL itself to get at some bytes, but then you need the email
module itself to get at the bytes to convert to text in much the same
way. There also needs to be layering at the level of
bytes->base64->some different bytes->PIL->Image. Some mail clients
base64-encode text in unusual character encodings, so the same
layering is sometimes needed for text.
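
A minimal sketch of that layering for the text case, using only calls
from the stdlib 'email' package in Python 3. The sample message and the
variable names are made up for illustration; this is one way to walk the
layers, not a proposal for the API.

```python
import base64
from email import message_from_bytes

# Illustrative wire form: a text/plain part whose UTF-16 body has been
# base64-encoded for transport.
body = "héllo".encode("utf-16")
raw = (
    b"Content-Type: text/plain; charset=utf-16\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body) + b"\r\n"
)

msg = message_from_bytes(raw)

# Layer 1: the payload still in its transfer encoding (base64 text).
encoded = msg.get_payload()

# Layer 2: "some different bytes", with the transfer encoding undone.
content_bytes = msg.get_payload(decode=True)

# Layer 3: text, obtained by applying the charset the part declares.
text = content_bytes.decode(msg.get_content_charset())
```

Each step names the layer it returns, so there is no guessing about
whether a given value is bytes or text.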
I'm also being somewhat handwavy with talk of "low" and "high" level
representations; of course there are actually multiple levels beyond
that. I might want text/x-python content to show up as an AST, but the
intermediate parsing step really wants to operate on characters, not
bytes. The same goes for a DOM and text/html content. (Modulo the
usual encoding-detection weirdness present in parsers.)
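
To make the text/x-python case concrete, here is a rough sketch in which
the caller names the layer it wants. The HIGH_LEVEL registry and the
get_layer() function are hypothetical; nothing like them exists in the
'email' package, and the layer names are an assumption of mine.

```python
import ast
from email import message_from_bytes

# Hypothetical registry: MIME type -> callable that turns decoded text
# into a higher-level object. Not part of the stdlib.
HIGH_LEVEL = {
    "text/x-python": ast.parse,
}

def get_layer(part, layer):
    """Return the part's content at an explicitly named layer."""
    # Bytes layer: transfer encoding (base64, quoted-printable) undone.
    data = part.get_payload(decode=True)
    if layer == "bytes":
        return data
    # Text layer: apply the declared charset, defaulting to ASCII.
    text = data.decode(part.get_content_charset() or "ascii")
    if layer == "text":
        return text
    # Object layer: hand the characters to a type-specific parser.
    if layer == "object":
        return HIGH_LEVEL[part.get_content_type()](text)
    raise ValueError(f"unknown layer: {layer!r}")

part = message_from_bytes(
    b"Content-Type: text/x-python; charset=utf-8\n\nx = 1 + 2\n"
)
tree = get_layer(part, "object")   # an ast.Module built from the text layer
```

The parser only ever sees characters; the bytes-to-text step happens one
layer below it.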
So, as long as there's a crisp definition of what layer of the MIME
stack one is operating on, I don't think that there's really any
ambiguity at all about what type you should be getting.