[Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

glyph at divmod.com
Fri Apr 10 05:11:51 CEST 2009

On 02:26 am, barry at python.org wrote:
>There are really two ways to look at an email message.  It's either an 
>unstructured blob of bytes, or it's a structured tree of objects. 
>Those objects have headers and payload.  The payload can be of any 
>type, though I think it generally breaks down into "strings" for 
>text/* types and bytes for anything else (not counting multiparts).

I think this is a problematic way to model bytes vs. text; it gives text 
a special relationship to bytes which should be avoided.

IMHO the right way to think about domains like this is a multi-level 
representation.  The "low level" representation is always bytes, whether 
your MIME type is text/whatever or application/x-i-dont-know.

The thing that's "special" about text is that it's a "high level" 
representation that the standard library can know about.  But the 
'email' package ought to be extensible to other types just as well.  
For example, I want to ask for image/png content as 
PIL.Image objects, not bags of bytes.  Of course this presupposes some 
way for PIL itself to get at some bytes, but then you need the email 
module itself to get at the bytes to convert to text in much the same 
way.  There also needs to be layering at the level of 
bytes->base64->some different bytes->PIL->Image.  Some mail clients 
base64-encode text in unusual character encodings, so you sometimes 
have to do that same layering for text.
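
A minimal sketch of that layering, written against the email package as it 
exists in current Python 3 (message_from_bytes postdates this message).  The 
CONVERTERS registry, register(), and high_level() are hypothetical names 
invented for illustration, not part of the standard library; a PIL-based 
image/png converter could be registered the same way as the two shown here.

```python
import base64
import json
from email import message_from_bytes

# Hypothetical registry (not a stdlib API): maps a MIME type to a function
# that turns transfer-decoded bytes into a higher-level object.
CONVERTERS = {}

def register(mime_type):
    def decorator(func):
        CONVERTERS[mime_type] = func
        return func
    return decorator

@register("text/plain")
def to_text(data, part):
    # Text is just one more converter: bytes -> str via the declared charset.
    charset = part.get_content_charset() or "ascii"
    return data.decode(charset)

@register("application/json")
def to_json(data, part):
    # An extension type handled exactly like text (UTF-8 assumed here).
    return json.loads(data.decode("utf-8"))

def high_level(part):
    # Layer 2: undo the Content-Transfer-Encoding (base64 here).
    data = part.get_payload(decode=True)
    # Layer 3: convert if a converter is registered, else hand back bytes.
    converter = CONVERTERS.get(part.get_content_type())
    return converter(data, part) if converter else data

# Layer 0: the message as raw bytes, with a base64-encoded UTF-8 text body.
body = base64.b64encode("caf\u00e9".encode("utf-8")).decode("ascii")
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + body.encode("ascii") + b"\r\n"
)

msg = message_from_bytes(raw)
wire = msg.get_payload()                # layer 1: still base64 text
decoded = msg.get_payload(decode=True)  # layer 2: "some different bytes"
text = high_level(msg)                  # layer 3: a str
```

Each name (wire, decoded, text) corresponds to one layer, so the type a 
caller gets back depends only on which layer was asked for.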

I'm also being somewhat handwavy with talk of "low" and "high" level 
representations; of course there are actually multiple levels beyond 
that.  I might want text/x-python content to show up as an AST, but the 
intermediate parsing step really wants to operate on characters.  The 
same goes for a DOM and text/html content.  (Modulo the usual 
encoding-detection weirdness present in parsers.)
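
As a small illustration of those extra levels for text/x-python, here is a 
sketch using the standard ast module.  The sample payload and the UTF-8 
charset are assumptions made for the example.

```python
import ast

# Lowest layer: bytes, as they would come out of the transfer-decoded part.
raw = b"total = 1 + 2\n"

# Intermediate layer: characters. The charset is assumed to be UTF-8 here;
# a real part would declare it in its Content-Type header.
source = raw.decode("utf-8")

# Highest layer: the parser operates on characters and produces an AST.
tree = ast.parse(source)
first = tree.body[0]
```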

So, as long as there's a crisp definition of what layer of the MIME 
stack one is operating on, I don't think that there's really any 
ambiguity at all about what type you should be getting.
