[Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

Fri Apr 10 05:03:35 CEST 2009

On Apr 9, 2009, at 11:11 PM, glyph at divmod.com wrote:

> I think this is a problematic way to model bytes vs. text; it gives  
> text a special relationship to bytes which should be avoided.
>
> IMHO the right way to think about domains like this is a multi-level  
> representation.  The "low level" representation is always bytes,  
> whether your MIME type is text/whatever or application/x-i-dont-know.

This is a really good point, and I really should be clearer when  
describing my current thinking (sleep would help :).

> The thing that's "special" about text is that it's a "high level"  
> representation that the standard library can know about.  But the  
> 'email' package ought to support being extended to support other  
> types just as well.  For example, I want to ask for image/png  
> content as PIL.Image objects, not bags of bytes.  Of course this  
> presupposes some way for PIL itself to get at some bytes, but then  
> you need the email module itself to get at the bytes to convert to  
> text in much the same way.  There also needs to be layering at the  
> level of bytes->base64->some different bytes->PIL->Image.  There are  
> mail clients that will base64-encode unusual encodings so you have  
> to do that same layering for text sometimes.
>
> I'm also being somewhat handwavy with talk of "low" and "high" level  
> representations; of course there are actually multiple levels beyond  
> that.  I might want text/x-python content to show up as an AST, but  
> the intermediate DOM-parsing representation really wants to operate  
> on characters.  Similarly for a DOM and text/html content.  (Modulo  
> the usual encoding-detection weirdness present in parsers.)

When I was talking about supporting text/* content types as strings, I  
was definitely thinking about using basically the same plug-in or  
higher level or whatever API to do that as you might use to get PIL  
images from an image/gif.

> So, as long as there's a crisp definition of what layer of the MIME  
> stack one is operating on, I don't think that there's really any  
> ambiguity at all about what type you should be getting.

In that case, we really need the bytes-in-bytes-out-bytes-in-the-chewy- 
center API first, and build things on top of that.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 304 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090409/25c444cd/attachment.pgp>