[Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)
Barry Warsaw
barry at python.org
Fri Apr 10 05:03:35 CEST 2009
On Apr 9, 2009, at 11:11 PM, glyph at divmod.com wrote:
> I think this is a problematic way to model bytes vs. text; it gives
> text a special relationship to bytes which should be avoided.
>
> IMHO the right way to think about domains like this is a multi-level
> representation. The "low level" representation is always bytes,
> whether your MIME type is text/whatever or application/x-i-dont-know.
This is a really good point, and I really should be clearer when
describing my current thinking (sleep would help :).
> The thing that's "special" about text is that it's a "high level"
> representation that the standard library can know about. But the
> 'email' package ought to support being extended to support other
> types just as well. For example, I want to ask for image/png
> content as PIL.Image objects, not bags of bytes. Of course this
> presupposes some way for PIL itself to get at some bytes, but then
> you need the email module itself to get at the bytes to convert to
> text in much the same way. There also needs to be layering at the
> level of bytes->base64->some different bytes->PIL->Image. There are
> mail clients that will base64-encode unusual encodings so you have
> to do that same layering for text sometimes.
>
> I'm also being somewhat handwavy with talk of "low" and "high" level
> representations; of course there are actually multiple levels beyond
> that. I might want text/x-python content to show up as an AST, but
> the intermediate DOM-parsing representation really wants to operate
> on characters. Similarly for a DOM and text/html content. (Modulo
> the usual encoding-detection weirdness present in parsers.)
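The layering described above (wire bytes, then base64, then different bytes, then a higher-level object) can be sketched as a stack of plain functions applied lowest level first. This is an illustration only, assuming a base64-encoded text/x-python part declared as utf-8; `through_layers` and `PYTHON_PART_LAYERS` are hypothetical names. Real Python source may also carry a PEP 263 coding declaration, which this sketch ignores.

```python
import ast
import base64
from functools import reduce
from typing import Any, Callable, Sequence

Layer = Callable[[Any], Any]

def through_layers(value: Any, layers: Sequence[Layer]) -> Any:
    """Feed value through each layer in order, lowest level first."""
    return reduce(lambda acc, layer: layer(acc), layers, value)

# Hypothetical stack for a base64-encoded text/x-python part, charset=utf-8:
#   wire bytes -> (base64) -> content bytes -> (utf-8) -> str -> (parser) -> AST
PYTHON_PART_LAYERS = (
    base64.b64decode,                    # undo Content-Transfer-Encoding: base64
    lambda data: data.decode("utf-8"),   # bytes -> characters, per the charset
    ast.parse,                           # characters -> syntax tree
)
```

Stopping partway down the stack gives the intermediate representations: `PYTHON_PART_LAYERS[:1]` yields the content bytes and `PYTHON_PART_LAYERS[:2]` yields the text that the parser operates on.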
When I was talking about supporting text/* content types as strings, I
was definitely thinking of using essentially the same plug-in (or
higher-level, or whatever) API that you might use to get PIL images
from an image/gif part.
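One way to picture a single plug-in API that serves both text/* and image types is a converter registry keyed by MIME type, with a "maintype/*" wildcard and a fallback to plain bytes. This is a hypothetical sketch, not an API the email package provides; `register_converter`, `to_high_level`, and the wildcard convention are assumptions. The text/* converter uses the declared charset, defaulting to us-ascii as RFC 2046 specifies for text.

```python
from typing import Callable, Dict

# Hypothetical registry: MIME type (or "maintype/*") -> converter that takes
# the decoded content bytes plus the part's parameters (e.g. charset).
Converter = Callable[[bytes, Dict[str, str]], object]
_CONVERTERS: Dict[str, Converter] = {}

def register_converter(mime_type: str, converter: Converter) -> None:
    """Install a plug-in for one MIME type or a whole maintype wildcard."""
    _CONVERTERS[mime_type.lower()] = converter

def to_high_level(content: bytes, mime_type: str, params: Dict[str, str]) -> object:
    """Try an exact match, then the maintype wildcard; otherwise stay at bytes."""
    mime_type = mime_type.lower()
    maintype = mime_type.split("/", 1)[0]
    for key in (mime_type, maintype + "/*"):
        if key in _CONVERTERS:
            return _CONVERTERS[key](content, params)
    return content  # no plug-in registered: the low-level bytes are the answer

# text/* becomes str via the declared charset (RFC 2046 default: us-ascii).
register_converter(
    "text/*",
    lambda data, params: data.decode(params.get("charset", "us-ascii")),
)
```

An image/png plug-in would register the same way, for example with a converter wrapping `PIL.Image.open(io.BytesIO(data))`; PIL is not needed to run the sketch itself.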
> So, as long as there's a crisp definition of what layer of the MIME
> stack one is operating on, I don't think that there's really any
> ambiguity at all about what type you should be getting.
In that case, we really need the bytes-in-bytes-out-bytes-in-the-chewy-center
API first, and build things on top of that.
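For comparison, today's Python 3 email package does have a bytes-in, bytes-out core: `email.message_from_bytes` parses raw bytes and `Message.as_bytes()` (Python 3.4+) serializes back to bytes. The `roundtrip` helper and the sample message below are hypothetical and only illustrate that layer; any str- or object-level view would sit on top of it.

```python
from email import message_from_bytes

def roundtrip(raw: bytes) -> bytes:
    """Hypothetical helper: bytes in, bytes out, with no str layer exposed."""
    return message_from_bytes(raw).as_bytes()

# Made-up single-part message with a base64-encoded binary body.
SAMPLE = (
    b"Subject: demo\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AAEC/w==\r\n"
)

out = roundtrip(SAMPLE)
# Re-parsing the output recovers the same decoded content bytes.
payload = message_from_bytes(out).get_payload(decode=True)
```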
-Barry