[Python-Dev] [Email-SIG] Dropping bytes "support" in json
Stephen J. Turnbull
turnbull at sk.tsukuba.ac.jp
Fri Apr 10 07:22:04 CEST 2009
Barry Warsaw writes:
> There are really two ways to look at an email message. It's either an
> unstructured blob of bytes, or it's a structured tree of objects.
Indeed!
> Those objects have headers and payload. The payload can be of any
> type, though I think it generally breaks down into "strings" for text/
> * types and bytes for anything else (not counting multiparts).
*sigh* Why are you back-tracking?
The payload should be of an appropriate *object* type. Atomic object
types will have their content stored as string or bytes [nb I use
Python 3 terminology throughout]. Composite types (multipart/*) won't
need string or bytes attributes AFAICS.
Start by implementing the application/octet-stream and
text/plain;charset=utf-8 object types, of course.
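
A minimal sketch of those two atomic payload types, to make the string/bytes split concrete. The class names, the `content` attribute, and `as_wire_bytes()` are all hypothetical, not part of any existing email API, and content-transfer-encoding is ignored entirely.

```python
class OctetStreamPayload:
    """Hypothetical application/octet-stream payload: content is bytes."""

    content_type = "application/octet-stream"

    def __init__(self, content):
        if not isinstance(content, bytes):
            raise TypeError("octet-stream content must be bytes")
        self.content = content

    def as_wire_bytes(self):
        # Already bytes; a real implementation would apply a
        # content-transfer-encoding such as base64 here.
        return self.content


class Utf8TextPayload:
    """Hypothetical text/plain;charset=utf-8 payload: content is str."""

    content_type = "text/plain; charset=utf-8"

    def __init__(self, content):
        if not isinstance(content, str):
            raise TypeError("text/plain content must be str")
        self.content = content

    def as_wire_bytes(self):
        # The charset is fixed by the type, so encoding is unambiguous.
        return self.content.encode("utf-8")
```

The point of the sketch is that each type knows which of string or bytes it stores, so the client never has to guess.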
> It does seem to make sense to think about headers as text header names
> and text header values.
I disagree. IMHO, structured header types should have object values,
and something like
message['to'] = "Barry 'da FLUFL' Warsaw <barry at python.org>"
should be smart enough to detect that it's a string and attempt to
(flexibly) parse it into a fullname and a mailbox, adding escapes, etc.
Whether these should be structured objects or they can be strings or
bytes, I'm not sure (probably bytes, not strings, though -- see next
example). OTOH
message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry at python.org>'''
should assume that the client knows what they are doing, and should
parse it strictly (and I mean "be a real bastard", eg, raise an
exception on any non-ASCII octet), merely dividing it into fullname
and mailbox, and caching the bytes for later insertion in a
wire-format message.
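
A rough sketch of that str-versus-bytes dispatch, under several assumptions. `AddressHeader`, `make_address_header`, and the strict pattern are hypothetical names invented here. Only `email.utils.parseaddr` is an existing stdlib call, used for the flexible path. The strict pattern accepts just one shape, `"quoted name" <local@domain>`, which is far narrower than the RFC grammar. The parsed parts are stored as str purely for brevity; as said above, bytes may well be the better choice.

```python
import re
from email.utils import parseaddr

# Deliberately narrow, hypothetical grammar for the strict path.
_STRICT_ADDRESS = re.compile(
    rb'"(?P<name>[^"\\\r\n]*)" <(?P<mailbox>[^<>@\s]+@[^<>@\s]+)>'
)


class AddressHeader:
    """Hypothetical structured value for an address header."""

    def __init__(self, fullname, mailbox, wire=None):
        self.fullname = fullname
        self.mailbox = mailbox
        self.wire = wire  # cached bytes, only set on the strict path


def make_address_header(value):
    if isinstance(value, str):
        # Flexible path: let the stdlib parser do its best.
        fullname, mailbox = parseaddr(value)
        if not mailbox:
            raise ValueError("no mailbox found in %r" % (value,))
        return AddressHeader(fullname, mailbox)
    if isinstance(value, bytes):
        # Strict path: any non-ASCII octet is an error.
        try:
            value.decode("ascii")
        except UnicodeDecodeError as exc:
            raise ValueError("non-ASCII octet in wire-format header") from exc
        match = _STRICT_ADDRESS.fullmatch(value)
        if match is None:
            raise ValueError("not a strictly formatted address")
        return AddressHeader(
            match["name"].decode("ascii"),
            match["mailbox"].decode("ascii"),
            wire=value,  # cache for later insertion into the wire message
        )
    raise TypeError("header value must be str or bytes")
```

With this, a str value is parsed leniently and has no cached wire form, while a bytes value either passes the strict check and is cached verbatim or raises.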
> In that case, I think you want the values as unicodes, and probably
> the headers as unicodes containing only ASCII. So your table would be
> strings in both cases. OTOH, maybe your application cares about the
> raw underlying encoded data, in which case the header names are
> probably still strings of ASCII-ish unicodes and the values are
> bytes. It's this distinction (and I think the competing use cases)
> that make a true Python 3.x API for email more complicated.
I don't see why you can't have the email API be specific, with
message['to'] always returning a structured_header object (or maybe
even more specifically an address_header object), and methods like
message['to'].build_header_as_text()
which returns
"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""
and
message['to'].build_header_in_wire_format()
which returns
b"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""
Then have email.textview.Message and email.wireview.Message which
provide a simple interface where message['to'] would invoke
.build_header_as_text() and .build_header_in_wire_format()
respectively.
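
To illustrate, here is a small sketch of that layering. None of these classes exist; `StructuredMessage`, `TextViewMessage`, and `WireViewMessage` are stand-ins for the proposed core message and the email.textview / email.wireview wrappers, and the address is a placeholder. The wire builder assumes pure-ASCII content and does no RFC 2047 encoding.

```python
class AddressHeader:
    """Hypothetical structured address header."""

    def __init__(self, name, fullname, mailbox):
        self.name = name
        self.fullname = fullname
        self.mailbox = mailbox

    def build_header_as_text(self):
        return '%s: "%s" <%s>' % (self.name, self.fullname, self.mailbox)

    def build_header_in_wire_format(self):
        # Assumption: ASCII only. Non-ASCII names raise UnicodeEncodeError
        # here; a real implementation would encode them instead.
        return self.build_header_as_text().encode("ascii")


class StructuredMessage:
    """Core message: message['to'] always returns a header object."""

    def __init__(self):
        self._headers = {}

    def __setitem__(self, name, header):
        self._headers[name.lower()] = header

    def __getitem__(self, name):
        return self._headers[name.lower()]


class TextViewMessage:
    """Stand-in for email.textview.Message: headers come back as str."""

    def __init__(self, message):
        self._message = message

    def __getitem__(self, name):
        return self._message[name].build_header_as_text()


class WireViewMessage:
    """Stand-in for email.wireview.Message: headers come back as bytes."""

    def __init__(self, message):
        self._message = message

    def __getitem__(self, name):
        return self._message[name].build_header_in_wire_format()


msg = StructuredMessage()
msg["to"] = AddressHeader("To", "Barry 'da.FLUFL' Warsaw", "barry@example.org")
text_line = TextViewMessage(msg)["to"]   # a str
wire_line = WireViewMessage(msg)["to"]   # the same header as bytes
```

Both views wrap the same structured message, so the string-or-bytes decision is made once, by choosing the view, rather than at every header access.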
> Thinking about this stuff makes me nostalgic for the sloppy happy days
> of Python 2.x
Er, yeah.
Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,