[Python-Dev] [Email-SIG] Dropping bytes "support" in json

Fri Apr 10 05:41:58 CEST 2009

At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
 ...
>So, what I'm really asking is this.  Let's say you agree that there
>are use cases for accessing a header value as either the raw encoded
>bytes or the decoded unicode.  What should this return:
>
> >>> message['Subject']
>
>The raw bytes or the decoded unicode?

That's an easy one:  Subject: is an unstructured header, so it must be
text, thus Unicode.  We're looking at a high-level representation of an
email message, with parsed header fields and a MIME message tree.
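For comparison, the email package in later Python 3 releases (email.policy.default, added well after this thread) behaves this way. Indexing an unstructured header returns decoded text, while the legacy compat32 policy returns the still-encoded RFC 2047 form. This is a small sketch that assumes Python 3.6 or later; it is not the API being designed here:

```python
from email import message_from_bytes, policy

raw = (b"Subject: =?utf-8?q?Caf=C3=A9_menu?=\r\n"
       b"\r\n"
       b"body\r\n")

# Modern policy: the Subject comes back as decoded text (a str subclass).
subject = message_from_bytes(raw, policy=policy.default)["Subject"]

# Legacy compat32 policy (the default for message_from_bytes):
# the encoded-word is returned verbatim.
legacy = message_from_bytes(raw)["Subject"]

print(subject)  # Café menu
print(legacy)   # =?utf-8?q?Caf=C3=A9_menu?=
```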

>Okay, so you've picked one.  Now how do you spell the other way?

message.get_header_bytes('Subject')

Oh, I see that's what you picked.
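Here is a minimal sketch of that split. It assumes a hypothetical SketchMessage class that stores raw header bytes exactly as received. get_header_bytes() and get_header_string() borrow the names floated in this thread and are not stdlib APIs. Decoding relies on the real email.header.decode_header() and make_header() helpers:

```python
from email.header import decode_header, make_header


class SketchMessage:
    """Hypothetical message keeping raw header bytes, with decoded access.

    Method names mirror the ones proposed in the thread; this is not a stdlib class.
    """

    def __init__(self):
        # Lower-cased header name -> raw header value as bytes.
        self._raw = {}

    def set_raw_header(self, name, value):
        self._raw[name.lower()] = value

    def get_header_bytes(self, name):
        # The verbatim, still-encoded bytes from the wire.
        return self._raw[name.lower()]

    def get_header_string(self, name):
        # Raw header bytes are expected to be ASCII; RFC 2047
        # encoded-words inside them are decoded to text.
        text = self._raw[name.lower()].decode("ascii")
        return str(make_header(decode_header(text)))

    def __getitem__(self, name):
        # Indexing defaults to decoded text, the choice for unstructured headers.
        return self.get_header_string(name)


msg = SketchMessage()
msg.set_raw_header("Subject", b"=?utf-8?q?Caf=C3=A9_menu?=")
print(msg["Subject"])                   # Café menu
print(msg.get_header_bytes("Subject"))  # b'=?utf-8?q?Caf=C3=A9_menu?='
```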

>The Message class probably has these explicit methods:
>
> >>> Message.get_header_bytes('Subject')
> >>> Message.get_header_string('Subject')
>
>(or better names... it's late and I'm tired ;).  One of those maps to
>message['Subject'] but which is the more obvious choice?

Structured header fields are more of a problem.  Any header with addresses
should return a list of addresses.  I think the default return type should
depend on the header field's data type.  To get bytes, a string, or a list
of addresses specifically, call an explicit method; otherwise, for
convenience, return the appropriate type for the particular header field name.
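A rough sketch of "return the appropriate type for the header field name" follows. ADDRESS_HEADERS and default_header_value() are hypothetical names, and the header list is deliberately incomplete. Address parsing uses the real email.utils.getaddresses():

```python
from email.utils import getaddresses

# Hypothetical, deliberately incomplete set of address-bearing header fields.
ADDRESS_HEADERS = {"from", "sender", "reply-to", "to", "cc", "bcc"}


def default_header_value(name, text):
    """Return the 'natural' type for a header field (hypothetical helper).

    Address headers become a list of (display name, address) pairs;
    everything else, e.g. Subject, stays as decoded text.
    """
    if name.lower() in ADDRESS_HEADERS:
        return getaddresses([text])
    return text


to_value = default_header_value("To", "Alice <alice@example.com>, bob@example.com")
subject_value = default_header_value("Subject", "Lunch plans")
print(to_value)       # [('Alice', 'alice@example.com'), ('', 'bob@example.com')]
print(subject_value)  # Lunch plans
```

Later Python releases took a broadly similar approach. Under email.policy.default, msg['To'] is an AddressHeader whose addresses attribute holds Address objects, while msg['Subject'] is plain decoded text.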

>Now, setting headers.  Sometimes you have some unicode thing and
>sometimes you have some bytes.  You need to end up with bytes in the
>ASCII range and you'd like to leave the header value unencoded if so.
>But in both cases, you might have bytes or characters outside that
>range, so you need an explicit encoding, defaulting to utf-8 probably.

Never for header fields.  The default is always RFC 2047 encoding, except
where it isn't, such as for MIME parameters, which use RFC 2231.
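The difference between the two encodings can be seen with existing stdlib helpers. The sketch below uses email.header.Header for RFC 2047 encoded-words in header text and email.utils.encode_rfc2231() for a parameter value such as a filename. The exact encoded-word form (Q or B encoding) is chosen by the library, so only its prefix is relied on here:

```python
from email.header import Header, decode_header, make_header
from email.utils import encode_rfc2231

# Header field text: non-ASCII goes into an RFC 2047 encoded-word.
encoded_subject = Header("Café", charset="utf-8").encode()
print(encoded_subject)  # an =?utf-8?...?= encoded-word

# Round-trip back to text to show nothing was lost.
decoded_subject = str(make_header(decode_header(encoded_subject)))

# MIME parameter values (e.g. a filename) use RFC 2231 instead.
encoded_param = encode_rfc2231("Café.txt", charset="utf-8")
print(encoded_param)  # utf-8''Caf%C3%A9.txt
```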

The Message class should create an object of the appropriate subclass of
Header based on the name (or reuse the existing object; see the other
discussion), and that object should inspect its argument and either do the
right thing (DTRT) or complain.
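One way to picture that dispatch is the following minimal sketch. HeaderField, UnstructuredField, AddressField, FIELD_CLASSES and make_field() are all hypothetical names, not the email package's classes. A registry picks a class by header name, reuses an existing object of the right class, and otherwise lets the class validate the value and raise an error on bad input:

```python
from email.utils import getaddresses


class HeaderField:
    """Hypothetical base class for a parsed header field."""

    def __init__(self, name, value):
        self.name = name
        self.value = self.parse(value)

    def parse(self, value):
        raise NotImplementedError


class UnstructuredField(HeaderField):
    def parse(self, value):
        if isinstance(value, bytes):
            value = value.decode("ascii")  # raw wire bytes must be ASCII
        if not isinstance(value, str):
            raise TypeError(f"{self.name}: expected text or bytes")
        return value


class AddressField(HeaderField):
    def parse(self, value):
        if isinstance(value, bytes):
            value = value.decode("ascii")
        if isinstance(value, str):
            return getaddresses([value])  # text to be parsed
        # Otherwise assume an already-parsed list of (name, address) pairs.
        return list(value)


# Hypothetical name -> class registry; unknown names fall back to unstructured.
FIELD_CLASSES = {"from": AddressField, "to": AddressField, "cc": AddressField}


def make_field(name, value):
    cls = FIELD_CLASSES.get(name.lower(), UnstructuredField)
    if isinstance(value, cls):
        return value  # reuse an existing object of the right kind
    return cls(name, value)


to_field = make_field("To", "Alice <alice@example.com>")
same = make_field("To", to_field)
subject_field = make_field("Subject", b"Hello")

complained = False
try:
    make_field("Subject", 42)
except TypeError:
    complained = True
```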

>
> >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
> >>> Message.set_header('Subject', b'Some bytes')
>
>One of those maps to
>
> >>> message['Subject'] = ???

The expected data type should depend on the header field.  For Subject:, it
should be bytes to be parsed or verbatim text.  For To:, it should be a
list of addresses, or bytes or text to be parsed.
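Python's later EmailMessage class (using email.policy.default, not available in 2009) accepts natural types on assignment in much this way. Plain text goes into Subject, and Address objects from email.headerregistry go into To. Non-ASCII text is encoded only when the message is serialized. A small sketch of today's stdlib behaviour:

```python
from email.headerregistry import Address
from email.message import EmailMessage

msg = EmailMessage()  # uses email.policy.default
msg["Subject"] = "Café menu"  # verbatim text
msg["To"] = [Address("Alice", "alice", "example.com"),
             Address(addr_spec="bob@example.com")]  # a list of addresses

print(msg["To"])  # Alice <alice@example.com>, bob@example.com

# Serializing produces ASCII-only headers with RFC 2047 encoded-words.
wire = msg.as_bytes()
```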

The email package should be Pythonic, and not require deep understanding of
dozens of RFCs to use properly.  Users don't need to know about the raw
bytes; that's the whole point of MIME and any email package.  It should be
easy to set header fields with their natural data types, and doing it with
bad data should produce an error.  This may require a bit more care in the
message parser, so that it always produces a parsed message, recording any
defects it finds.
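Both halves of that split exist in current Python's email package. The parser records defects on a malformed message instead of raising, and under email.policy.default, assigning a header value that contains a line break raises ValueError. A minimal sketch, with a multipart message that lacks a boundary parameter as the assumed malformed input:

```python
from email import errors, message_from_bytes, policy
from email.message import EmailMessage

# Malformed input: claims to be multipart but defines no boundary.
raw = b"Content-Type: multipart/mixed\r\n\r\nsome body\r\n"
parsed = message_from_bytes(raw, policy=policy.default)

# Parsing still succeeds; problems are listed in parsed.defects.
defect_names = [type(d).__name__ for d in parsed.defects]
print(defect_names)

out = EmailMessage()
rejected = False
try:
    out["Subject"] = "line one\nline two"
except ValueError:
    # Setting bad data is refused rather than silently accepted.
    rejected = True
```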
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>