[Email-SIG] Thoughts on the general API, and the Header API.

Barry Warsaw barry at python.org
Sun Feb 21 20:07:32 CET 2010


On Feb 20, 2010, at 12:50 AM, R. David Murray wrote:

>> serialize(policy=None)
>> deserialize(policy=None)
>
>I love the idea of policy objects.  I'm clear on what they do for
>serialization.  What do you visualize them doing for deserialization
>(parsing)?

As Glenn points out, they could contain the MIME type registry for producing
more specific instance types.  I also think they'll serve as a container for
any other configuration variables that we'll find convenient for controlling
the parsing process.  E.g. we might enable strict parsing this way.  It's
basically just a hand-wavy way of saying: let's define the API in terms of the
policy object to keep our signatures small and sane (at the cost, of course, of
making the policy objects huge and insane ;).
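
To make the "container for configuration" idea a bit more concrete, here is a minimal sketch. Everything in it (the `Policy` class, its attribute names, `message_class_for`, and the two placeholder message classes) is hypothetical and made up for illustration; it is not a proposed API.

```python
from dataclasses import dataclass, field


class Message:
    """Placeholder for a generic message type (hypothetical)."""


class TextMessage(Message):
    """Placeholder for a more specific type for text/* parts (hypothetical)."""


@dataclass(frozen=True)
class Policy:
    # Knobs a parser might consult; the names are assumptions.
    strict: bool = False
    # Maps a lowercased MIME type to the class to instantiate for it.
    mime_registry: dict = field(default_factory=dict)

    def message_class_for(self, content_type):
        """Return the registered class for content_type, else Message."""
        return self.mime_registry.get(content_type.lower(), Message)


# A parser would take one policy argument instead of many keyword arguments.
lenient = Policy()
strict_text = Policy(strict=True, mime_registry={"text/plain": TextMessage})
```

The point is only that `deserialize(policy=...)` keeps a one-argument signature no matter how many knobs get added later.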

>Yes, this was my intent in providing the newline and max_line_length
>parameters, but a policy object is a much cleaner way to do that.
>Especially since we can then provide premade policy objects to support
>common output scenarios such as SMTP and HTTP.

+1
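
A rough sketch of what premade output policies could look like follows. The `OutputPolicy` class, the `SMTP`/`HTTP` constants, and `serialize_header` are hypothetical names, and the folding is deliberately naive (it just wraps on whitespace with `textwrap`). The 78-character default follows the RFC 5322 recommendation; treating HTTP as "never fold" is an assumption of this sketch.

```python
import textwrap
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputPolicy:
    newline: str = "\n"
    # None means "never fold long lines".
    max_line_length: Optional[int] = 78


DEFAULT = OutputPolicy()
SMTP = OutputPolicy(newline="\r\n", max_line_length=78)
HTTP = OutputPolicy(newline="\r\n", max_line_length=None)


def serialize_header(name, value, policy=None):
    """Render one header as text, folding per the policy (naive sketch)."""
    policy = policy or DEFAULT
    line = f"{name}: {value}"
    if policy.max_line_length is None:
        lines = [line]
    else:
        # Continuation lines start with a space, as folded headers must.
        lines = textwrap.wrap(
            line,
            width=policy.max_line_length,
            subsequent_indent=" ",
            break_long_words=False,
            break_on_hyphens=False,
        )
    return policy.newline.join(lines) + policy.newline
```

Callers then write `serialize_header("Subject", text, SMTP)` rather than passing `newline` and `max_line_length` separately.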

>> It sounds like there's overlap between the encoding/decoding API and the
>> serialize/deserialize API.  Are you thinking along those lines?  Differences
>> in signature could be papered over with the policy objects.
>
>No, I'm thinking of encode/decode as exactly parallel to encode/decode
>on string/bytes.  In my prototype API, for example, StringHeader
>values are unicode, and do *not* contain any rfc2047 encoded words.
>Decoding a BytesHeader decodes the RFC2047 stuff.  Contrariwise, encoding
>a StringHeader does the RFC2047 encoding (using whatever charset you
>specify or utf-8 by default).

Makes sense, thanks.  Yep, we probably don't need the policy API for that.  It
makes me wonder whether 'serialize' and 'deserialize' are the right names for
functionality we've traditionally called 'parsing' and 'generating'.  But we
can paint that bikeshed later.
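
For reference, the encode/decode parallel David describes can be sketched on top of the existing `email.header` helpers. The two classes below are illustrative stand-ins, not the prototype classes from his repository; their constructors and attributes are assumptions.

```python
from email.header import Header, decode_header, make_header


class StringHeader:
    """Header whose value is unicode text with no RFC 2047 encoded words."""

    def __init__(self, name, value):
        self.name = name
        self.value = value  # str

    def encode(self, charset="utf-8"):
        """Return a BytesHeader, RFC 2047 encoding only when needed."""
        try:
            raw = self.value.encode("ascii")
        except UnicodeEncodeError:
            raw = Header(self.value, charset).encode().encode("ascii")
        return BytesHeader(self.name, raw)


class BytesHeader:
    """Header whose value is wire-format bytes, possibly RFC 2047 encoded."""

    def __init__(self, name, value):
        self.name = name
        self.value = value  # bytes

    def decode(self):
        """Return a StringHeader with any encoded words decoded."""
        text = self.value.decode("ascii")
        return StringHeader(self.name, str(make_header(decode_header(text))))


wire = StringHeader("Subject", "Grüße").encode()
round_tripped = wire.decode().value
```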

>(This means you lose the ability to piece together headers from bits in
>different charsets, but what is the actual use case for that?  And in any
>case, there will be a way to get at the underlying header-translation
>machinery to do it if you really need to.)

The degenerate case is to mix ASCII and non-ASCII header chunks, which I think
is fairly common.  Of course the RFCs allow it, so we have to support it, even
if doing so is via a different API.
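
The current `email.header.Header` class already supports the mixed case through `append()`, which is roughly the "underlying machinery" such a different API would need to expose. A small example, with made-up header text:

```python
from email.header import Header, decode_header, make_header

# Build one header value from chunks in different charsets.
h = Header("Re:", "us-ascii")
h.append("Grüße", "utf-8")
h.append("Köln", "iso-8859-1")

# ASCII-only text containing one encoded word per non-ASCII chunk.
encoded = h.encode()

# Decoding yields (chunk, charset) pairs that can be recombined.
decoded = str(make_header(decode_header(encoded)))
```

Each non-ASCII chunk comes out as its own encoded word tagged with its own charset, while the ASCII chunk is left as plain text.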

>Serializing a StringHeader, in my design, produces *text* not bytes.
>This is to support the use case of using the email package to manipulate
>generic 'name:value // body' formatted data in unicode form (presumably
>utf-8 on disk).
>
>To get something that is RFC compliant, you have to encode the StringMessage
>object (and thus the headers) to a BytesMessage object, and then
>serialize that.  (That's where the incremental encoder may be needed).
>
>The advantage of doing it this way is we support all possible combinations
>of input and output format via two strictly parallel interfaces and
>their encode/decode methods.

This all sounds great.
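
A minimal sketch of the two parallel interfaces, as I read the description above. The class shapes here are assumptions for illustration only, and the body handling is intentionally incomplete: a real encoder would also have to set Content-Type and Content-Transfer-Encoding.

```python
from email.header import Header


def _encode_value(value, charset):
    """RFC 2047 encode a header value only if it is not pure ASCII."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        return Header(value, charset).encode().encode("ascii")


class BytesMessage:
    def __init__(self, headers, body):
        self.headers = headers  # list of (bytes, bytes) pairs
        self.body = body        # bytes

    def serialize(self):
        """Produce wire-format bytes."""
        head = b"".join(n + b": " + v + b"\r\n" for n, v in self.headers)
        return head + b"\r\n" + self.body


class StringMessage:
    def __init__(self, headers, body):
        self.headers = headers  # list of (str, str) pairs
        self.body = body        # str

    def serialize(self):
        """Produce *text*, e.g. for 'name: value' data kept as unicode."""
        head = "".join(f"{n}: {v}\n" for n, v in self.headers)
        return head + "\n" + self.body

    def encode(self, charset="utf-8"):
        """Convert to a BytesMessage (sketch: body is just charset-encoded)."""
        headers = [(n.encode("ascii"), _encode_value(v, charset))
                   for n, v in self.headers]
        return BytesMessage(headers, self.body.encode(charset))


msg = StringMessage([("Subject", "Grüße")], "hi\n")
as_text = msg.serialize()
as_bytes = msg.encode().serialize()
```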

>Hmm.  It occurs to me now that another possible way to do this would be to
>put the output data format into the policy object.

Indeed, that's an interesting idea.

>Then you could serialize a StringMessage object, and it would know to do the
>string to bytes conversion as it went along doing the serialization.  I don't
>think that would eliminate the need for encode/decode methods: first, that's
>what serialize would use when converting for output, and second, you will
>sometimes want to manipulate, eg, individual header values, and it seems like
>the natural way to do that is something like this:
>
>    mybytesmessage['subject'].decode().value
>
>You don't want to serialize using a to-string policy object, because
>what you want is the decoded value, and you can't do
>
>    mybytesmessage['subject'].value.decode()
>
>because you have to rfc2047 decode....

I'm with ya!
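
The difference between the two spellings is easy to see with today's helpers. Calling plain `bytes.decode()` on the raw value only undoes the ASCII layer and leaves the encoded word in place; it takes an RFC 2047 pass to get the real text. The sample value below is made up.

```python
from email.header import decode_header, make_header

raw = b"=?utf-8?q?Gr=C3=BC=C3=9Fe?="

# Plain bytes-to-str decoding: the encoded word survives untouched.
plain = raw.decode("ascii")

# RFC 2047 decoding: the encoded word is turned back into unicode text.
decoded = str(make_header(decode_header(plain)))
```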

>Hmm.  Here's a thought: could we write an rfc2047 codec?  Then we
>could use that second, more python-intuitive form like this:
>
>    mybytesmessage['subject'].value.decode('mimeheader')
>
>Well, looking at that I'm not sure it's better :(

Yeah.
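
For what it's worth, registering such a codec is mechanically simple. The sketch below uses the 'mimeheader' name from David's example; the helper functions are hypothetical, and it leans on the existing `email.header` functions for the real work. One wart is visible right away: `str.encode()` has no way to pass a charset, so the encoder has to hard-code one (utf-8 here).

```python
import codecs
from email.header import Header, decode_header, make_header


def _mimeheader_decode(data, errors="strict"):
    """bytes -> str, decoding any RFC 2047 encoded words."""
    text = bytes(data).decode("ascii", errors)
    return str(make_header(decode_header(text))), len(data)


def _mimeheader_encode(text, errors="strict"):
    """str -> bytes, RFC 2047 encoding (as utf-8) only when needed."""
    try:
        raw = text.encode("ascii")
    except UnicodeEncodeError:
        raw = Header(text, "utf-8").encode().encode("ascii")
    return raw, len(text)


def _search(name):
    # The codec machinery passes the lowercased codec name.
    if name == "mimeheader":
        return codecs.CodecInfo(
            encode=_mimeheader_encode,
            decode=_mimeheader_decode,
            name="mimeheader",
        )
    return None


codecs.register(_search)

subject = b"=?utf-8?q?Gr=C3=BC=C3=9Fe?=".decode("mimeheader")
```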

>Thanks.  The repository (lp:python-email6) contains the beginnings
>of the implementation of the StringHeader and BytesHeader classes.
>I'm currently working on fleshing out the part where it says "this
>is a temporary hack, need to handle folding encoded words", which is,
>needless to say, a bit complicated...I may set that aside for a bit and
>work on the policy object stuff.  Though I also need to put a bunch more
>tests into the test database...

+1
-Barry