[Email-SIG] Thoughts on the general API, and the Header API.

R. David Murray rdmurray at bitdance.com
Mon Feb 22 05:47:54 CET 2010


On Sun, 21 Feb 2010 14:07:32 -0500, Barry Warsaw <barry at python.org> wrote:
> On Feb 20, 2010, at 12:50 AM, R. David Murray wrote:
> 
> >> serialize(policy=None)
> >> deserialize(policy=None)
> >
> >I love the idea of policy objects.  I'm clear on what they do for
> >serialization.  What do you visualize them doing for deserialization
> >(parsing)?
> 
> As Glenn points out, they could contain the MIME type registry for producing
> more specific instance types.  I also think they'll serve as a container for

Arg.  I was of course writing that email late at night and sleep
deprived or I'd have noticed that :)

> any other configuration variables that we'll find convenient for controlling
> the parsing process.  E.g. we might enable strict parsing this way.  It's
> basically just a hand-wavy way of saying, let's define the API in terms of
> the policy object to keep our signatures small and sane (at the cost of course
> of making the policy objects huge and insane ;).

Sounds good.
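To make the policy-object idea concrete, here is a rough sketch of a policy that carries both a MIME type registry and a strictness flag, so the parsing entry point keeps a small signature. It is only an illustration: `Policy`, `mime_registry`, `strict`, `deserialize` and the message classes are hypothetical names, not an existing email-package API.

```python
from dataclasses import dataclass, field

# Hypothetical message classes that a registry could map MIME types onto.
class Message:
    def __init__(self, content_type, payload):
        self.content_type = content_type
        self.payload = payload

class TextMessage(Message):
    pass

class ImageMessage(Message):
    pass

@dataclass(frozen=True)
class Policy:
    # Maps a full MIME type ("text/plain") or a main type ("image") to a class.
    mime_registry: dict = field(default_factory=dict)
    # When True, malformed input raises instead of being tolerated.
    strict: bool = False

def deserialize(content_type, payload, policy=None):
    """Hypothetical parser entry point: all configuration lives on the policy."""
    policy = policy or Policy()
    if policy.strict and "/" not in content_type:
        raise ValueError(f"malformed content type: {content_type!r}")
    maintype = content_type.split("/", 1)[0]
    # Prefer the most specific registration, then the main type, then a default.
    cls = (policy.mime_registry.get(content_type)
           or policy.mime_registry.get(maintype)
           or Message)
    return cls(content_type, payload)
```

With `Policy(mime_registry={"image": ImageMessage})`, parsing an `image/png` part yields an `ImageMessage`; anything unregistered falls back to the generic `Message`.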

> Makes sense, thanks.  Yep, we probably don't need the policy API for that.  It
> makes me wonder whether 'serialize' and 'deserialize' are the right names for
> functionality we've traditionally called 'parsing' and 'generating'.  But we
> can paint that bikeshed later.

Yes.  I'm thinking of serialization as the replacement for generating,
with the idea that the top-level 'generator' API will consist of
convenience functions wrapped around the serialization API.  But we can
deal with that when I get up to that level.
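A minimal sketch of that layering, assuming a message object with the proposed `serialize(policy=None)` method returning bytes. `SimpleMessage`, `generate`, `as_bytes` and the `linesep` policy attribute are hypothetical names chosen for illustration.

```python
import io
from types import SimpleNamespace

class SimpleMessage:
    """Hypothetical stand-in for a message object with the proposed
    serialize(policy=None) method; ASCII-only headers for brevity."""

    def __init__(self, headers, body):
        self.headers = headers  # list of (name, value) pairs
        self.body = body        # bytes

    def serialize(self, policy=None):
        # The policy decides the line separator; default to CRLF.
        linesep = getattr(policy, "linesep", "\r\n")
        head = "".join(f"{name}: {value}{linesep}" for name, value in self.headers)
        return (head + linesep).encode("ascii") + self.body

def generate(msg, fp, policy=None):
    """Top-level 'generator' convenience function: serialize, then write."""
    fp.write(msg.serialize(policy))

def as_bytes(msg, policy=None):
    """Another thin wrapper: return the serialized form directly."""
    buf = io.BytesIO()
    generate(msg, buf, policy)
    return buf.getvalue()

# Usage: a Unix-style policy only needs a 'linesep' attribute in this sketch.
unix_policy = SimpleNamespace(linesep="\n")
```

The wrappers contain no formatting logic of their own; everything that varies is decided by `serialize` and the policy.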

> >(This means you lose the ability to piece together headers from bits in
> >different charsets, but what is the actual use case for that?  And in any
> >case, there will be a way to get at the underlying header-translation
> >machinery to do it if you really need to.)
> 
> The degenerate case is to mix ASCII and non-ASCII header chunks, which I think
> is fairly common.  Of course the RFCs allow it, so we have to support it, even
> if doing so is via a different API.

I'd better explain what I have in mind in that regard.  My notion
is that the serializer will try to minimize the amount of encoded
text (while still taking into account how long the encoded bits become
once the RFC 2047 chrome is included), putting anything that can be
put in ASCII in ASCII.  But it will also use us-ascii encoded words to
do things like wrap tokens that won't fit in 77 chars, and even to
preserve whitespace in unstructured headers in certain situations (that
would be the more controversial bit, I think).  So combining ASCII chunks
with chunks encoded in the charset specified to the encode method happens
naturally.
You could also modify the value of a BytesHeader, stuffing into it ASCII
or encoded words created 'manually' using a low-level function I plan
to expose.  So I think that's the 'different API', and it fits in
pretty logically.
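A sketch of the minimization strategy and of a low-level encoded-word helper, under these assumptions: base64 ('B') encoding only, words split on single spaces, and no special handling of ASCII words that merely look like encoded words. `make_encoded_words` and `encode_unstructured` are hypothetical names. It relies on two RFC 2047 rules: an encoded word may be at most 75 characters, and whitespace between adjacent encoded words is ignored when decoding, so a long run (or an over-long ASCII token encoded as us-ascii) can be split into several words without changing the decoded text.

```python
import base64

MAX_WORD = 75  # RFC 2047: an encoded word may be at most 75 characters

def make_encoded_words(text, charset="utf-8"):
    """Hypothetical low-level helper: encode text as one or more RFC 2047
    'B' encoded words, each at most MAX_WORD characters.  Text is split
    only between characters, so multi-byte sequences stay intact."""
    overhead = len(f"=?{charset}?b??=")
    chunks, chunk = [], ""
    for ch in text:
        candidate = chunk + ch
        b64_len = 4 * ((len(candidate.encode(charset)) + 2) // 3)
        if chunk and overhead + b64_len > MAX_WORD:
            chunks.append(chunk)
            chunk = ch
        else:
            chunk = candidate
    if chunk:
        chunks.append(chunk)
    return [
        f"=?{charset}?b?{base64.b64encode(c.encode(charset)).decode('ascii')}?="
        for c in chunks
    ]

def encode_unstructured(value, charset="utf-8"):
    """Leave ASCII words alone; encode each run of consecutive non-ASCII
    words (including the spaces inside the run) as encoded words."""
    out, run = [], []

    def flush():
        # Emit the pending non-ASCII run, if any, as encoded word(s).
        if run:
            out.extend(make_encoded_words(" ".join(run), charset))
            run.clear()

    for word in value.split(" "):
        if word.isascii():
            flush()
            out.append(word)
        else:
            run.append(word)
    flush()
    return " ".join(out)
```

For example, `encode_unstructured("Hello wörld café again")` keeps "Hello" and "again" as plain text and puts "wörld café" into a single encoded word, while `make_encoded_words("x" * 120, "us-ascii")` wraps one long ASCII token into three encoded words of at most 75 characters each.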

If you want to control *exactly* how the encoded words appear, then I
think it would be reasonable to also require that you do your own header
wrapping, which means using the low-level tools to build the encoded
words, putting in the appropriate folding yourself, adding the field name
on the front, passing the result to BytesHeader.from_full_header,
and using a policy that says to use the raw header data.
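A minimal sketch of the fully manual route: build the encoded words yourself, fold, and put the field name in front. `b_word` and `build_full_header` are hypothetical helpers. The returned bytes are what one would hand to the proposed `BytesHeader.from_full_header`, which is not called here since it is only a proposal; leaving off the terminating CRLF is likewise an assumption. Folding inserts CRLF followed by a space, and the 78-character default follows the line-length recommendation in RFC 5322.

```python
import base64

def b_word(text, charset="utf-8"):
    """Hypothetical low-level helper: one RFC 2047 'B' encoded word.
    It does not enforce the 75-character limit; keep text short."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?b?{payload}?="

def build_full_header(name, words, maxlen=78):
    """Put 'name:' in front of pre-built words (encoded or plain ASCII) and
    fold with CRLF + space whenever the next word would make the current
    line longer than maxlen characters.  The first word always stays on
    the field-name line.  Returns ASCII bytes without a trailing CRLF."""
    first = f"{name}:"
    lines, line = [], first
    for word in words:
        if line != first and len(line) + 1 + len(word) > maxlen:
            lines.append(line)
            line = " " + word  # continuation lines begin with whitespace
        else:
            line += " " + word
    lines.append(line)
    return "\r\n".join(lines).encode("ascii")
```

For example, `build_full_header("Subject", [b_word("Grüße"), "from", b_word("Zürich"), "and", "elsewhere"], maxlen=40)` produces two lines, the second starting with a space, exactly as the caller laid out the words.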

--David

