[Python-Dev] Completing the email6 API changes.

Sat Aug 31 16:13:49 CEST 2013

On Sat, 31 Aug 2013 20:37:30 +1000, Steven D'Aprano <steve at pearwood.info> wrote:
> On 31/08/13 15:21, R. David Murray wrote:
> > If you've read my blog (eg: on planet python), you will be aware that
> > I dedicated August to full time email package development.
> [...]
> 
> 
> The API looks really nice! Thank you for putting this together.

Thanks.

> A question comes to mind though:
> 
> > All input strings are unicode, and the library takes care of doing
> > whatever encoding is required.  When you pull data out of a parsed
> > message, you get unicode, without having to worry about how to decode
> > it yourself.
> 
> How well does your library cope with emails where the encoding is declared wrongly? Or no encoding declared at all?

It copes as best it can :)  The bad bytes are preserved (unless you
modify a part) but are returned as the "unknown character" in a
string context.  You can get the original bytes out by using the
bytes access interface.  (There are probably some places where how
to do that isn't clear in the current API, but bascially either
you use BytesGenerator or you drop down to a lower level API.)

An attempt is made to interpret "bad bytes" as utf-8, before giving up
and replacing them with the 'unknown character' character.  I'm not 100%
sure that is a good idea.

> Conveniently, your email is an example of this. Although it contains non-ASCII characters, it is declared as us-ascii:

Oh, yeah, my MUA is a little quirky and I forgot the step that
would have made that correct.  Wanting to rewrite it is one of
the reasons I embarked on this whole email thing a few years
ago :)

--David