Re: [Python-Dev] Completing the email6 API changes.

31 Aug 2013

On Sat, 31 Aug 2013 20:37:30 +1000, Steven D'Aprano wrote:
...
On 31/08/13 15:21, R. David Murray wrote:
...
If you've read my blog (eg: on planet python), you will be aware that
I dedicated August to full-time email package development.
[...]
The API looks really nice! Thank you for putting this together.
Thanks.
...
A question comes to mind though:
...
All input strings are unicode, and the library takes care of doing
whatever encoding is required.  When you pull data out of a parsed
message, you get unicode, without having to worry about how to decode
it yourself.
How well does your library cope with emails where the encoding is declared wrongly? Or no encoding declared at all?
It copes as best it can :)  The bad bytes are preserved (unless you
modify a part) but are returned as the "unknown character" in a
string context.  You can get the original bytes out by using the
bytes access interface.  (There are probably some places where how
to do that isn't clear in the current API, but basically either
you use BytesGenerator or you drop down to a lower level API.)
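
To make that concrete, here is a minimal sketch, assuming Python 3.6 or
later and the standard library's policy.default. The message itself is
made up for illustration: it declares us-ascii but carries one Latin-1
byte in the body. The string view shows the replacement character, while
BytesGenerator and the lower-level get_payload(decode=True) still hand
back the original byte.

```python
import io
from email import message_from_bytes, policy
from email.generator import BytesGenerator

# Hypothetical message: declared as us-ascii, but the body contains
# the single Latin-1 byte 0xE9.
raw = (
    b"From: sender@example.com\n"
    b"Subject: test\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"\n"
    b"caf\xe9\n"
)

msg = message_from_bytes(raw, policy=policy.default)

# String context: the undecodable byte shows up as U+FFFD.
text = msg.get_content()
print(repr(text))

# Bytes context, option 1: flatten the unmodified message with
# BytesGenerator, which writes the original body bytes back out.
buf = io.BytesIO()
BytesGenerator(buf, policy=policy.default).flatten(msg)
flattened = buf.getvalue()
print(flattened)

# Bytes context, option 2: drop down to the lower-level payload API.
payload = msg.get_payload(decode=True)
print(payload)
```

This only holds while the part is left unmodified; once you set new
content on it, the original bytes are gone.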

An attempt is made to interpret "bad bytes" as utf-8, before giving up
and replacing them with the 'unknown character' character.  I'm not 100%
sure that is a good idea.
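
The idea can be sketched in isolation as below. This is a standalone
illustration, not the package's own code: the function name sanitize is
hypothetical, and it assumes the undecodable bytes were smuggled into the
string with the surrogateescape error handler, which is how the bytes
parser carries them.

```python
def sanitize(text: str) -> str:
    """Best-effort cleanup of a str holding surrogate-escaped bytes.

    Hypothetical helper: recover the escaped bytes, then decode the
    whole thing as utf-8, replacing anything that still fails.
    """
    original_bytes = text.encode("utf-8", "surrogateescape")
    return original_bytes.decode("utf-8", "replace")


# Bytes that happen to be valid utf-8, wrongly declared as ASCII.
looks_like_utf8 = b"caf\xc3\xa9".decode("ascii", "surrogateescape")

# A lone Latin-1 byte, which is not valid utf-8.
not_utf8 = b"caf\xe9".decode("ascii", "surrogateescape")

print(repr(sanitize(looks_like_utf8)))
print(repr(sanitize(not_utf8)))
```

The first input comes back as readable text even though its declared
charset was wrong; the second falls through to the 'unknown character'.
The first case is the one the doubt is about: the output looks clean even
though the declaration was wrong.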
...
Conveniently, your email is an example of this. Although it contains non-ASCII characters, it is declared as us-ascii:
Oh, yeah, my MUA is a little quirky and I forgot the step that
would have made that correct.  Wanting to rewrite it is one of
the reasons I embarked on this whole email thing a few years
ago :)

--David