Another 2 to 3 mail encoding problem
Chris Green
cl at isbd.net
Thu Aug 27 04:34:47 EDT 2020
Peter J. Holzer <hjp-python at hjp.at> wrote:
> The problem is that the message contains a '\ufeff' character (byte
> order mark) where email/generator.py expects only ASCII characters.
>
> I see two possible reasons for this:
>
> * The mbox writing code assumes that all messages with non-ASCII
> characters are QP or base64 encoded, and some higher layer uses 8bit
> instead.
>
> * A MIME part is declared as charset=us-ascii but really contains
> Unicode characters.
>
> Both reasons are weird.
>
> The first would be an unreasonable assumption (8bit encoding has been
> common since the mid-1990s), but even if the code made that assumption,
> one would expect that other code from the same library honors it.
>
> The second shouldn't be possible: If a message is mis-declared (that
> happens) one would expect that the error happens during parsing, not
> when trying to serialize the already parsed message.
>
> But then you haven't shown where msg comes from. How do you parse the
> message to get "msg"?
>
> Can you construct a minimal test message which triggers the bug?
>
Yes, simply sending myself an E-Mail with (for example) accented
characters triggers the error.
I'm pretty certain my system (including E-Mail in and out, and Usenet
news) handles these correctly as UTF-8, e.g.:
àéçł
It's *only* when I switch the mail delivery to Python 3 that the error
appears.
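One way to turn this into a self-contained test case, as a minimal sketch: the thread doesn't show where msg comes from, so this assumes (hypothetically) that the Python 3 delivery code parses the message from an already-decoded str, e.g. with email.message_from_string(), and stores it with the standard library mailbox module. The message text and the helper try_store() are made up for illustration. Under that assumption, mailbox.mbox.add() fails with a UnicodeEncodeError on the '\ufeff' character: the BytesGenerator that mailbox uses writes non-ASCII body text straight through only when it arrived from bytes (as surrogate escapes), and otherwise encodes it as ASCII. The same message parsed with email.message_from_bytes() is stored without error.

```python
import email
import mailbox
import os
import tempfile

# Hypothetical minimal test message: an 8bit, UTF-8 text/plain body that
# starts with a byte order mark (U+FEFF) followed by accented characters.
RAW = (
    b"From: sender@example.com\n"
    b"To: recipient@example.com\n"
    b"Subject: encoding test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + "\ufeff\u00e0\u00e9\u00e7\u0142\n".encode("utf-8")


def try_store(msg, path):
    """Hypothetical helper: add msg to an mbox at path.

    Returns None on success, or the UnicodeEncodeError that was raised.
    """
    box = mailbox.mbox(path)  # creates the file if it doesn't exist
    try:
        box.add(msg)
        return None
    except UnicodeEncodeError as exc:
        return exc
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    # Parsed from str: the body holds real non-ASCII characters, which
    # the BytesGenerator used by mailbox tries to encode as ASCII.
    from_str = email.message_from_string(RAW.decode("utf-8"))
    str_error = try_store(from_str, os.path.join(tmp, "from_str.mbox"))

    # Parsed from bytes: non-ASCII bytes are kept as surrogate escapes
    # and written back out unchanged.
    from_bytes = email.message_from_bytes(RAW)
    bytes_error = try_store(from_bytes, os.path.join(tmp, "from_bytes.mbox"))

print("parsed from str:  ", repr(str_error))
print("parsed from bytes:", repr(bytes_error))
```

If the delivery script does something like the first case, the corresponding change would be to parse the raw bytes instead, for instance with email.message_from_binary_file(sys.stdin.buffer) when the mail arrives on standard input; that, too, is an assumption about how the script receives mail.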
--
Chris Green