Another 2 to 3 mail encoding problem
Barry
barry at barrys-emacs.org
Thu Aug 27 12:16:51 EDT 2020
> On 27 Aug 2020, at 10:40, Chris Green <cl at isbd.net> wrote:
>
> Karsten Hilbert <Karsten.Hilbert at gmx.net> wrote:
>>> Terry Reedy <tjreedy at udel.edu> wrote:
>>>>> On 8/26/2020 11:10 AM, Chris Green wrote:
>>>>>
>>>>>> I have a simple[ish] local mbox mail delivery module as follows:-
>>>>> ...
>>>>>> It has run faultlessly for many years under Python 2. I've now
>>>>>> changed the calling program to Python 3 and while it handles most
>>>>>> E-Mail OK I have just got the following error:-
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
>>>>>>     mailLib.deliverMboxMsg(dest, msg, log)
>>>>> ...
>>>>>>   File "/usr/lib/python3.8/email/generator.py", line 406, in write
>>>>>>     self._fp.write(s.encode('ascii', 'surrogateescape'))
>>>>>> UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 4: ordinal not in range(128)
I would guess the fix is to do s.encode('utf-8').
You might also need to add a header to the email/MIME part saying that it uses UTF-8.
If you do that, does your code work?
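Here is a minimal sketch of the idea. It assumes the message body starts out as a Python 3 str; the sample text and variable names are made up for illustration. It shows that the generator's `encode('ascii', 'surrogateescape')` step rejects a genuine non-ASCII character while UTF-8 encoding accepts it. It also shows that building the part with the standard library's `EmailMessage.set_content()` puts a UTF-8 charset into the Content-Type header for you.

```python
from email.message import EmailMessage

# Hypothetical sample body containing U+2019 (right single quote) and U+00E9.
text = "Here\u2019s a caf\u00e9 line.\n"

# BytesGenerator.write() does s.encode('ascii', 'surrogateescape').
# surrogateescape only maps lone surrogates U+DC80..U+DCFF back to bytes,
# so a real non-ASCII character still raises UnicodeEncodeError.
try:
    text.encode("ascii", "surrogateescape")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii failed:", exc.reason)

# The same string encodes to UTF-8 without complaint.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# set_content() adds charset="utf-8" to Content-Type and picks a
# Content-Transfer-Encoding, so the part declares that it is UTF-8.
msg = EmailMessage()
msg["Subject"] = "encoding test"
msg.set_content(text)
print(msg["Content-Type"])

# Serializing to bytes now succeeds.
raw = msg.as_bytes()
print(raw.decode("utf-8"))
```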
Barry
>>>>>
>>>>> '\ufeff' is the Unicode byte-order mark. It should not be present in an
>>>>> ascii-only 3.x string and would not normally be present in general
>>>>> unicode except in messages like this that talk about it. Read about it,
>>>>> for instance, at
>>>>> https://en.wikipedia.org/wiki/Byte_order_mark
>>>>>
>>>>> I would catch the error and print part or all of string s to see what is
>>>>> going on with this particular message. Does it have other non-ascii chars?
>>>>>
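Terry's suggestion of catching the error and looking at `s` could look roughly like this. `UnicodeEncodeError` exposes `object`, `start` and `end`, so the culprit and a little context can be printed. The sketch also shows one plausible source of a stray '\ufeff' (an assumption, not confirmed for this message): UTF-8 bytes that begin with a byte-order mark, decoded with the plain 'utf-8' codec instead of 'utf-8-sig'. `show_offender` and the sample strings are hypothetical.

```python
import codecs
from typing import Optional


def show_offender(s: str, width: int = 20) -> Optional[int]:
    """Hypothetical helper: repeat the generator's encoding step on s and,
    if it fails, print the offending character with some context.
    Returns the failing index, or None if s encodes cleanly."""
    try:
        s.encode("ascii", "surrogateescape")
    except UnicodeEncodeError as exc:
        bad = exc.object[exc.start:exc.end]
        context = exc.object[max(0, exc.start - width):exc.end + width]
        print(f"position {exc.start}: {bad!r} (U+{ord(bad[0]):04X})")
        print("context:", repr(context))
        return exc.start
    return None


# A UTF-8 byte-order mark decoded with the plain 'utf-8' codec survives as U+FEFF...
data = codecs.BOM_UTF8 + b"Hello"
with_bom = data.decode("utf-8")
# ...while the 'utf-8-sig' codec strips it.
without_bom = data.decode("utf-8-sig")

show_offender(with_bom)      # reports U+FEFF at position 0
show_offender(without_bom)   # plain ASCII, returns None
show_offender("It\u2019s")   # reports U+2019 at position 2
```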
>>> I can provoke the error simply by sending myself an E-Mail with
>>> accented characters in it. I'm pretty sure my Linux system is set up
>>> correctly for UTF8 characters, I certainly seem to be able to send and
>>> receive these to others and I even get to see messages in other
>>> scripts such as arabic, chinese, etc.
>>>
>>> The code above works perfectly in Python 2 delivering messages with
>>> accented (and other extended) characters with no problems at all.
>>> Sending myself E-Mails with accented characters works OK with the code
>>> running under Python 2.
>>>
>>> While an E-Mail body possibly *shouldn't* have non-ASCII characters in
>>> it one must be able to handle them without errors. In fact haven't
>>> the RFCs changed such that the message body should be 8-bit clean?
>>> Anyway I think the Python 3 mail handling libraries need to be able to
>>> pass extended characters through without errors.
>>
>> Well, '\ufeff' is not a *character* at all in much of any
>> sense of that word in unicode.
>>
>> It's a marker. Whatever puts it into the stream is wrong. I guess the
>> best one can (and should) do is to catch the exception and dump
>> the offending stream somewhere binary-capable and pass on a notice. What
>> you are receiving there very much isn't a (well-formed) e-mail message.
>>
>> I would then attempt to backwards-crawl the delivery chain to
>> find out where it came from.
>>
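Karsten's approach (catch the exception, keep the raw message somewhere binary-safe, log a notice) might be sketched as below. `deliver_or_quarantine` and `fake_deliver` are hypothetical names. `deliver` stands in for whatever delivery function the filter really uses. In the traceback that is `mailLib.deliverMboxMsg(dest, msg, log)`, so it would need wrapping in a one-argument callable.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("delivery")


def deliver_or_quarantine(raw: bytes, deliver: Callable[[bytes], None],
                          quarantine_dir: Path) -> bool:
    """Hypothetical wrapper: call deliver(raw); if it raises a UnicodeError,
    save the untouched bytes to quarantine_dir and log a notice."""
    try:
        deliver(raw)
        return True
    except UnicodeError as exc:
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        # mkstemp gives a unique file name; writing in binary mode keeps
        # the message byte-for-byte as received.
        fd, name = tempfile.mkstemp(prefix="undeliverable-", suffix=".eml",
                                    dir=quarantine_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        log.warning("delivery failed (%s); raw message saved to %s", exc, name)
        return False


def fake_deliver(raw: bytes) -> None:
    # Hypothetical stand-in for a delivery function that, like the Python 3
    # generator, ends up doing an ascii/surrogateescape encode on decoded text.
    raw.decode("utf-8").encode("ascii", "surrogateescape")


message = "Subject: test\n\nCaf\u00e9\n".encode("utf-8")
with tempfile.TemporaryDirectory() as tmp:
    delivered = deliver_or_quarantine(message, fake_deliver, Path(tmp))
    saved = [p.read_bytes() for p in Path(tmp).iterdir()]
```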
> The error seems to occur with any non-7-bit-ASCII, e.g. my accented
> characters gave:-
>
>   File "/usr/lib/python3.8/email/generator.py", line 406, in write
>     self._fp.write(s.encode('ascii', 'surrogateescape'))
> UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 34: ordinal not in range(128)
>
> It just happened that the first example was an escape.
>
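Every non-ASCII character trips it, which fits a common cause in Python 3 filters. This is an assumption, not a diagnosis of this particular script: the message is parsed from already-decoded text. `email.message_from_string()` then leaves real Unicode characters in the payload, and `BytesGenerator` (which `mailbox.mbox` uses when adding a message) fails in exactly the `write()` line from the traceback. Parsing the raw bytes instead, with `email.message_from_bytes()` or `email.message_from_binary_file(sys.stdin.buffer)`, stores non-ASCII bytes as surrogate escapes that the generator writes back unchanged. The sample message below is hypothetical.

```python
import email
from email.generator import BytesGenerator
from io import BytesIO

# Hypothetical incoming message: UTF-8 body with U+2019 and U+00E9.
raw = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "It\u2019s a caf\u00e9.\n"
).encode("utf-8")


def flatten(msg) -> bytes:
    """Serialize with BytesGenerator, as mailbox.mbox does when adding a message."""
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    return buf.getvalue()


# 1. Parsing already-decoded text: the payload holds a real U+2019, so the
#    generator's s.encode('ascii', 'surrogateescape') raises.
from_text = email.message_from_string(raw.decode("utf-8"))
try:
    flatten(from_text)
    text_ok = True
except UnicodeEncodeError as exc:
    text_ok = False
    print("from text:", exc)

# 2. Parsing the raw bytes: non-ASCII bytes become surrogate escapes, which
#    the generator turns back into the original bytes.
from_bytes = email.message_from_bytes(raw)
out = flatten(from_bytes)
print("from bytes: ok,", len(out), "bytes")
```

In a filter that reads the message from stdin, that would mean something like `msg = email.message_from_binary_file(sys.stdin.buffer)` instead of a text-mode read.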
> --
> Chris Green
> --
> https://mail.python.org/mailman/listinfo/python-list
More information about the Python-list mailing list