Printing UTF-8 mail to terminal
Loris Bennett
loris.bennett at fu-berlin.de
Fri Nov 1 03:11:30 EDT 2024
Cameron Simpson <cs at cskk.id.au> writes:
> On 31Oct2024 16:33, Loris Bennett <loris.bennett at fu-berlin.de> wrote:
>>I have a command-line program which creates an email containing German
>>umlauts. On receiving the mail, my mail client displays the subject and
>>body correctly:
> [...]
>>So far, so good. However, when I use the --verbose option to print
>>the mail to the terminal via
>>
>>     if args.verbose:
>>         print(mail)
>>
>>I get:
>>
>> Subject: Übungsbetreff
>>
>> Sehr geehrter Herr Dr. Bennett,
>>
>> Dies ist eine =C3=9Cbung.
>>
>>What do I need to do to prevent the body from getting mangled?
>
> That looks to me like quoted-printable. This is an encoding for
> transporting text which makes it robust against transports that are
> not 8-bit clean. So your Unicode text is encoded as UTF-8, and then that
> is encoded in quoted-printable for transport through the email system.
As I mentioned, I think the problem is to do with the way the salutation
text provided by the "salutation server" and the mail body from a file
are encoded. This seems to be different.
> Your terminal probably accepts UTF-8 - I imagine other German text
> renders correctly?
Yes, it does.
> You need to get the text and undo the quoted-printable encoding.
>
> If you're using the Python email module to parse (or construct) the
> message as a `Message` object I'd expect that to happen automatically.
I am using
email.message.EmailMessage
as, from the Python documentation
https://docs.python.org/3/library/email.examples.html
I gathered that that is the standard approach.
And you are right that the encoding of the mail which is actually
received is sorted out automatically. If I display the raw email in my
client I get the following:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
...
Dies ist eine =C3=9Cbung.
I would interpret that as meaning that the subject and body are encoded
in the same way.
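A quick way to check that reading is to decode both pieces by hand. This is only a sketch: the two raw strings are copied from the raw mail above, the variable names are my own, and I am assuming the body really is UTF-8 underneath the quoted-printable layer, as the Content-Type header claims.

```python
import quopri
from email.header import decode_header, make_header

# Raw values copied from the raw mail shown above.
raw_subject = "=?utf-8?q?=C3=9Cbungsbetreff?="
raw_body = b"Dies ist eine =C3=9Cbung."

# The subject is an RFC 2047 encoded word: charset utf-8, "q" encoding.
subject = str(make_header(decode_header(raw_subject)))

# The body is quoted-printable over UTF-8 bytes: undo QP, then decode.
body = quopri.decodestring(raw_body).decode("utf-8")

print(subject)
print(body)
```

Both end up as ordinary text with the same "=C3=9C" bytes turning into "Ü", which would fit the idea that subject and body go through the same UTF-8 plus quoted-printable treatment.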
The problem just occurs with the unsent string representation printed to
the terminal.
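For what it's worth, here is a minimal sketch of what I think is going on, not my actual program. The message is built by hand, and I pass cte="quoted-printable" to set_content() only so that the result is reproducible; in my real code the library picks the transfer encoding itself. As far as I can tell, str() on an EmailMessage gives the serialised message with the body still in its transfer encoding, whereas get_content() returns the decoded text.

```python
from email.message import EmailMessage

mail = EmailMessage()
mail["Subject"] = "Übungsbetreff"
# Assumption: the transfer encoding is forced here purely for illustration,
# so that the body is stored as quoted-printable as in the output above.
mail.set_content("Dies ist eine Übung.\n", cte="quoted-printable")

# str(mail) serialises the message; the body keeps its transfer encoding.
wire_form = str(mail)

# get_content() undoes the transfer encoding and the charset encoding.
decoded_body = mail.get_content()

print(wire_form)
print(decoded_body)
```

So for the --verbose output it may be enough to print the headers of interest plus mail.get_content() instead of print(mail), at least for a simple single-part text mail like this one.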
Cheers,
Loris
--
This signature is currently under constuction.