New GitHub issue #94606 from sidney:<br>
<hr>
<pre>
email.message: get_payload() raises a UnicodeEncodeError if the message body contains a line that has either:
a Unicode surrogate code point that is valid for surrogateescape encoding (U+DC80 through U+DCFF) together with a non-ASCII character (code point above 127), such as ë
OR
a Unicode surrogate code point that is not valid for surrogateescape encoding (outside U+DC80 through U+DCFF), which cannot be encoded as valid UTF-8 either
Here is a minimal code example, with one of the two cases commented out:
```
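from email import message_from_string

# Case 1: a surrogateescape code point (U+DCC3) and a non-ASCII character on the same line.
m = message_from_string("surrogate char \udcc3 and 8-bit utf-8 ë on same line")
# Case 2 (commented out): a surrogate outside U+DC80..U+DCFF triggers the error by itself.
# m = message_from_string("surrogate char \udfff does it by itself")
payload = m.get_payload(decode=True)  # the call that fails in the traceback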
```
With Python 3.10.5 on macOS, this produces:
```
Traceback (most recent call last):
  File "/Users/sidney/tmp/./test5.py", line 8, in <module>
    payload = m.get_payload(decode=True)
  File "/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/email/message.py", line 264, in get_payload
    bpayload = payload.encode('ascii', 'surrogateescape')
UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 33: ordinal not in range(128)
```
This was tested with Python 3.10.5 on macOS; however, I tracked it down from a report in the wild from a system running Python 3.8 on Ubuntu 20.04 and processing actual emails.
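
The failing line in the traceback is a plain str.encode call, so both cases can be sketched without the email package. The snippet below is only an illustration and not part of the original test script: the helper name try_encode and the sample string "ok" are made up for the example, and ë is written as \u00eb so the snippet does not depend on the source file encoding.

```python
# Illustration only: mirrors the payload.encode('ascii', 'surrogateescape')
# call shown in the traceback, outside of email.message.

def try_encode(text):
    """Hypothetical helper: return the encoded bytes, or the UnicodeEncodeError raised."""
    try:
        return text.encode("ascii", "surrogateescape")
    except UnicodeEncodeError as exc:
        return exc

samples = {
    # Only a surrogateescape code point: encodes back to the raw byte 0xC3.
    "ok": "only a surrogate escape \udcc3 here",
    # Case 1 from the report: surrogateescape code point plus a non-ASCII character.
    "mixed": "surrogate char \udcc3 and 8-bit utf-8 \u00eb on same line",
    # Case 2 from the report: a surrogate outside U+DC80..U+DCFF.
    "lone": "surrogate char \udfff does it by itself",
}

results = {name: try_encode(text) for name, text in samples.items()}

for name, result in results.items():
    # Print only the result type and error position; printing the raw strings
    # could itself fail because of the surrogates they contain.
    print(name, type(result).__name__, getattr(result, "start", None))
```

In the "mixed" case the error handler accepts U+DCC3 but rejects ë, which matches position 33 in the traceback above; in the "lone" case the surrogate itself is rejected.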
</pre>
<hr>
<a href="https://github.com/python/cpython/issues/94606">View on GitHub</a>
<p>Labels: type-bug</p>
<p>Assignee: </p>