[New-bugs-announce] [issue19662] smtpd.py should not decode utf-8

Leslie P. Polzer report at bugs.python.org
Wed Nov 20 11:51:43 CET 2013


New submission from Leslie P. Polzer:

http://hg.python.org/cpython/file/3.3/Lib/smtpd.py#l289

as of now decodes incoming bytes as UTF-8.

An SMTP server must not attempt to interpret characters beyond ASCII, however. Originally mail servers were not 8-bit clean, meaning they would only guarantee the lower 7 bits of each octet to be preserved.
However even then they were not expected to choke on any input because of attempts to decode it into a specific extended charset. Whenever a mail server does not need to interpret data (like base64-encoded auth information) it is simply left alone and passed through.

I am not aware of the reasons that caused the current state, but to correct this behavior and make it possible to support the 8BITMIME feature I suggest decoding received bytes as latin1, leaving it to the user to reinterpret it as UTF-8 or whatever charset they need. Any other simple extended encoding could be used for this, but latin1 is the default in asynchat.

The documentation should also mention charset handling. I'll be happy to submit a patch for both code and docs.

----------
components: Library (Lib)
messages: 203467
nosy: skypher
priority: normal
severity: normal
status: open
title: smtpd.py should not decode utf-8
type: enhancement
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19662>
_______________________________________


More information about the New-bugs-announce mailing list