[Python-Dev] iso-2022 and issue 7472: question for the experts
Stephen J. Turnbull
turnbull at sk.tsukuba.ac.jp
Wed Apr 7 21:22:02 CEST 2010
R. David Murray writes:
> A long time ago (in a galaxy far far...no, wrong show)
>
> Er, as I was saying, a long time ago Barry applied a patch to
> email that went more or less like this:
>
> Index: email/Encoders.py
> ===================================================================
> --- email/Encoders.py (revision 35918)
> +++ email/Encoders.py (revision 35919)
> @@ -84,7 +83,13 @@
>      try:
>          orig.encode('ascii')
>      except UnicodeError:
> -        msg['Content-Transfer-Encoding'] = '8bit'
> +        # iso-2022-* is non-ASCII but still 7-bit
This comment may be inaccurate. The ISO 2022 family includes what are
normally "8bit" encodings such as the EUC family and ISO 8859. I
don't know whether there are any IANA-registered 8bit charsets with
names that start with 'iso-2022-', and AFAIK there are none in Python.
(There is an 'iso-2022-8' encoding in Emacs, though.) Still, I'd be
more comfortable with an explicit list than with the
.startswith('iso-2022-') idiom.
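A minimal sketch of what such an explicit list could look like in current
Python. The set name and helper are hypothetical, and the four charsets
shown are only an illustrative selection of 7-bit ISO-2022 encodings, not
a list taken from the email package:

```python
# Hypothetical allow-list of charsets known to be 7-bit ISO-2022 encodings.
# Illustrative only: extend or trim it to match what the package supports.
SEVEN_BIT_ISO2022 = frozenset({
    "iso-2022-jp",
    "iso-2022-jp-1",
    "iso-2022-jp-2",
    "iso-2022-kr",
})


def is_seven_bit_charset(name):
    """Return True if `name` is on the explicit 7-bit ISO-2022 list."""
    return bool(name) and name.lower() in SEVEN_BIT_ISO2022
```

Unlike a prefix test, an exact-match lookup rejects both a misspelled
name and an 8-bit encoding that merely shares the prefix, such as the
Emacs 'iso-2022-8' mentioned above.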
> +        charset = msg.get_charset()
> +        output_cset = charset and charset.output_charset
> +        if output_cset and output_cset.lower().startswith('iso-2202-'):
> +            msg['Content-Transfer-Encoding'] = '7bit'
> +        else:
> +            msg['Content-Transfer-Encoding'] = '8bit'
>      else:
>          msg['Content-Transfer-Encoding'] = '7bit'
> Reading the standards, it looks to me like either the ISO-2022
> input will be 7-bit, and the except will not trigger, or it will be
> invalid, because 8bit, and so should be set to 8bit just like all
> the other cases where there's invalid 8bit data. So I think this
> patch should just be reverted.
I have nothing to add to what Martin said about the basic analysis.
It would be possible to just unconditionally set the
Content-Transfer-Encoding to 8bit, although that may violate a SHOULD
in the MIME standard.
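The basic analysis quoted above can be checked with a short sketch. It
assumes current Python, where the encoded payload is a bytes object
(the patched code dealt with Python 2 byte strings), and the helper name
is hypothetical:

```python
def is_seven_bit(data):
    """Return True if every byte in `data` is below 0x80."""
    return all(b < 0x80 for b in data)


text = "\u65e5\u672c\u8a9e"  # the word for "Japanese language", in kanji

iso2022 = text.encode("iso-2022-jp")  # escape sequences plus 7-bit bytes
eucjp = text.encode("euc-jp")         # kanji bytes have the high bit set

assert is_seven_bit(iso2022)
assert not is_seven_bit(eucjp)

# Valid ISO-2022-JP data therefore passes an ASCII check without raising,
# so the except branch in the patch would not be reached for it.
iso2022.decode("ascii")
```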