[Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode)

Nick Coghlan ncoghlan at gmail.com
Thu Jan 12 01:44:00 EST 2017


On 12 January 2017 at 08:15, Victor Stinner <victor.stinner at gmail.com> wrote:
> Hi,
>
> I also implemented my PEP 540; you can now test it! Use the latest
> patch attached to:
>
>    http://bugs.python.org/issue29240
>
>
> I made multiple changes since the first version of my PEP:
>
> * The UTF-8 Strict mode now only uses strict for inputs and outputs:
> it keeps surrogateescape for operating system data. Read the "Use the
> strict error handler for operating system data" alternative for the
> rationale.
>
> * The POSIX locale now enables the UTF-8 mode. See the "Don't modify
> the encoding of the POSIX locale" alternative for the rationale.
>
> * Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc.

Thanks Victor, I really like this version, and the next time I update
PEP 538 I'm going to replace the en_US.UTF-8 fallback in the current
proposal with a dependency on this PEP.

My one comment would be that in the summary tables, "Always works"
isn't the right phrase to describe potentially corrupting text data
instead of throwing an exception :)

Instead, I think it would make sense to retitle that column as
"Exception?" such that:

* the ideal state is "No exception, no mojibake", which is what we'll
now get when assuming (or forcing) UTF-8 is the correct thing to do,
and will also continue to get when the locale is set appropriately
(e.g. when handling GB18030 on Chinese systems)
* the problematic behaviour of earlier Python 3.x versions was "Yes
exception, no mojibake" when it assumed ASCII instead of UTF-8
* the problematic behaviour of Python 2.x in the specific examples
given is "No exception, yes mojibake", and potentially even "Yes
exception, yes mojibake" in cases where Python 2's implicit
ASCII-based decoding is triggered
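
To make the three outcomes concrete, here is a rough sketch. It is not
code from the PEP: the sample text, the helper name decode_outcome, and
the use of latin-1 as a stand-in for "some wrong encoding guess" are
all assumptions made purely for illustration.

```python
# Illustrative sketch only: TEXT, RAW and decode_outcome are hypothetical
# names, and latin-1 merely stands in for any incorrect encoding guess.
TEXT = "caf\u00e9"
RAW = TEXT.encode("utf-8")  # the bytes as the OS would hand them over


def decode_outcome(raw, encoding, errors="strict"):
    """Return (exception_raised, mojibake) for decoding raw bytes."""
    try:
        decoded = raw.decode(encoding, errors)
    except UnicodeDecodeError:
        # Decoding failed loudly, so no corrupted text was produced.
        return (True, False)
    # Decoding succeeded; it is mojibake if the text does not round-trip.
    return (False, decoded != TEXT)


# Assuming UTF-8 when the data really is UTF-8: no exception, no mojibake.
ideal = decode_outcome(RAW, "utf-8")

# Assuming ASCII with the strict handler: yes exception, no mojibake.
ascii_strict = decode_outcome(RAW, "ascii")

# Silently using the wrong encoding: no exception, yes mojibake.
wrong_guess = decode_outcome(RAW, "latin-1")

print(ideal, ascii_strict, wrong_guess)
```

The point of the retitled column is that only the first of these is
actually "working"; the third avoids the exception at the cost of
corrupting the data.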

PEP 538 would then be a follow-on PEP that attempts to resolve the
ASCII locale encoding problem not only for CPython itself, but also
for any other C/C++ components sharing the same process, or launched
in subprocesses that inherit the current environment.
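
The environment inheritance mentioned above can be seen with a small
sketch. This is not how PEP 538 is implemented; it only shows that a
locale variable set for a child process is what that child sees. The
helper name child_locale_setting and the choice of LC_ALL=C are
assumptions for the example.

```python
import os
import subprocess
import sys


def child_locale_setting(name, value):
    """Run a child Python with name=value set and return what it sees."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env[name] = value
    out = subprocess.check_output(
        [
            sys.executable,
            "-c",
            "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))",
            name,
        ],
        env=env,
    )
    return out.decode("ascii")


# The child reports the value chosen by the parent.
seen_by_child = child_locale_setting("LC_ALL", "C")
print(seen_by_child)
```

Any C/C++ component started this way reads the same variables, which is
why fixing the locale in the environment reaches further than a
CPython-only setting.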

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

