[Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

Victor Stinner victor.stinner at gmail.com
Fri Dec 8 10:18:29 EST 2017


2017-12-08 15:01 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:
>> In short, locale coercion and UTF-8 mode will both be enabled by the
>> POSIX locale.
>
> Hm, it is a bit surprising because I thought UTF-8 mode was a fallback
> for locale coercion when coercion fails or is disabled.

I rewrote the "differences between PEP 538 and PEP 540" text as a
new section, "Relationship with the locale coercion (PEP 538)".

https://www.python.org/dev/peps/pep-0540/#relationship-with-the-locale-coercion-pep-538

"""
Relationship with the locale coercion (PEP 538)
===============================================

The POSIX locale enables both locale coercion (PEP 538) and the UTF-8
Mode (PEP 540). When locale coercion is enabled, enabling the UTF-8
Mode has no (additional) effect.

Locale coercion only impacts non-Python code like C libraries, whereas
the Python UTF-8 Mode only impacts Python code: the two PEPs are
complementary.

On platforms where locale coercion is not supported, like CentOS 7, the
POSIX locale only enables the UTF-8 Mode. In this case, Python code uses
the UTF-8 encoding and ignores the locale encoding, whereas non-Python
code uses the locale encoding, which is usually ASCII for the POSIX
locale.

While the UTF-8 Mode is supported on all platforms and can be enabled
with any locale, the locale coercion is not supported by all platforms
and is restricted to the POSIX locale.

The UTF-8 Mode only affects Python child processes, and only when the
``PYTHONUTF8`` environment variable is set to ``1``, whereas locale
coercion sets the ``LC_CTYPE`` environment variable, which affects all
child processes.

The benefit of the locale coercion approach is that it helps ensure that
encoding handling in binary extension modules and child processes is
consistent with Python's encoding handling. The upside of the UTF-8 Mode
approach is that it allows an embedding application to change the
interpreter's behaviour without having to change the process-global
locale settings.
"""

I hope that it's now better explained.

In short, the two PEPs are really complementary.
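
To make the child-process point concrete, here is a minimal sketch of how
the ``PYTHONUTF8`` variable reaches a Python child process. It assumes
Python 3.7 or later (where ``sys.flags.utf8_mode`` is available);
``probe_child`` is a hypothetical helper written for this illustration,
not something defined by either PEP.

```python
import os
import subprocess
import sys


def probe_child(extra_env):
    """Hypothetical helper: run a child Python and report its UTF-8 settings."""
    env = dict(os.environ)
    # PYTHONIOENCODING would override the stream encoding, so drop it
    # to observe the effect of the UTF-8 Mode alone.
    env.pop("PYTHONIOENCODING", None)
    env.update(extra_env)
    code = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding.lower())"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    ).stdout.decode("ascii")
    flag, encoding = out.split()
    return int(flag), encoding


# The variable is inherited through the environment, so only Python
# children notice it; a non-Python child would simply ignore it.
utf8_flag, child_stdout_encoding = probe_child({"PYTHONUTF8": "1"})
print(utf8_flag, child_stdout_encoding)
```

By contrast, a coerced ``LC_CTYPE`` value is read by any program that
calls ``setlocale()``, which is why it affects non-Python children as well.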

> As per PEP 538 [1], all coercion target locales use surrogateescape
> for stdin and stdout.
> So, do you mean "UTF-8 mode is enabled at the flag level, but it has no
> real effect"?

Right, and it was a deliberate choice by Nick Coghlan when he designed
PEP 538, to make sure that the two PEPs are complementary and
"compatible".

Victor
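
PS: for readers unfamiliar with the surrogateescape error handler
mentioned above, here is a small sketch of what it does on a plain byte
string. It only illustrates the handler itself, not how either PEP
configures stdin and stdout; the sample bytes are an arbitrary example.

```python
# Latin-1 encoded bytes: the trailing 0xE9 is not valid UTF-8.
raw = b"caf\xe9"

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))           # 'caf\udce9'
print(round_tripped == raw)  # True
```

This lossless round trip is what lets Python pass through data it cannot
decode rather than failing on it.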

