[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

Victor Stinner victor.stinner at gmail.com
Tue Dec 5 20:01:28 EST 2017


> Annex: Differences between the PEP 538 and the PEP 540
> ======================================================
>
> The PEP 538 uses the "C.UTF-8" locale which is quite new and only
> supported by a few Linux distributions; this locale is not currently
> supported by FreeBSD or macOS for example. This PEP 540 supports all
> operating systems.
>
> The PEP 538 only changes the behaviour for the POSIX locale. While the
> new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can
> be enabled manually for any other locale.
>
> The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
> non-Python code running in the process is impacted by this change.  This
> PEP is implemented in Python internals and ignores the locale:
> non-Python running in the same process is not aware of the "Python UTF-8
> mode".
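
As a rough illustration of "enabled manually for any other locale", here is a minimal sketch. It assumes a Python 3.7+ interpreter, where the mode is exposed through the -X utf8 command-line option and the sys.flags.utf8_mode flag (neither is named in the annex quoted above). The helper child_utf8_mode is a hypothetical name used only for this example.

```python
import subprocess
import sys


def child_utf8_mode(*interpreter_args):
    """Run a child interpreter and return its sys.flags.utf8_mode as an int.

    Hypothetical helper for illustration only; assumes Python 3.7+.
    """
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Force the UTF-8 mode on, whatever locale the parent process uses.
forced = child_utf8_mode("-X", "utf8")
print(forced)
```

The flag is read in a child interpreter because the mode is decided at startup and cannot be switched on in an already running process.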

The main advantage of the PEP 538 *over* the PEP 540 is that, for the
POSIX locale, non-Python code running in the same process gets the
UTF-8 encoding.

To be honest, I'm not sure that there is a lot of code in the wild
which uses "text" types like the C type wchar_t* and relies on the
locale encoding. Almost all C libraries handle data such as filenames
and environment variables as bytes, using the char* type.
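
To make the "data as bytes" point concrete, here is a small sketch of how a byte string that is not valid UTF-8 can pass through Python text and come back unchanged. It assumes the surrogateescape error handler, which this message does not mention; the file name is made up for the example.

```python
# A made-up file name encoded as Latin-1: b"\xe9" alone is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Assumption: undecodable bytes are mapped to lone surrogates
# (the "surrogateescape" error handler) instead of raising an error.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly,
# so a char*-based C library receives the data it originally produced.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(round_tripped == raw_name)
```

The locale plays no role in this round trip: the bytes are preserved whatever encoding the C library believes it is using.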

At first, I understood that the PEP 538 changed the locale encoding
using an environment variable. But no, it's implemented with
setlocale(LC_CTYPE, "C.UTF-8"), which only impacts the current process
and is not inherited by child processes. So I'm no longer sure that
PEP 538 and PEP 540 are really complementary.
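
The general behaviour of setlocale() can be checked from Python with a short sketch. It only shows that a setlocale() call by itself leaves the environment variables (which are what a child process inherits) untouched. It says nothing about what the PEP 538 implementation does in addition to that call. The "C" locale is used because it exists everywhere; locale_env is a hypothetical helper name.

```python
import locale
import os

LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def locale_env():
    """Snapshot the locale-related environment variables (hypothetical helper)."""
    return {name: os.environ.get(name) for name in LOCALE_VARS}


env_before = locale_env()

# Change the LC_CTYPE locale of the current process only.
locale.setlocale(locale.LC_CTYPE, "C")

env_after = locale_env()

# Calling setlocale() without a locale argument queries the current setting.
current = locale.setlocale(locale.LC_CTYPE)

print(current, env_before == env_after)
```

A child process starts from the inherited environment, so it would only see a different locale if one of these variables were changed as well.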

I'm not sure how PyGTK interacts with the PEP 538 for example. Does it
use UTF-8 with the POSIX locale?

Victor

