[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

Nick Coghlan ncoghlan at gmail.com
Tue Dec 5 21:31:12 EST 2017


On 6 December 2017 at 11:01, Victor Stinner <victor.stinner at gmail.com> wrote:
>> Annex: Differences between PEP 538 and PEP 540
>> ==============================================
>>
>> PEP 538 uses the "C.UTF-8" locale, which is quite new and only
>> supported by a few Linux distributions; for example, this locale is
>> not currently supported by FreeBSD or macOS. PEP 540 supports all
>> operating systems.
>>
>> PEP 538 only changes the behaviour for the POSIX locale. While the
>> new UTF-8 mode of this PEP is only enabled automatically by the POSIX
>> locale, it can be enabled manually for any other locale.
>>
>> PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
>> non-Python code running in the process is impacted by this change. This
>> PEP is implemented in Python internals and ignores the locale:
>> non-Python code running in the same process is not aware of the
>> "Python UTF-8 mode".

I submitted a PR to reword this part: https://github.com/python/peps/pull/493
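As a rough illustration of the last annex paragraph (UTF-8 mode ignores the C-level locale), the sketch below starts a child interpreter with UTF-8 mode on and the C locale forced via LC_ALL=C, then reports what the child sees. It assumes a POSIX system and Python 3.7 or later, where PEP 540 eventually shipped with the `-X utf8` option and the `sys.flags.utf8_mode` attribute. Setting LC_ALL also keeps PEP 538 coercion from kicking in. Under the C locale Python 3.7+ turns UTF-8 mode on by itself; passing `-X utf8` just makes that explicit. The helper name `probe_child` is hypothetical.

```python
import json
import os
import subprocess
import sys

# Code run inside the child interpreter: report the UTF-8 mode flag, the
# filesystem encoding Python uses, and the C-level LC_CTYPE locale.
CHILD_CODE = """
import json, locale, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "fs_encoding": sys.getfilesystemencoding(),
    "lc_ctype": locale.setlocale(locale.LC_CTYPE),
}))
"""


def probe_child(extra_args):
    """Hypothetical helper: run a child Python under LC_ALL=C, return its report."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"          # LC_ALL set: PEP 538 locale coercion is skipped
    env.pop("PYTHONUTF8", None)  # let the command-line option decide
    out = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


report = probe_child(["-X", "utf8"])
print(report)
```

On a typical Linux box the child should report UTF-8 mode enabled and a "utf-8" filesystem encoding, while its LC_CTYPE is still "C": any C library code in that process keeps seeing the C locale.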

> The main advantage of PEP 538 *over* PEP 540 is that, for the
> POSIX locale, non-Python code running in the same process gets the
> UTF-8 encoding.
>
> To be honest, I'm not sure that there is a lot of code in the wild
> which uses "text" types like the C type wchar_t* and relies on the
> locale encoding. Almost all C libraries handle data such as filenames
> and environment variables as bytes, using the char* type.

At the very least, GNU readline breaks if you don't change the locale
setting: https://www.python.org/dev/peps/pep-0538/#considering-locale-coercion-independently-of-utf-8-mode

Given that we found an example of this directly in the standard
library, I assume that there are plenty more in third party extension
modules (especially once we take C++ extensions into account, not just
C ones).
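On the point above that most C libraries pass filenames and environment variables around as char* bytes: PEP 540's UTF-8 mode decodes such OS data as UTF-8 with the `surrogateescape` error handler, so bytes that are not valid UTF-8 still round-trip unchanged. The sketch below imitates that decode/encode step directly with `bytes.decode` and `str.encode`; the sample filename is made up.

```python
# A raw filename as a C library would see it: a char* buffer of bytes.
# b"caf\xc3\xa9" is valid UTF-8 ("café"); the b"\xff" byte is not.
raw = b"caf\xc3\xa9-\xff.txt"

# Decode the way UTF-8 mode does for OS data: undecodable bytes become
# lone surrogates (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'caf\xe9-\udcff.txt'

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw
```

This only covers code that treats the data as opaque bytes; code that converts to wchar_t* through the C locale (as readline does) is not helped by it.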

> At first I understood that PEP 538 changed the locale encoding using
> an environment variable. But no, it's implemented with
> setlocale(LC_CTYPE, "C.UTF-8"), which only impacts the current process
> and is not inherited by child processes. So I'm no longer sure that
> PEP 538 and PEP 540 are really complementary.

It sets the LC_CTYPE environment variable as well:
https://www.python.org/dev/peps/pep-0538/#explicitly-setting-lc-ctype-for-utf-8-locale-coercion

The relevant code is in _coerce_default_locale_settings (currently at
https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L448)
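Setting the environment variable is what makes the coercion reach child processes: unlike a setlocale() call, it is inherited. The sketch below imitates only that environment side by hand; it does not call `_coerce_default_locale_settings` itself. It copies the environment, sets LC_CTYPE=C.UTF-8, and has a child interpreter report what it received. PYTHONCOERCECLOCALE=0 (the PEP 538 opt-out) is set so the child's own coercion logic cannot rewrite the variable before it is printed, and the C.UTF-8 locale does not need to exist for the variable to be passed along.

```python
import os
import subprocess
import sys

# Copy the parent environment and set LC_CTYPE, imitating the environment
# side of PEP 538 locale coercion (the real work happens in C at startup).
env = dict(os.environ)
env.pop("LC_ALL", None)            # LC_ALL would override LC_CTYPE in the child
env["LC_CTYPE"] = "C.UTF-8"
env["PYTHONCOERCECLOCALE"] = "0"   # stop the child from coercing again

# The child only reads back the variable it inherited.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('LC_CTYPE'))"],
    env=env, capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # C.UTF-8
```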

> For example, I'm not sure how PyGTK interacts with PEP 538. Does it
> use UTF-8 with the POSIX locale?

Desktop environments aim not to get into this situation in the first
place by ensuring they're using a more appropriate locale :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Python-Dev mailing list