[Python-Dev] PEP 538 (review round 2): Coercing the legacy C locale to a UTF-8 based locale

INADA Naoki songofacandy at gmail.com
Tue May 23 04:38:42 EDT 2017


Hi, Nick.

I read the PEP again, and I think PEP 538 is mostly ready to be accepted,
without waiting for PEP 540.

My one remaining concern is setting LANG.

> Setting LANG to C.UTF-8 ensures that even components that only check the LANG fallback for their locale settings will still use C.UTF-8.
https://www.python.org/dev/peps/pep-0538/#setting-both-lc-ctype-lang-for-utf-8-locale-coercion

I feel that setting only LC_CTYPE would make PEP 538 simpler.
Is there any real component that uses LANG to decide its encoding?

For example, the date command refers to LC_TIME, which falls back to LANG
when LC_TIME itself is not set.

$ LANG=ja_JP.UTF-8 LC_CTYPE=C date
2017年  5月 23日 火曜日 17:31:03 JST

$ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing only LC_CTYPE
2017年  5月 23日 火曜日 17:32:58 JST

$ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing both LC_CTYPE and LANG
Tue May 23 17:31:10 JST 2017

In this case, coercing only LC_CTYPE has fewer side effects.
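
To make the difference concrete, here is a minimal sketch of the POSIX
lookup order (LC_ALL, then the category variable, then LANG, then "C")
applied to the environments above. The helper names effective_locale and
coerce are made up for illustration and are not part of CPython or the
PEP; coerce is only a simplified model of the two options being
discussed, not the reference implementation.

```python
def effective_locale(category, env):
    """Return the locale a POSIX program would use for `category`.

    Lookup order: LC_ALL, then the category variable, then LANG,
    then the default "C". Empty values count as unset.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def coerce(env, set_lang):
    """Hypothetical model of coercion: always set LC_CTYPE, optionally LANG."""
    coerced = dict(env)  # leave the caller's environment untouched
    coerced["LC_CTYPE"] = "C.UTF-8"
    if set_lang:
        coerced["LANG"] = "C.UTF-8"
    return coerced


env = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C"}
only_ctype = coerce(env, set_lang=False)
both = coerce(env, set_lang=True)

for label, e in (("LC_CTYPE only", only_ctype), ("LC_CTYPE + LANG", both)):
    print(label, effective_locale("LC_CTYPE", e), effective_locale("LC_TIME", e))
# LC_CTYPE only C.UTF-8 ja_JP.UTF-8
# LC_CTYPE + LANG C.UTF-8 C.UTF-8
```

In both cases LC_CTYPE ends up as C.UTF-8, but only the second variant
changes what LC_TIME resolves to, which matches the date output above.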

Would you add an example that demonstrates how coercing LANG helps people?


On Tue, May 9, 2017 at 8:57 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Hi folks,
>
> Enough changes have accumulated in PEP 538 since the start of the
> previous thread that it seems sensible to me to start a new thread
> specifically covering the current design (which aims to address all
> the concerns raised in the previous thread).
>
> I haven't requoted the PEP in full since it's so long, but will
> instead refer readers to the web version:
> https://www.python.org/dev/peps/pep-0538/
>
> I also generated a diff covering the full changes to the PEP text:
>
> * https://gist.github.com/ncoghlan/1067805fe673b3735ac854e195747493/revisions
> (this is the diff covering the last few days of changes)
>
> Summarising the key technical changes:
>
> * to make the runtime behaviour independent of whether or not locale
> coercion took place, stdin and stdout now always have
> "surrogateescape" as their error handler in the potential coercion
> target locales. This means Python will behave the same way regardless
> of whether the locale gets set externally (e.g. by a parent Python
> process or a container image definition) or implicitly during CLI
> startup
> * for the full locales, the interpreter now sets LC_CTYPE and LANG,
> *not* LC_ALL. This means LC_ALL is once again a full locale override,
> and also means that CPython won't inadvertently interfere with other
> locale categories like LC_MONETARY, LC_NUMERIC, etc
> * the reference implementation has been refactored so the bulk of the
> new code lives in the shared library and is exposed to the linker via
> a couple of underscore prefixed API symbols
> (_Py_LegacyLocaleDetected() and _Py_CoerceLegacyLocale()). While the
> current PEP still keeps them private, it would be straightforward to
> make them public for use in embedding applications if we decided we
> wanted to do so.
> * locale coercion and warnings are now enabled by default on all
> platforms that use the autotools-based build chain - the assumption
> that some platforms didn't need them turned out to be incorrect
>
> In addition to being updated to cover the above changes, the Rationale
> section of the PEP has also been updated to explain why it doesn't
> propose setting PYTHONIOENCODING, and to walk through some examples of
> the problems with GNU readline compatibility when the current locale
> isn't set correctly.
>
> The essential related changes to the reference implementation can be seen here:
>
> * Always set "surrogateescape" for coercion target locales,
> independently of whether or not coercion occurred:
> https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5
> * Stop setting LC_ALL:
> https://github.com/python/peps/commit/2f530ce0d1fd24835ac0c6f984f40db70482a18f
>
> (There are also some smaller cleanup commits that can be seen by
> browsing that branch on GitHub)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

