[Python-Dev] PEP 538 (review round 2): Coercing the legacy C locale to a UTF-8 based locale
songofacandy at gmail.com
Tue May 23 04:38:42 EDT 2017
I read it again, and I think PEP 538 is mostly ready to be accepted
without waiting for PEP 540.
My one remaining concern is setting LANG.
> Setting LANG to C.UTF-8 ensures that even components that only check the LANG fallback for their locale settings will still use C.UTF-8.
I feel that setting only LC_CTYPE would make PEP 538 simpler.
Is there any real component that uses LANG to decide its encoding?
For example, the date command consults LC_TIME, which falls back to LANG:
$ LANG=ja_JP.UTF-8 LC_CTYPE=C date
2017年 5月 23日 火曜日 17:31:03 JST
$ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing only LC_CTYPE
2017年 5月 23日 火曜日 17:32:58 JST
$ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing both LC_CTYPE and LANG
Tue May 23 17:31:10 JST 2017
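In this case, coercing only LC_CTYPE has fewer side effects: date keeps
the Japanese date format from LANG ("2017年 5月 23日 火曜日" is
Tuesday, May 23, 2017), while coercing LANG as well switches it to English.
Could you add an example demonstrating how coercing LANG helps people?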
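The fallback order that makes the date example behave this way can be
modelled in a few lines of Python. This is a simplified sketch of the
POSIX rules (LC_ALL, then the category-specific variable, then LANG,
then the default C locale); it ignores locale availability and
validation, and `effective_locale` is a hypothetical helper, not a
Python or libc API.

```python
# Simplified model of POSIX locale-category resolution from environment
# variables: LC_ALL overrides everything, then the category-specific
# variable (e.g. LC_TIME), then LANG, then the default "C" locale.
from typing import Mapping


def effective_locale(env: Mapping[str, str], category: str) -> str:
    """Return the locale name a POSIX program would use for `category`.

    `category` is a variable name such as "LC_CTYPE" or "LC_TIME".
    Empty values are treated as unset, as POSIX specifies.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


# The three `date` invocations from the example above.
original = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C"}
ctype_only = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C.UTF-8"}
ctype_and_lang = {"LANG": "C.UTF-8", "LC_CTYPE": "C.UTF-8"}

for env in (original, ctype_only, ctype_and_lang):
    # Print the locale used for character encoding and for date formatting.
    print(effective_locale(env, "LC_CTYPE"), effective_locale(env, "LC_TIME"))
```

Coercing only LC_CTYPE changes the encoding column but leaves LC_TIME
resolving to ja_JP.UTF-8; coercing LANG as well is what moves LC_TIME
to C.UTF-8, which matches the change in the `date` output.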
On Tue, May 9, 2017 at 8:57 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Hi folks,
> Enough changes have accumulated in PEP 538 since the start of the
> previous thread that it seems sensible to me to start a new thread
> specifically covering the current design (which aims to address all
> the concerns raised in the previous thread).
> I haven't requoted the PEP in full since it's so long, but will
> instead refer readers to the web version:
> I also generated a diff covering the full changes to the PEP text:
> * https://gist.github.com/ncoghlan/1067805fe673b3735ac854e195747493/revisions
> (this is the diff covering the last few days of changes)
> Summarising the key technical changes:
> * to make the runtime behaviour independent of whether or not locale
> coercion took place, stdin and stdout now always have
> "surrogateescape" as their error handler in the potential coercion
> target locales. This means Python will behave the same way regardless
> of whether the locale gets set externally (e.g. by a parent Python
> process or a container image definition) or implicitly during CLI
> startup.
> * for the full locales, the interpreter now sets LC_CTYPE and LANG,
> *not* LC_ALL. This means LC_ALL is once again a full locale override,
> and also means that CPython won't inadvertently interfere with other
> locale categories like LC_MONETARY, LC_NUMERIC, etc.
> * the reference implementation has been refactored so the bulk of the
> new code lives in the shared library and is exposed to the linker via
> a couple of underscore-prefixed API symbols
> (_Py_LegacyLocaleDetected() and _Py_CoerceLegacyLocale()). While the
> current PEP still keeps them private, it would be straightforward to
> make them public for use in embedding applications if we decided we
> wanted to do so.
> * locale coercion and warnings are now enabled by default on all
> platforms that use the autotools-based build chain - the assumption
> that some platforms didn't need them turned out to be incorrect.
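The coercion step described in the bullets above can be modelled on a
plain environment dictionary. This is a hypothetical Python sketch, not
CPython's C implementation: `legacy_locale_detected` and
`coerce_legacy_locale` only mirror the roles described for
_Py_LegacyLocaleDetected() and _Py_CoerceLegacyLocale(), the candidate
list (C.UTF-8, C.utf8, UTF-8) is an assumption, and skipping coercion
when LC_ALL is set is a modelling choice (an LC_ALL value would
override a coerced LC_CTYPE anyway).

```python
# Hypothetical model of the coercion step. It does not call CPython's
# private _Py_LegacyLocaleDetected() or _Py_CoerceLegacyLocale(); it only
# mimics their described effect on an environment dictionary.
from typing import Dict, Iterable, Mapping, Optional

# Assumed candidate order for the coercion target locale.
TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def _ctype_locale(env: Mapping[str, str]) -> str:
    # POSIX precedence for LC_CTYPE: LC_ALL, then LC_CTYPE, then LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        if env.get(name):
            return env[name]
    return "C"


def legacy_locale_detected(env: Mapping[str, str]) -> bool:
    # The legacy locales are the default C locale and the POSIX locale.
    return _ctype_locale(env) in ("C", "POSIX")


def coerce_legacy_locale(
    env: Mapping[str, str], available: Iterable[str]
) -> Optional[Dict[str, str]]:
    """Return a coerced copy of `env`, or None if no coercion applies."""
    if env.get("LC_ALL") or not legacy_locale_detected(env):
        # LC_ALL remains a full override, so it is never touched.
        return None
    offered = set(available)
    for target in TARGET_LOCALES:
        if target in offered:
            coerced = dict(env)
            # As in this revision of the PEP: set LC_CTYPE and LANG, not LC_ALL.
            # Categories with their own LC_* variable keep it; categories
            # that fall back to LANG (e.g. LC_TIME) now see the target too.
            coerced["LC_CTYPE"] = target
            coerced["LANG"] = target
            return coerced
    return None
```

For example, `coerce_legacy_locale({"LANG": "C"}, ["C.UTF-8"])` returns
a copy with both LC_CTYPE and LANG set to C.UTF-8, while an environment
containing LC_ALL=C yields None. Because LANG is rewritten too, any
category without its own LC_* variable also switches to the target
locale, which is the side effect discussed in the reply above.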
> In addition to being updated to cover the above changes, the Rationale
> section of the PEP has also been updated to explain why it doesn't
> propose setting PYTHONIOENCODING, and to walk through some examples of
> the problems with GNU readline compatibility when the current locale
> isn't set correctly.
> The essential related changes to the reference implementation can be seen here:
> * Always set "surrogateescape" for coercion target locales,
> independently of whether or not coercion occurred:
> * Stop setting LC_ALL:
> (There are also some smaller cleanup commits that can be seen by
> browsing that branch on GitHub)
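For readers unfamiliar with the error handler named above, the
following Python sketch shows what "surrogateescape" provides:
undecodable bytes survive a decode/encode round trip instead of raising
UnicodeDecodeError. The byte string and the in-memory stream are made up
for illustration; this is not the interpreter's actual stream setup
code.

```python
# Demonstrates the "surrogateescape" error handler: bytes that are not
# valid UTF-8 are decoded to lone surrogates (U+DC80..U+DCFF) instead of
# raising, and encoding with the same handler restores the original bytes.
import io

raw = b"caf\xc3\xa9 \xff\xfe"  # valid UTF-8 "café" followed by two invalid bytes

text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'caf\xe9 \udcff\udcfe'
assert text.encode("utf-8", errors="surrogateescape") == raw

# The same handler applied to a text stream, similar in spirit to a
# standard stream configured with errors="surrogateescape".
stream = io.TextIOWrapper(
    io.BytesIO(raw + b"\n"), encoding="utf-8", errors="surrogateescape"
)
line = stream.readline()
assert line.rstrip("\n") == text

# With the default "strict" handler, the same input is rejected.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding fails:", exc.reason)
```

The round trip is what lets a program pass through file names or
command output in an unexpected encoding without crashing, regardless
of whether the UTF-8 locale was set externally or by coercion.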
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia