On 6 May 2017 at 18:33, Nick Coghlan wrote:
On 6 May 2017 at 18:00, Nick Coghlan wrote:
On 5 March 2017 at 17:50, Nick Coghlan wrote:
Hi folks,
Late last year I started working on a change to the CPython CLI (*not* the shared library) to get it to coerce the legacy C locale to something based on UTF-8 when a suitable locale is available.
After a couple of rounds of iteration on linux-sig and python-ideas, I'm now bringing it to python-dev as a concrete proposal for Python 3.7.
For most folks, reading the Abstract plus the draft docs updates in the reference implementation will tell you everything you need to know (if the C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically attempt to coerce the legacy C locale to one of those rather than persisting with the latter's default assumption of ASCII as the preferred text encoding).
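As a rough illustration of the coercion idea described above (this is a minimal sketch, not the code from the reference implementation), the detection and fallback steps might look like the following in plain C. The helper names legacy_c_locale_detected() and coerce_legacy_locale() are hypothetical and exist only for this example; the candidate list is the one given in the paragraph above, and only the standard setlocale() call is used.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Candidate target locales, in the order listed in the post. */
static const char *const coercion_targets[] = {"C.UTF-8", "C.utf8", "UTF-8"};

/* Hypothetical helper: returns 1 if the active LC_CTYPE locale is the
 * legacy C locale (also known as POSIX), 0 otherwise. */
int legacy_c_locale_detected(void)
{
    const char *current = setlocale(LC_CTYPE, NULL); /* query only */
    return current != NULL
        && (strcmp(current, "C") == 0 || strcmp(current, "POSIX") == 0);
}

/* Hypothetical helper: try each candidate in turn. Returns the name of
 * the first locale the C library accepts, or NULL if none is available
 * (in which case the active locale is left unchanged). */
const char *coerce_legacy_locale(void)
{
    size_t count = sizeof(coercion_targets) / sizeof(coercion_targets[0]);
    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, coercion_targets[i]) != NULL) {
            return coercion_targets[i];
        }
    }
    return NULL;
}
```

A caller would normally run setlocale(LC_ALL, "") first to pick up the environment's settings, then call coerce_legacy_locale() only when legacy_c_locale_detected() returns 1. Note that this sketch only changes the locale of the current process; it does not touch any environment variables.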
I've just pushed a significant update to the PEP based on the discussions in this thread: https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f63...
The main change at the technical level is to modify the handling of the coercion target locales such that they *always* lead to "surrogateescape" being used by default on the standard streams. That means we don't need to call "Py_SetStandardStreamEncoding" during startup, that subprocesses will behave the same way as their parent processes, and that Python in Linux containers will behave consistently regardless of whether the container locale is set to "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8" by CPython.
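For readers unfamiliar with "surrogateescape", here is a small sketch of the byte mapping it relies on, as defined in PEP 383: a byte in the range 0x80..0xFF that cannot be decoded is represented as a lone surrogate code point in U+DC80..U+DCFF, and encoding turns it back into the original byte. The function names below are made up for illustration and are not CPython APIs.

```c
#include <stdint.h>

/* Map an undecodable byte (0x80..0xFF) to the lone surrogate that
 * stands in for it (U+DC80..U+DCFF). Callers are assumed to pass only
 * bytes >= 0x80; ASCII bytes are never escaped. */
uint16_t escape_byte(uint8_t byte)
{
    return (uint16_t)(0xDC00u + byte);
}

/* Returns 1 if the code unit is one of the escape surrogates. */
int is_escaped_byte(uint16_t code_unit)
{
    return code_unit >= 0xDC80u && code_unit <= 0xDCFFu;
}

/* Recover the original byte from an escape surrogate. */
uint8_t unescape_byte(uint16_t code_unit)
{
    return (uint8_t)(code_unit - 0xDC00u);
}
```

Because the mapping is reversible, arbitrary bytes read from a standard stream can be written back out unchanged, which is why using this error handler by default makes the streams tolerant of data that is not valid in the locale's encoding.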
Working on the revised implementation for this, I've ended up refactoring it so that all the heavy lifting is done by a single function exported from the shared library: "_Py_CoerceLegacyLocale()".
The CLI code then just contains the check that says "Are we running in the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all the details of how the coercion actually works being hidden away inside pylifecycle.c.
That seems like a potential opportunity to make the 3.7 version of this a public API, using the following pattern:
    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }
That way applications embedding CPython that wanted to implement the same locale coercion logic would have an easy way to do so.
OK, the reference implementation has been updated to match the latest version of the PEP:
https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5c...

For now, the implementation in the standalone CLI looks like this:

    /* [snip] */
    extern int _Py_LegacyLocaleDetected(void);
    extern void _Py_CoerceLegacyLocale(void);
    /* [snip] */
    if (_Py_LegacyLocaleDetected()) {
        _Py_CoerceLegacyLocale();
    }

If we decide to make this a public API for 3.7, the necessary changes would be:

- remove the leading underscore from the function names
- add the function prototypes to the pylifecycle.h header
- add the APIs to the C API documentation in the configuration & initialization section
- define the APIs in the PEP
- adjust the backport note in the PEP to say that backports should NOT expose the public C API, but keep it private

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia