On 6 May 2017 at 18:00, Nick Coghlan ncoghlan@gmail.com wrote:
On 5 March 2017 at 17:50, Nick Coghlan ncoghlan@gmail.com wrote:
Hi folks,
Late last year I started working on a change to the CPython CLI (*not* the shared library) to get it to coerce the legacy C locale to something based on UTF-8 when a suitable locale is available.
After a couple of rounds of iteration on linux-sig and python-ideas, I'm now bringing it to python-dev as a concrete proposal for Python 3.7.
For most folks, reading the Abstract plus the draft docs updates in the reference implementation will tell you everything you need to know: if the C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically attempt to coerce the legacy C locale to one of those, rather than persisting with the legacy C locale's default assumption of ASCII as the preferred text encoding.
I've just pushed a significant update to the PEP based on the discussions in this thread: https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f63...
The main change at the technical level is to modify the handling of the coercion target locales such that they *always* lead to "surrogateescape" being used by default on the standard streams. That means we don't need to call "Py_SetStandardStreamEncoding" during startup, that subprocesses will behave the same way as their parent processes, and that Python in Linux containers will behave consistently regardless of whether the container locale is set to "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8" by CPython.
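As a rough illustration of that rule, here is a minimal C sketch (not the actual CPython code). The names "default_stream_errors" and "SURROGATEESCAPE_LOCALES" are hypothetical. Listing the legacy "C" locale alongside the coercion targets is an assumption based on how Python 3.5+ already treats stdin/stdout in that locale, and "strict" is assumed as the fallback for every other locale.

```c
#include <stddef.h>
#include <string.h>

/* Locale names that select "surrogateescape" in this sketch: the three
 * coercion targets named in the proposal, plus the legacy "C" locale
 * (an assumption, see the paragraph above). */
static const char *const SURROGATEESCAPE_LOCALES[] = {
    "C", "C.UTF-8", "C.utf8", "UTF-8"
};

/* Hypothetical helper: pick the default error handler for the standard
 * streams from the LC_CTYPE locale name alone. */
const char *default_stream_errors(const char *ctype_locale)
{
    size_t count = sizeof(SURROGATEESCAPE_LOCALES)
                   / sizeof(SURROGATEESCAPE_LOCALES[0]);

    if (ctype_locale == NULL) {
        return "strict";
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(ctype_locale, SURROGATEESCAPE_LOCALES[i]) == 0) {
            return "surrogateescape";
        }
    }
    return "strict";
}
```

Because the decision depends only on the locale name, a process that starts in "C.UTF-8" and one that starts in "C" and is then coerced to "C.UTF-8" get the same answer, which is the consistency property described above.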
Working on the revised implementation for this, I've ended up refactoring it so that all the heavy lifting is done by a single function exported from the shared library: "_Py_CoerceLegacyLocale()".
The CLI code then just contains the check that says "Are we running in the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all the details of how the coercion actually works being hidden away inside pylifecycle.c.
That seems like a potential opportunity to make the 3.7 version of this a public API, using the following pattern:
    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }
That way applications embedding CPython that wanted to implement the same locale coercion logic would have an easy way to do so.
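To make the detect-then-coerce pattern concrete without depending on the proposed API, here is a self-contained sketch using only the standard C "setlocale()" call. The function names "is_legacy_c_locale_name", "legacy_locale_detected" and "coerce_legacy_locale" are hypothetical stand-ins, not the CPython functions. Treating "POSIX" as an alias for "C" is an assumption, and the sketch only changes the in-process LC_CTYPE setting.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Candidate target locales, tried in the order listed in the proposal. */
static const char *const COERCION_TARGETS[] = {"C.UTF-8", "C.utf8", "UTF-8"};

/* Returns 1 if the name denotes the legacy C locale, else 0.
 * Accepting "POSIX" as an alias is an assumption of this sketch. */
int is_legacy_c_locale_name(const char *name)
{
    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical stand-in for the detection step. Passing NULL to
 * setlocale() queries the current setting without changing it. */
int legacy_locale_detected(void)
{
    return is_legacy_c_locale_name(setlocale(LC_CTYPE, NULL));
}

/* Hypothetical stand-in for the coercion step: switch LC_CTYPE to the
 * first target the C library accepts. Returns the chosen name, or NULL
 * if no target is available (the locale is then left unchanged). */
const char *coerce_legacy_locale(void)
{
    size_t count = sizeof(COERCION_TARGETS) / sizeof(COERCION_TARGETS[0]);

    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, COERCION_TARGETS[i]) != NULL) {
            return COERCION_TARGETS[i];
        }
    }
    return NULL;
}
```

An embedding application would call these in the same shape as the pattern above: check "legacy_locale_detected()" first, and call "coerce_legacy_locale()" only if it returns true. A C program always starts in the "C" locale until it calls "setlocale()" itself, so a real application would normally run "setlocale(LC_ALL, "")" before the check to pick up the user's environment.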
Thoughts?
Cheers, Nick.