[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Nick Coghlan ncoghlan at gmail.com
Sat May 6 04:33:14 EDT 2017


On 6 May 2017 at 18:00, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 5 March 2017 at 17:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Hi folks,
>>
>> Late last year I started working on a change to the CPython CLI (*not* the
>> shared library) to get it to coerce the legacy C locale to something based
>> on UTF-8 when a suitable locale is available.
>>
>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>
>> For most folks, reading the Abstract plus the draft docs updates in the
>> reference implementation will tell you everything you need to know (if the
>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>> attempt to coerce the legacy C locale to one of those rather than persisting
>> with the latter's default assumption of ASCII as the preferred text
>> encoding).
>
> I've just pushed a significant update to the PEP based on the
> discussions in this thread:
> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>
> The main change at the technical level is to modify the handling of
> the coercion target locales such that they *always* lead to
> "surrogateescape" being used by default on the standard streams. That
> means we don't need to call "Py_SetStandardStreamEncoding" during
> startup, that subprocesses will behave the same way as their parent
> processes, and that Python in Linux containers will behave
> consistently regardless of whether the container locale is set to
> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
> by CPython.

Working on the revised implementation for this, I've ended up
refactoring it so that all the heavy lifting is done by a single
function exported from the shared library: "_Py_CoerceLegacyLocale()".

The CLI code then just contains the check that says "Are we running in
the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
the details of how the coercion actually works being hidden away
inside pylifecycle.c.

That seems like a potential opportunity to make the 3.7 version of
this a public API, using the following pattern:

    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }

That way applications embedding CPython that wanted to implement the
same locale coercion logic would have an easy way to do so.
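
As a rough illustration of what such a pair of functions might do on
behalf of an embedding application, here is a minimal sketch using only
the standard C library and POSIX setenv(). The helper names below are
hypothetical stand-ins, not the CPython implementation. The target list
is taken from the locales named earlier in this thread. Exporting
LC_CTYPE so that subprocesses see the same setting is an assumption
made for the sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: is this locale name the legacy C locale? */
static int is_legacy_c_locale_name(const char *name)
{
    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical stand-in for the proposed Py_LegacyLocaleDetected().
 * Applies the environment's LC_CTYPE setting, then checks what the
 * C runtime actually ended up with. */
static int legacy_locale_detected(void)
{
    setlocale(LC_CTYPE, "");
    return is_legacy_c_locale_name(setlocale(LC_CTYPE, NULL));
}

/* Hypothetical stand-in for the proposed Py_CoerceLegacyLocale().
 * Tries each UTF-8 based target in turn. Returns the name that was
 * applied, or NULL if none of them is available on this system. */
static const char *coerce_legacy_locale(void)
{
    static const char *const targets[] = {"C.UTF-8", "C.utf8", "UTF-8"};
    size_t i;

    for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
        if (setlocale(LC_CTYPE, targets[i]) != NULL) {
            /* Assumption: export the choice so child processes
             * start with the same LC_CTYPE setting. */
            setenv("LC_CTYPE", targets[i], 1);
            return targets[i];
        }
    }
    return NULL;
}
```

An embedding application would call legacy_locale_detected() before
initializing the interpreter, and call coerce_legacy_locale() only if
it returns true, mirroring the two-step pattern shown above.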

Thoughts?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

