[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Nick Coghlan ncoghlan at gmail.com
Sat May 6 04:00:38 EDT 2017


On 5 March 2017 at 17:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Hi folks,
>
> Late last year I started working on a change to the CPython CLI (*not* the
> shared library) to get it to coerce the legacy C locale to something based
> on UTF-8 when a suitable locale is available.
>
> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
> bringing it to python-dev as a concrete proposal for Python 3.7.
>
> For most folks, reading the Abstract plus the draft docs updates in the
> reference implementation will tell you everything you need to know (if the
> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
> attempt to coerce the legacy C locale to one of those rather than persisting
> with the latter's default assumption of ASCII as the preferred text
> encoding).

I've just pushed a significant update to the PEP based on the
discussions in this thread:
https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398

The main change at the technical level is to modify the handling of
the coercion target locales such that they *always* lead to
"surrogateescape" being used by default on the standard streams. That
means we don't need to call "Py_SetStandardStreamEncoding" during
startup, that subprocesses will behave the same way as their parent
processes, and that Python in Linux containers will behave
consistently regardless of whether the container locale is set to
"C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
by CPython.
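
As a rough illustration of the effect (a sketch, not code from the reference implementation, where the real logic lives in the C startup code), the snippet below models the default error handler choice and shows what "surrogateescape" does to bytes that aren't valid UTF-8. The helper names `default_stream_errors` and `roundtrip` are hypothetical. Treating "C" and "POSIX" the same way as the coercion targets, and falling back to "strict" for every other locale, are assumptions made for this sketch.

```python
# Illustrative sketch only: CPython makes this decision in C during startup.
# The target locale names come from the PEP; the helpers are hypothetical.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")
LEGACY_LOCALES = ("C", "POSIX")  # assumption: handled like the targets


def default_stream_errors(lc_ctype):
    """Return the assumed default error handler for the standard streams."""
    if lc_ctype in COERCION_TARGETS or lc_ctype in LEGACY_LOCALES:
        return "surrogateescape"
    # Assumption for this sketch: any other locale keeps strict handling.
    return "strict"


def roundtrip(raw, errors="surrogateescape"):
    """Decode bytes as UTF-8 and re-encode them with the same handler."""
    text = raw.decode("utf-8", errors=errors)
    return text.encode("utf-8", errors=errors)


if __name__ == "__main__":
    # Explicitly configured and coerced containers end up with the same handler.
    print(default_stream_errors("C.UTF-8"))
    # b"caf\xe9" is Latin-1 encoded, so it is not valid UTF-8. With
    # surrogateescape the stray byte survives the round trip unchanged.
    print(roundtrip(b"caf\xe9"))
```

With "strict" in place of "surrogateescape", the same decode call would raise UnicodeDecodeError. That difference is the practical effect of changing the default stream error handler.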

That change also eliminated the behaviour that was contingent on
whether or not PEP 540 was accepted. PEP 540 may still want to have
the coercion target locales imply full UTF-8 mode rather than just
setting the stream error handler differently, but that question can be
considered when reviewing PEP 540 rather than needing to be settled
now.

The second technical change is that the locale coercion and warning
are now enabled on Android and Mac OS X. For Android, that's a matter
of getting GNU readline to behave sensibly, while for Mac OS X, it's a
matter of simplifying the implementation and improving cross-platform
behavioural consistency (even though we don't expect the coercion to
actually have much impact there).
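
For anyone who wants a concrete picture of the coercion step itself, here is a minimal Python sketch of the candidate search. It is only a model: the actual implementation is C code in the CLI and also handles the warning and the environment updates, which are omitted here. The function names `pick_coercion_target` and `can_set_lc_ctype` are hypothetical. The candidate order follows the order the locales are listed in the original announcement, which is an assumption about the search order.

```python
import locale

# Candidate names as listed in the announcement; the order is an assumption.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def pick_coercion_target(is_available, candidates=COERCION_TARGETS):
    """Return the first candidate that is_available accepts, else None."""
    for name in candidates:
        if is_available(name):
            return name
    return None


def can_set_lc_ctype(name):
    """Probe whether the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    else:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the original
        return True


if __name__ == "__main__":
    # The result depends on which locales the platform provides.
    print(pick_coercion_target(can_set_lc_ctype))
```

Passing the availability check in as a parameter keeps the search testable without depending on the host's installed locales.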

Beyond that, the PEP update focuses on clarifying a few other points
without actually changing the proposal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

