[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Toshio Kuratomi a.badger at gmail.com
Thu May 4 21:24:13 EDT 2017


On Sat, Mar 4, 2017 at 11:50 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> Providing implicit locale coercion only when running standalone
> ---------------------------------------------------------------
>
> Over the course of Python 3.x development, multiple attempts have been made
> to improve the handling of incorrect locale settings at the point where the
> Python interpreter is initialised. The problem that emerged is that this is
> ultimately *too late* in the interpreter startup process - data such as
> command line arguments and the contents of environment variables may have
> already been retrieved from the operating system and processed under the
> incorrect ASCII text encoding assumption well before ``Py_Initialize`` is
> called.
>
> The problems created by those inconsistencies were then even harder to
> diagnose and debug than those created by believing the operating system's
> claim that ASCII was a suitable encoding to use for operating system
> interfaces. This was the case even for the default CPython binary, let alone
> larger C/C++ applications that embed CPython as a scripting engine.
>
> The approach proposed in this PEP handles that problem by moving the locale
> coercion as early as possible in the interpreter startup sequence when
> running standalone: it takes place directly in the C-level ``main()``
> function, even before calling into the ``Py_Main()`` library function that
> implements the features of the CPython interpreter CLI.
>
> The ``Py_Initialize`` API then only gains an explicit warning (emitted on
> ``stderr``) when it detects use of the ``C`` locale, and relies on the
> embedding application to specify something more reasonable.
>

It feels like having a short section on the caveats of this approach
would help to introduce this section.  Something that says that this
PEP can cause a split in how Python behaves in non-standalone
applications (mod_wsgi, IDEs where libpython is compiled in, etc.) vs.
standalone (unless the embedders take similar steps to the ones
standalone Python is taking).  Then go on to state that this approach
was still chosen because coercing in Py_Initialize is too late, causing
the inconsistencies and problems listed here.
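
To make the "similar steps" for embedders and launchers more concrete, here is a minimal Python sketch of the decision such a program might make before the interpreter starts: if the inherited LC_CTYPE resolves to the legacy C/POSIX locale, point LC_CTYPE at a UTF-8 locale in the child's environment. The candidate list, the `PYTHONCOERCECLOCALE=0` opt-out and the rule of leaving a set `LC_ALL` alone are assumptions loosely modelled on the PEP's proposal, not its reference implementation. `effective_ctype`, `coerced_env` and the `available` argument are hypothetical names; a real launcher would have to probe which locales are actually installed.

```python
from typing import Dict, Iterable, Mapping

# Assumed coercion targets, tried in order; which ones exist varies by platform.
CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def effective_ctype(env: Mapping[str, str]) -> str:
    """Return the LC_CTYPE value a C program would pick up from env.

    Follows POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. An empty or
    missing value falls through; nothing set at all means the C locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


def coerced_env(env: Mapping[str, str], available: Iterable[str]) -> Dict[str, str]:
    """Return a copy of env with LC_CTYPE set to a UTF-8 locale if needed.

    `available` is a hypothetical list of installed locale names.
    The input mapping is never modified.
    """
    result = dict(env)
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return result  # assumed explicit opt-out
    if env.get("LC_ALL"):
        return result  # LC_ALL overrides LC_CTYPE, so setting LC_CTYPE would not help
    if effective_ctype(env) not in ("C", "POSIX"):
        return result  # already a non-legacy locale
    installed = set(available)
    for candidate in CANDIDATE_LOCALES:
        if candidate in installed:
            result["LC_CTYPE"] = candidate
            break
    return result
```

A launcher would then pass the returned mapping as the `env` argument of `subprocess.run` or `os.execve` when starting the interpreter. Working on a copy rather than mutating `os.environ` keeps the decision easy to test and leaves the launcher's own process untouched.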

-Toshio

