On Sat, Mar 4, 2017 at 11:50 PM, Nick Coghlan ncoghlan@gmail.com wrote:
Providing implicit locale coercion only when running standalone
Over the course of Python 3.x development, multiple attempts have been made to improve the handling of incorrect locale settings at the point where the Python interpreter is initialised. The problem that emerged is that this is ultimately *too late* in the interpreter startup process - data such as command line arguments and the contents of environment variables may have already been retrieved from the operating system and processed under the incorrect ASCII text encoding assumption well before ``Py_Initialize`` is called.
The problems created by those inconsistencies were then even harder to diagnose and debug than those created by believing the operating system's claim that ASCII was a suitable encoding to use for operating system interfaces. This was the case even for the default CPython binary, let alone larger C/C++ applications that embed CPython as a scripting engine.
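As a rough illustration of the mismatch described above, the snippet below decodes the same operating system bytes twice: once under an ASCII assumption and once as UTF-8. The sample string and the variable names are made up for the example, and the use of the ``surrogateescape`` error handler is an assumption about how the ASCII-assuming path treats undecodable bytes.

```python
# Bytes as the OS might supply them for a UTF-8 encoded argument ("caf" + e-acute).
raw = "caf\u00e9".encode("utf-8")

# Decoding under the ASCII assumption: each non-ASCII byte becomes a lone surrogate.
as_ascii = raw.decode("ascii", errors="surrogateescape")

# Decoding with the encoding the data was really written in.
as_utf8 = raw.decode("utf-8")

# The two results are different strings for the same underlying bytes.
mismatch = as_ascii != as_utf8

# The original bytes are only recoverable by reusing the same codec and handler.
round_trip = as_ascii.encode("ascii", errors="surrogateescape")
```

Once part of the process holds ``as_ascii`` and another part holds ``as_utf8``, comparisons and lookups between them fail quietly, which is the hard-to-diagnose kind of inconsistency the text refers to.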
The approach proposed in this PEP handles that problem by moving the locale coercion as early as possible in the interpreter startup sequence when running standalone: it takes place directly in the C-level ``main()`` function, even before calling in to the ``Py_Main()`` library function that implements the features of the CPython interpreter CLI.
The ``Py_Initialize`` API then only gains an explicit warning (emitted on ``stderr``) when it detects use of the ``C`` locale, and relies on the embedding application to specify something more reasonable.
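The decision made in ``main()`` can be modelled in a few lines. This is only a sketch written in Python for readability, not the C code itself: the function names, the ``C.UTF-8`` target, and the choice to leave an explicitly set ``LC_ALL`` untouched are all assumptions of the example rather than details taken from the text above.

```python
def effective_ctype(env):
    """Return the locale name LC_CTYPE resolves to, using POSIX precedence."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # nothing configured: the default C locale applies


def coerce_for_standalone(env, target="C.UTF-8"):
    """Hypothetical model of the check run in main() before Py_Main()."""
    if env.get("LC_ALL"):
        # Assumption: an explicit LC_ALL overrides LC_CTYPE, so leave it alone.
        return dict(env)
    if effective_ctype(env) not in ("C", "POSIX"):
        # A real locale was configured; no coercion is needed.
        return dict(env)
    coerced = dict(env)
    coerced["LC_CTYPE"] = target  # assumed coercion target
    return coerced
```

An embedding application that calls ``Py_Initialize`` directly never passes through this step, so it only gets the warning unless it performs an equivalent check itself.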
I think a short paragraph on the caveats of this approach would help introduce this section. It could say that this PEP can cause a split in how Python behaves in non-standalone applications (mod_wsgi, IDEs where libpython is compiled in, etc.) versus standalone ones, unless the embedders take the same steps that standalone Python takes. It could then state that this approach was still chosen because coercing in Py_Initialize is too late, which causes the inconsistencies and problems listed here.
-Toshio