[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
Nick Coghlan
ncoghlan at gmail.com
Sat May 6 10:24:52 EDT 2017
On 6 May 2017 at 18:33, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 6 May 2017 at 18:00, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 5 March 2017 at 17:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> Hi folks,
>>>
>>> Late last year I started working on a change to the CPython CLI (*not* the
>>> shared library) to get it to coerce the legacy C locale to something based
>>> on UTF-8 when a suitable locale is available.
>>>
>>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>>
>>> For most folks, reading the Abstract plus the draft docs updates in the
>>> reference implementation will tell you everything you need to know (if the
>>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>>> attempt to coerce the legacy C locale to one of those rather than persisting
>>> with the latter's default assumption of ASCII as the preferred text
>>> encoding).
>>
>> I've just pushed a significant update to the PEP based on the
>> discussions in this thread:
>> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>>
>> The main change at the technical level is to modify the handling of
>> the coercion target locales such that they *always* lead to
>> "surrogateescape" being used by default on the standard streams. That
>> means we don't need to call "Py_SetStandardStreamEncoding" during
>> startup, that subprocesses will behave the same way as their parent
>> processes, and that Python in Linux containers will behave
>> consistently regardless of whether the container locale is set to
>> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
>> by CPython.
>
> Working on the revised implementation for this, I've ended up
> refactoring it so that all the heavy lifting is done by a single
> function exported from the shared library: "_Py_CoerceLegacyLocale()".
>
> The CLI code then just contains the check that says "Are we running in
> the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
> the details of how the coercion actually works being hidden away
> inside pylifecycle.c.
>
> That seems like a potential opportunity to make the 3.7 version of
> this a public API, using the following pattern:
>
>     if (Py_LegacyLocaleDetected()) {
>         Py_CoerceLegacyLocale();
>     }
>
> That way applications embedding CPython that wanted to implement the
> same locale coercion logic would have an easy way to do so.

OK, the reference implementation has been updated to match the latest
version of the PEP:
https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5

For now, the implementation in the standalone CLI looks like this:

    /* [snip] */
    extern int _Py_LegacyLocaleDetected(void);
    extern void _Py_CoerceLegacyLocale(void);
    /* [snip] */
    if (_Py_LegacyLocaleDetected()) {
        _Py_CoerceLegacyLocale();
    }

If we decide to make this a public API for 3.7, the necessary changes would be:

- remove the leading underscore from the function names
- add the function prototypes to the pylifecycle.h header
- add the APIs to the C API documentation in the configuration &
  initialization section
- define the APIs in the PEP
- adjust the backport note in the PEP to say that backports should NOT
  expose the public C API, but keep it private
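
For anyone who wants a feel for what the detect-then-coerce pattern amounts
to, here is a rough standalone sketch in plain C. It is not the code in
pylifecycle.c: the demo_* names are hypothetical stand-ins for the proposed
functions, the candidate list is simply the three locale names mentioned at
the top of the thread, and exporting the result through the LC_CTYPE
environment variable is an assumption made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* needed for setenv() in strict modes */
#endif
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed Py_LegacyLocaleDetected():
 * returns 1 if LC_CTYPE is currently the plain "C" locale, else 0. */
static int demo_legacy_locale_detected(void)
{
    const char *ctype_loc = setlocale(LC_CTYPE, NULL);
    return ctype_loc != NULL && strcmp(ctype_loc, "C") == 0;
}

/* Hypothetical stand-in for the proposed Py_CoerceLegacyLocale():
 * tries each candidate in order and returns the first one the C library
 * accepts, or NULL if none is available (the locale is then unchanged). */
static const char *demo_coerce_legacy_locale(void)
{
    static const char *const targets[] = {"C.UTF-8", "C.utf8", "UTF-8"};
    size_t i;

    for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
        if (setlocale(LC_CTYPE, targets[i]) != NULL) {
            /* Assumption: export the choice so that child processes
             * start out in the same locale as this one. */
            setenv("LC_CTYPE", targets[i], 1);
            return targets[i];
        }
    }
    return NULL;
}
```

An embedding application would presumably call setlocale(LC_ALL, "") first so
that the environment's locale settings are in effect, then run the check and
the coercion before initializing the interpreter. Returning the chosen name
(rather than void, as in the prototypes above) is only a convenience for the
sketch, so the caller can report which target was used.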
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia