This is a very bad idea.
It seems to be based on the assumption that the C locale is always some kind of pathology. Admittedly, it is sometimes the result of misconfiguration or a mistake. (But I don't see why it's the interpreter's job to correct such mistakes.) However, in some cases the C locale is a normal environment: system services, cron scripts, distro package builds and whatnot.
It's possible to write Python programs that are locale-agnostic. It's also possible to write programs that are locale-dependent, but handle ASCII as locale encoding gracefully. Or you might want to write a program that intentionally aborts with an explanatory error message when the locale encoding doesn't have sufficient Unicode coverage. ("Errors should never pass silently" anyone?)
With this proposal, none of the above seems possible to implement correctly in Python.
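A minimal sketch of the third kind of program, to make the point concrete. The helper names `encoding_covers` and `require_unicode_locale` and the sample characters are made up for illustration; only the standard `locale` module is assumed.

```python
import locale
import sys


def encoding_covers(encoding, sample):
    """Return True if every character of `sample` is encodable in `encoding`."""
    try:
        sample.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Unencodable character, or an encoding Python does not know.
        return False
    return True


def require_unicode_locale(sample="\u00e9\u20ac\u4e2d"):
    """Abort with an explanation if the locale encoding is too narrow."""
    try:
        # Adopt the LC_CTYPE setting requested by the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error as exc:
        sys.exit("unsupported locale settings: {}".format(exc))
    encoding = locale.getpreferredencoding(False)
    if not encoding_covers(encoding, sample):
        sys.exit(
            "locale encoding {!r} cannot represent {!r}; "
            "please run this program in a UTF-8 locale".format(encoding, sample)
        )
    return encoding
```

If the interpreter has already coerced the locale before any Python code runs, a check like this only ever sees the coerced environment, never the one the user actually configured.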
* Nick Coghlan email@example.com, 2017-03-05, 17:50:
Another common failure case is developers specifying ``LANG=C`` in order to see otherwise translated user interface messages in English, rather than the more narrowly scoped ``LC_MESSAGES=C``.
Setting LANGUAGE=en might be better, because it doesn't affect locale encoding either, and it works even when LC_ALL is set.
Three such locales will be tried:
- ``C.UTF-8`` (available at least in Debian, Ubuntu, and Fedora 25+, and
expected to be available by default in a future version of glibc)
- ``C.utf8`` (available at least in HP-UX)
- ``UTF-8`` (available in at least some *BSD variants)
Calling the C locale "legacy" is a bit unfair when there isn't even agreement on what the name of its successor is supposed to be...
NB, both "C.UTF-8" and "C.utf8" work on Fedora, thanks to glibc normalizing the encoding part. Only "C.UTF-8" works on Debian, though, for whatever reason.
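For anyone who wants to check their own system, here is a rough sketch that probes which of the three candidate names the C library accepts. The function name `available_locales` is hypothetical; the probe relies only on `locale.setlocale` raising `locale.Error` for names it rejects.

```python
import locale

# The three candidate names listed in the PEP draft.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def available_locales(candidates=CANDIDATES):
    """Return the candidate names that setlocale(LC_CTYPE, ...) accepts."""
    # Calling setlocale() without a new value only queries the current one.
    saved = locale.setlocale(locale.LC_CTYPE)
    found = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this name is not available here
            found.append(name)
    finally:
        # Put the original LC_CTYPE setting back.
        locale.setlocale(locale.LC_CTYPE, saved)
    return found


if __name__ == "__main__":
    print(available_locales())
```

The result depends entirely on the platform and its installed locales, so no particular output should be expected.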
For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by actually setting the ``LANG`` and ``LC_ALL`` environment variables to the candidate locale name,
Sounds wrong. This will override all LC_* variables, even if they were originally set to something other than C.
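To illustrate, here is a small sketch of the usual POSIX lookup order (LC_ALL first, then the category's own variable, then LANG). The function `effective_locale` and the example environment are hypothetical; the fallback to "C" assumes the common implementation default.

```python
def effective_locale(env, category):
    """Resolve a locale category the way POSIX specifies.

    A non-empty LC_ALL wins, then the category's own variable, then LANG.
    Falling back to "C" is the usual default and is an assumption here.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# Hypothetical environment: only message translation is configured,
# so LC_CTYPE resolves to the C locale.
before = {"LC_MESSAGES": "pl_PL.UTF-8"}

# What the environment looks like after LANG and LC_ALL are overwritten.
after = dict(before, LANG="C.UTF-8", LC_ALL="C.UTF-8")

if __name__ == "__main__":
    for category in ("LC_CTYPE", "LC_MESSAGES"):
        print(category, effective_locale(before, category),
              "->", effective_locale(after, category))
```

In this example LC_MESSAGES silently changes from pl_PL.UTF-8 to C.UTF-8, even though only LC_CTYPE was the problem.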
Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
s/set/was set/ would probably make it clearer.
Python detected LC_CTYPE=C, LC_CTYPE set to UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
The second sentence providing recommendations would be conditionally compiled based on the operating system (e.g. recommending ``LC_CTYPE=UTF-8`` on *BSD systems).
Note that at least OpenBSD supports both "C.UTF-8" and "UTF-8" locales.
While this PEP ensures that developers that need to do so can still opt-in to running their Python code in the legacy C locale,
Yeah, no, it doesn't.
It's impossible to disable coercion from Python code, because it happens too early. The best you can do is to write a wrapper script in a different language that sets PYTHONCOERCECLOCALE=0; but then you still get a spurious warning.