On 13 March 2017 at 18:37, INADA Naoki <songofacandy@gmail.com> wrote:
But locale coercing works nice on platforms like android.
So how about simplified version of PEP 538?  Just adding configure option for locale coercing which is disabled by default.  No envvar options and no warnings.

That doesn't solve my original Linux distro problem, where locale misconfiguration problems show up as "Python 2 works, Python 3 doesn't work" behaviour and bug reports.

The problem is that Python 2 was largely locale-independent by default, as it just passed raw bytes through. You'd only get immediate encoding or decoding errors if you had a Unicode literal or a decode() call somewhere in your code, and would otherwise pass data corruption problems further down the chain. Python 3, by contrast, is locale-*aware* by default, and eagerly decodes:

- command line parameters
- environment variables
- responses from operating system API calls
- standard stream input
- file contents
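
To check which encodings a given interpreter is using for those implicit decoding steps, a small diagnostic along the following lines can help. This is only a sketch: `decoding_report` is a made-up helper name rather than anything from the PEP, and the values it reports depend entirely on the Python version, platform and locale settings it runs under.

```python
import locale
import os
import sys


def decoding_report() -> dict:
    """Collect the encodings this interpreter uses for implicit decoding.

    Hypothetical helper for illustration only.
    """
    return {
        # Encoding implied by the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Encoding and error handler used for file names, command line
        # arguments and environment variables.
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # sys.stdin can be None or replaced, so look the attribute up safely.
        "stdin": getattr(sys.stdin, "encoding", None),
        # Both of these arrive as already-decoded str objects in Python 3.
        "argv_is_text": all(isinstance(arg, str) for arg in sys.argv),
        "environ_is_text": all(
            isinstance(value, str) for value in os.environ.values()
        ),
    }


if __name__ == "__main__":
    for name, value in decoding_report().items():
        print(f"{name}: {value}")
```

Running it once normally and once with `LC_ALL=C` set is a quick way to see how much the locale changes the defaults on a particular system.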

You *can* still write locale-independent Python 3 applications, but they involve sprinkling liberal doses of "b" prefixes and suffixes, mode settings, and "surrogateescape" error handler declarations in various places - you can't just run python-modernize over a pre-existing Python 2 application and expect it to behave the same way in the C locale as it did before.
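
As a rough illustration of what that sprinkling looks like, the sketch below pushes non-ASCII bytes through an ASCII decode and back without losing data, and copies a file in binary mode so its contents are never decoded at all. The function names are hypothetical and chosen only for this example.

```python
def roundtrip_ascii(raw: bytes) -> bytes:
    """Decode as ASCII and re-encode, preserving undecodable bytes."""
    # "surrogateescape" maps each undecodable byte to a lone surrogate
    # code point (0xE9 becomes U+DCE9) instead of raising an error.
    text = raw.decode("ascii", errors="surrogateescape")
    # Encoding with the same handler turns the surrogates back into
    # the original bytes.
    return text.encode("ascii", errors="surrogateescape")


def copy_raw(src_path: str, dst_path: str) -> None:
    """Copy a file without decoding it (note the "b" mode suffixes)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read())


latin1_name = b"caf\xe9"  # the "b" prefix keeps this as raw bytes
assert roundtrip_ascii(latin1_name) == latin1_name
```

The same bytes decoded with the default strict error handler would raise UnicodeDecodeError, which is the kind of failure that shows up in the C locale.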

Once implemented, PEP 540 will partially solve the problem by introducing a locale-independent UTF-8 mode, but that still leaves the inconsistency with other locale-aware components that need to deal with Python 3 API calls that accept or return Unicode objects where Python 2 allowed the use of 8-bit strings.

Folks that really want the old behaviour back will be able to set PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build their own CPython from source using `--without-c-locale-coercion` and `--without-c-locale-warning`. However, they'll also get the explicit support notification from PEP 11 that any Unicode handling bugs they run into in those configurations are entirely their own problem - we won't fix them, because we consider those configurations unsupportable in the general case.
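
For anyone wanting to see what the opt-out does on their own system, the following sketch starts a child interpreter in the C locale with a chosen PYTHONCOERCECLOCALE value and reports the encoding it picked for stdout. It assumes an interpreter that actually implements PEP 538 (CPython 3.7 or later), `stdout_encoding_in_c_locale` is a hypothetical helper name, and the result varies by version and platform, particularly since a UTF-8 mode along the lines of PEP 540 can also affect the outcome.

```python
import os
import subprocess
import sys


def stdout_encoding_in_c_locale(coerce: str) -> str:
    """Report sys.stdout.encoding of a child interpreter run with LC_ALL=C.

    Hypothetical helper; `coerce` is passed through as PYTHONCOERCECLOCALE.
    """
    env = dict(os.environ, LC_ALL="C", PYTHONCOERCECLOCALE=coerce)
    # Drop settings that would otherwise override the locale-derived choice.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for setting in ("0", "1"):
        print(setting, stdout_encoding_in_c_locale(setting))
```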

That puts the additional self-support burden on folks doing something unusual (i.e. insisting on running an ASCII-only environment in 2017), rather than on those with a more conventional use case (i.e. running an up-to-date *nix OS using UTF-8 or another universal encoding for both local and remote interfaces).


Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia