On Mon, Mar 13, 2017 at 8:01 PM, Nick Coghlan email@example.com wrote:
On 13 March 2017 at 18:37, INADA Naoki firstname.lastname@example.org wrote:
But locale coercion works nicely on platforms like Android. So how about a simplified version of PEP 538? Just add a configure option for locale coercion that is disabled by default. No envvar options and no warnings.
That doesn't solve my original Linux distro problem, where locale misconfiguration problems show up as "Python 2 works, Python 3 doesn't work" behaviour and bug reports.
Sorry, I meant "PEP 540 + Simplified PEP 538 (coercion via a configure option)". Distros can enable the configure option, of course.
The problem is that Python 2 was largely locale-independent by default (it just passed raw bytes through), so you'd only get immediate encoding or decoding errors if you had a Unicode literal or a decode() call somewhere in your code; otherwise, data corruption problems were passed further down the chain. Python 3, by contrast, is locale-*aware* by default, and eagerly decodes:
- command line parameters
- environment variables
- responses from operating system API calls
- standard stream input
- file contents
You *can* still write locale-independent Python 3 applications, but they involve sprinkling liberal doses of "b" prefixes and suffixes and mode settings and "surrogateescape" error handler declarations in various places - you can't just run python-modernize over a pre-existing Python 2 application and expect it to behave the same way in the C locale as it did before.
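As a rough illustration of the "surrogateescape" declarations mentioned above, the following minimal sketch shows how that error handler lets bytes that are not valid UTF-8 survive a decode/encode round trip. The sample bytes and variable names are made up for the example; this is not code from either PEP.

```python
# Illustrative input: a Latin-1 encoded "é" (0xE9) is not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding fails immediately on the stray byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps the undecodable byte to a lone surrogate code point...
text = raw.decode("utf-8", errors="surrogateescape")

# ...and encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_ok, ascii(text), round_tripped == raw)
```

The cost is that `text` now contains a code point that cannot be encoded strictly, so every later encode step has to opt in to the same handler.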
Once implemented, PEP 540 will partially solve the problem by introducing a locale-independent UTF-8 mode, but that still leaves the inconsistency with other locale-aware components that need to deal with Python 3 API calls that accept or return Unicode objects where Python 2 allowed the use of 8-bit strings.
I feel that the problems PEP 538 solves but PEP 540 doesn't are relatively small compared with the complexity PEP 538 introduces. As I understand it, PEP 538 solves problems only when:
* The python executable is used. (GUI applications that link Python for plugins are not affected.)
* One of C.UTF-8, C.utf8 or UTF-8 is accepted for LC_CTYPE.
* The "locale-aware components" use something other than ASCII or UTF-8 in the C locale, but use UTF-8 in a UTF-8 locale.
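To make the second condition concrete, here is a minimal sketch of probing which of those locale names the current platform accepts for LC_CTYPE. The helper name `first_accepted_ctype` is hypothetical, the candidate spellings follow PEP 538's target list (C.UTF-8, C.utf8, UTF-8), and this only approximates the check from Python code; it is not the interpreter's actual startup logic.

```python
import locale

# Candidate target locales, tried in order (spellings as in PEP 538).
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def first_accepted_ctype(candidates=CANDIDATES):
    """Return the first name setlocale(LC_CTYPE, name) accepts, else None.

    Hypothetical helper: the previous LC_CTYPE setting is restored
    before returning, so the probe has no lasting effect.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not available on this platform; try the next one
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


print(first_accepted_ctype())
```

If this prints `None`, none of the candidate locales is available on the platform, which is the case where the second condition fails.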
Can't we reduce the number of options from 3 (2 configure, 1 envvar) if PEP 540 is accepted too?
Folks that really want the old behaviour back will be able to set PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build their own CPython from source using `--without-c-locale-coercion` and `--without-c-locale-warning`. However, they'll also get the explicit support notification from PEP 11 that any Unicode handling bugs they run into in those configurations are entirely their own problem - we won't fix them, because we consider those configurations unsupportable in the general case.
That puts the additional self-support burden on folks doing something unusual (i.e. insisting on running an ASCII-only environment in 2017), rather than on those with a more conventional use case (i.e. running an up-to-date *nix OS using UTF-8 or another universal encoding for both local and remote interfaces).
-- Nick Coghlan | email@example.com | Brisbane, Australia