[Python-Dev] PEP 538 (review round 2): Coercing the legacy C locale to a UTF-8 based locale

Nick Coghlan ncoghlan at gmail.com
Sun Jun 11 21:48:38 EDT 2017


On 12 June 2017 at 10:05, Martin (gzlist) via Python-Dev
<python-dev at python.org> wrote:
> I don't like the side effect of changing the standard stream error
> handler to surrogateescape if LANG=C.UTF-8 is actually set. Obviously
> bad data vs exception is a trade-off anyway, but it means that to get
> a Python script that will always output valid data or exit, you have
> to set an arbitrary language like en_US. Yes, that's true of the
> change as implemented in 3.5 anyway.

`PYTHONIOENCODING=:strict` remains the preferred way of forcing strict
encoding checks on the standard streams, regardless of locale.
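
As a rough illustration (a sketch rather than anything taken from the
PEP; `run_child` and `CHILD_CODE` are hypothetical names made up for
this example), the following runs a child interpreter that prints a
lone surrogate, once with the `:strict` error handler and once with
`:surrogateescape`. It assumes Python 3.5 or later for
`subprocess.run`:

```python
import os
import subprocess
import sys

# Child program: print a lone surrogate, which the usual text codecs
# refuse to encode when the error handler is "strict".
CHILD_CODE = r"print('\udcff')"


def run_child(overrides):
    """Run CHILD_CODE in a fresh interpreter with extra env settings."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a known state
    env.update(overrides)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# Leaving the part before the colon empty keeps the default encoding
# and only overrides the error handler.
strict = run_child({"PYTHONIOENCODING": ":strict"})
strict_c = run_child({"LC_ALL": "C", "PYTHONIOENCODING": ":strict"})
lenient = run_child({"PYTHONIOENCODING": ":surrogateescape"})

print("strict exit code:", strict.returncode)
print("strict (LC_ALL=C) exit code:", strict_c.returncode)
print("surrogateescape exit code:", lenient.returncode)
```

With `:strict` the child should exit with a `UnicodeEncodeError`
instead of emitting invalid data, whichever locale happens to be
active.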

> I'm not sold on adding the PYTHONCOERCECLOCALE runtime configuration.
> All it really does is turn off stderr kipple if you must use the C
> locale for other reasons? Anyone with the ability to set that variable
> could just set LANG instead. I was going to suggest just documenting
> LC_ALL=C as the override instead of adding a Python-specific variable,
> but I note, looking around, that Debian discourages that[3].

In addition to providing a reliable escape hatch with no other
potentially unwanted side effects (for when folks actually want the
current behaviour), the entry for the off switch in the CLI usage docs
also provides us with a convenient place to document the *default*
behaviour.
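
For anyone who wants to see what the switch does on their own system,
here is a small sketch (again, not part of the PEP; `report_stdio` is
a hypothetical helper name). It starts a child interpreter under
`LANG=C`, with and without `PYTHONCOERCECLOCALE=0`, and prints the
stream settings the child ends up with. The results depend on the
platform, the Python version, and which UTF-8 locales are installed,
so no particular output is claimed here:

```python
import os
import subprocess
import sys

# Child program: report the stdout encoding, its error handler, and
# the LC_CTYPE environment variable as the child sees them.
REPORT_CODE = (
    "import os, sys; "
    "print(sys.stdout.encoding, sys.stdout.errors, "
    "os.environ.get('LC_CTYPE'))"
)


def report_stdio(coerce_setting=None):
    """Run REPORT_CODE under LANG=C, optionally setting the off switch."""
    env = dict(os.environ)
    # Clear settings that would override LANG or the stream defaults.
    for name in ("LC_ALL", "LC_CTYPE", "PYTHONIOENCODING",
                 "PYTHONCOERCECLOCALE"):
        env.pop(name, None)
    env["LANG"] = "C"
    if coerce_setting is not None:
        env["PYTHONCOERCECLOCALE"] = coerce_setting
    return subprocess.run(
        [sys.executable, "-c", REPORT_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


default_run = report_stdio()       # default behaviour
disabled_run = report_stdio("0")   # off switch set

print("default :", default_run.stdout.decode("ascii", "replace").strip())
print("disabled:", disabled_run.stdout.decode("ascii", "replace").strip())
```

Interpreters that don't know about the variable simply ignore it, in
which case both lines will match.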

> That's all, though I will also grumble a bit about how long the PEP is.

The ASCII-to-Unicode migration has been in progress for almost as long
as Python has been around, and ASCII has been the default encoding in
C for almost twice as long as that, so it takes a bit of text to
explain why *now* is a good time to break with 50+ years of precedent
:)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

