[Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode)

Thu Jan 12 21:40:09 EST 2017

INADA Naoki writes:

 > But it's not a problem, because changing LC_CTYPE from C to C.UTF-8
 > doesn't break anything.  It's broken at start.
 > Use UTF-8 everywhere, anytime is best way to avoid mojibake.

Please stop repeating this; it is invalid as an argument.  Everybody
using Python 3 (which is the only topic for this list) already knows
that use of a common universal encoding -- in practice, UTF-8 -- is
the way forward (and Windows users also know that the Windows API is a
major exception, which proves the rule by being a different *Unicode*
transformation format).  It is not part of this discussion.

The problem is that not everybody does this yet, even today (in fact,
that's the source of the problem on containers: people are using the C
locale, not C.UTF-8!), and some of us have to use or interoperate with
systems that don't, even if our own systems do.
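
To make the interoperability concern concrete, here is a minimal
sketch.  The filename and the choice of Latin-1 as the legacy encoding
are assumptions made for illustration only; they are not taken from
the PEP or from any reported case.

```python
# Hypothetical filename written by a system that uses Latin-1 (ISO-8859-1).
name = "café.txt"
raw = name.encode("latin-1")  # the bytes as the legacy system stored them

# Forcing UTF-8 with strict error handling rejects these bytes outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps the undecodable byte to a lone surrogate, so the
# original bytes survive a round trip, but the decoded text is not the
# name the user sees on the legacy system.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")

# Decoding with the encoding the data was actually written in recovers it.
recovered = raw.decode("latin-1")
```

The round trip keeps the bytes intact, which is what matters for
passing a filename back to the OS, but anything that displays or
compares the decoded text still sees the wrong characters.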

If your position really is "Screw them, they're stupid -- let them fix
their broken systems, it's not our problem", I can understand that but
we'll have to agree to disagree.  My position is that we need to

(1) determine whether this change can actually cause problems for
    Python users on such systems or interoperating with such systems,
(2) determine how serious the problems are with the "force UTF-8 in
    certain situations" approach vs. the status quo,
(3) compare the damage both ways, and
(4) if there is a conflict, consider whether a modified proposal would
    work as well or better in more circumstances.

I think that is consistent with past Python practice on encoding
issues.