[Python-ideas] PEP 540: Add a new UTF-8 mode

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Sat Jan 7 11:47:21 EST 2017


INADA Naoki writes:

 > I want UTF-8 mode to be enabled by default (as an opt-out option)
 > even if the locale is not POSIX,
 > like `PYTHONLEGACYWINDOWSFSENCODING`.
 > 
 > Users who depend on the locale know what a locale is and how to
 > configure it.  They can understand the difference between locale mode
 > and UTF-8 mode, and they can opt out of UTF-8 mode.
 > But many people live in a "UTF-8 everywhere" world, and don't know
 > about locales.

I find all this very strange from someone with what looks like a
Japanese name.  I see mojibake and non-Unicode encodings around me all
the time.  Caveat: I teach at a university that prides itself on being
the most international of Japanese national universities, so in my
daily work I see Japanese in 4 different encodings (5 if you count the
UTF-16 used internally by MS Office), Chinese in 3 different (claimed)
encodings, and occasionally Russian in at least two encodings, ...,
uh, I could go on but won't.  In any case, the biggest problems are
legacy email programs and busted websites in Japanese, plus email that
is labeled "GB2312" but actually conforms to GBK (and this is a reply
in Japanese to a Chinese applicant writing in Japanese encoded as GBK).
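
To make the "GB2312"-labeled-but-actually-GBK case concrete, here is a
minimal Python sketch.  The sample text and the helper name
`decode_labeled` are hypothetical, chosen only for illustration; the
sketch assumes the message contains at least one character that GBK can
encode but GB2312 cannot (a traditional character here).

```python
# Hypothetical sample: the second character is a traditional form,
# which GBK can encode but GB2312 (simplified Chinese only) cannot.
text = "我們"
payload = text.encode("gbk")  # what the sender's software actually emitted


def decode_labeled(data, declared="gb2312", fallback="gbk"):
    """Try the declared charset first; fall back to a superset on failure.

    Returns the decoded text and the name of the codec that worked.
    """
    try:
        return data.decode(declared), declared
    except UnicodeDecodeError:
        # The label was wrong (or too narrow); retry with the fallback.
        return data.decode(fallback), fallback


decoded, used = decode_labeled(payload)
print(decoded, used)
```

Trusting the label alone fails with a UnicodeDecodeError; the fallback
only works because GBK is a superset of GB2312, which is an assumption
about this particular pair of encodings, not a general recovery method.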

I agree that people around me mostly know only two encodings: "works
for me" and "mojibake", but they also use locales configured for them
by technical staff.  On top of that, international students (the most
likely victims of "UTF-8 by default" because students are the biggest
Python users) typically have non-Japanese locales set on their
imported computers.

I'm not going to say my experience is typical enough to block "UTF-8
by default", but let's think this through very carefully before doing it.


