[Python-ideas] PEP 540: Add a new UTF-8 mode

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Mon Jan 9 13:42:38 EST 2017


INADA Naoki writes:

 > But when I see non UTF-8 text, I don't change locale to read such
 > text.

Nobody does.

The problem arises when people have their locales set to a non-UTF-8
encoding, which Chinese people often do ("GB18030 isn't just a good
idea, it's the law").  In particular, forcing stdout to an encoding
other than the locale's is likely to mess things up.
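
To make the stdout concern concrete, here is a minimal sketch of what
happens when a program emits UTF-8 while the terminal decodes bytes
according to a GB18030 locale.  The helper names (same_encoding,
as_seen_by_terminal) and the sample string are made up for
illustration, and the terminal is only simulated by a decode call, so
treat this as a model of the failure rather than a description of any
particular terminal:

```python
import codecs


def same_encoding(a: str, b: str) -> bool:
    """Compare two encoding names after resolving codec aliases."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def as_seen_by_terminal(text: str, forced_encoding: str,
                        terminal_encoding: str) -> str:
    """Simulate writing `text` to stdout in `forced_encoding` while the
    terminal interprets the bytes as `terminal_encoding`.

    Hypothetical helper: a real terminal may render undecodable bytes
    differently from errors="replace".
    """
    raw = text.encode(forced_encoding)
    return raw.decode(terminal_encoding, errors="replace")


text = "中文"

# Program output follows the locale: the user sees what was written.
ok = as_seen_by_terminal(text, "gb18030", "gb18030")

# Program output is forced to UTF-8 against a GB18030 terminal.
garbled = as_seen_by_terminal(text, "utf-8", "gb18030")

print(ok == text)
print(garbled == text)
```

A program that wanted to warn about this could compare the encoding
it is about to force with locale.getpreferredencoding(False) using
something like same_encoding above; whether such a warning is useful
is a separate question.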

 > As my feeling, UTF-8 start dominating from about 10 years ago, and
 > ja_JP.EUC_JP (it was most common locale for Japanese before UTF-8) is
 > complete legacy.

My university's internal systems typically produce database output
(class registration lists and the like) in Shift JIS, but that's not
reliable.  Some departments still have their home pages in EUC-JP, and
pages where the meta http-equiv elements disagree with the content are
not unusual.  The private sector may be up to date, but the academic
sector (and, judging from the state of e-stat.go.jp, government in
general, I suspect) is stuck in the Jomon era.

I don't know that there's going to be a problem, but the idea of
implicitly forcing an encoding different from the locale's seems to me
likely to cause confusion.  Aside from Nick's special case of
containers supplied by a vendor different from the host OS, I don't
really see why this is a good idea.  I think it's best to go with the
locale that is set (or not), unless we have very good reason to
believe that by far most users would be surprised by that, and those
who aren't surprised are mostly expert enough to know how to deal with
a forced UTF-8 environment if they *don't* want it.

A user-selected option is another matter.


