[Python-ideas] PEP 540: Add a new UTF-8 mode

INADA Naoki songofacandy at gmail.com
Mon Jan 9 14:12:47 EST 2017


>
> The problem is if people have locales set for non-UTF-8, which Chinese
> people often do ("GB18030 isn't just a good idea, it's the law").
> Especially forcing stdout to something other than the locale is likely
> to mess things up.

Oh, I didn't know that non-UTF-8 encodings are still used for LC_CTYPE these days!

>
>  > My feeling is that UTF-8 started dominating about 10 years ago, and
>  > ja_JP.EUC_JP (the most common locale for Japanese before UTF-8) is
>  > completely legacy.
>
> My university's internal systems typically produce database output
> (class registration lists and the like) in Shift JIS, but that's not
> reliable.  Some departments still have their home pages in EUC-JP, and
> pages where the meta http-equiv elements disagree with the content are
> not unusual.  Private sector may be up to date, but academic sector
> (and from the state of e-stat.go.jp, government in general, I suspect)
> is stuck in the Jomon era.

I was talking about LC_CTYPE.
We have some legacy files too, but they are related to neither the
filesystem encoding (fsencoding) nor the stdio encoding.
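
To make the distinction concrete, here is a minimal sketch (the helper
names encoding_report and decode_legacy are made up for illustration).
It prints the encodings Python derives from the environment, then
decodes legacy EUC-JP bytes with an explicitly named codec, which does
not depend on any of them:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derives from the environment."""
    return {
        "locale": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        # sys.stdout may be replaced by an object without .encoding
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def decode_legacy(data, encoding="euc_jp"):
    """Decode legacy bytes with an explicitly named codec."""
    return data.decode(encoding)


if __name__ == "__main__":
    for name, enc in encoding_report().items():
        print(f"{name}: {enc}")
    # The round trip works whatever the locale is, because the codec
    # is named explicitly instead of being taken from LC_CTYPE.
    sample = "日本語".encode("euc_jp")
    print(decode_legacy(sample) == "日本語")
```

The values in the report vary by platform and environment; the explicit
decode does not.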

>
> I don't know that there's going to be a problem, but the idea of
> implicitly forcing an encoding different from the locale seems
> likely to cause confusion to me.  Aside from Nick's special case of
> containers supplied by a vendor different from the host OS, I don't
> really see why this is a good idea.  I think it's best to go with the
> locale that is set (or not), unless we have very good reason to
> believe that by far most users would be surprised by that, and those
> who aren't surprised are mostly expert enough to know how to deal with
> a forced UTF-8 environment if they *don't* want it.
>
> A user-selected option is another matter.
>

Yes.  It is a matter of balance.

Some people are surprised that Python may not use UTF-8 even when the
source code is written in UTF-8, unlike most other languages. (Not only
Rust, Go, and Node.js, but also Ruby, Perl, or even C!)

And some people would be surprised because they use the locale to tell
some commands the terminal encoding (which is not UTF-8), and Python up
to 3.6 followed it.

I think the latter group is very small, and will be even smaller by the
time 3.7 is released.
And if we can drop locale support in the future, we will be able to
remove some very dirty code in Python/fileutil.c.
That's why I prefer a locale-free UTF-8 mode by default, with the
locale-aware mode as opt-in.
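
For comparison, a user-selected override for stdio already exists today:
the PYTHONIOENCODING environment variable.  The sketch below (the helper
name child_stdout_encoding is hypothetical, and it assumes Python 3.7+
for capture_output) starts a child interpreter with that variable set
and reports which stdout encoding the child picked:

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return the
    encoding the child reports for its own sys.stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child follows the explicit setting rather than the locale.
    print(child_stdout_encoding("utf-8"))
```

This only covers stdio, not the filesystem encoding, so it is not a
substitute for the mode discussed in this thread.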

But I'm OK if we start by ignoring only the C locale, sure.

