[Python-ideas] PEP 540: Add a new UTF-8 mode

Sun Jan 8 21:21:41 EST 2017

On Sun, Jan 8, 2017 at 1:47 AM, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> INADA Naoki writes:
>
>  > I want UTF-8 mode to be enabled by default (with an opt-out option)
>  > even if the locale is not POSIX,
>  > like `PYTHONLEGACYWINDOWSFSENCODING`.
>  >
>  > Users who depend on the locale know what a locale is and how to configure it.
>  > They can understand the difference between locale mode and UTF-8 mode,
>  > and they can opt out of UTF-8 mode.
>  > But many people live in a "UTF-8 everywhere" world and don't know
>  > about locales.
>
> I find all this very strange from someone with what looks like a
> Japanese name.  I see mojibake and non-Unicode encodings around me all
> the time.  Caveat: I teach at a University that prides itself on being
> the most international of Japanese national universities, so in my
> daily work I see Japanese in 4 different encodings (5 if you count the
> UTF-16 used internally by MS Office), Chinese in 3 different (claimed)
> encodings, and occasionally Russian in at least two encodings, ...,
> uh, I could go on but won't.  In any case, the biggest problems are
> legacy email programs and busted websites in Japanese, plus email that
> is labeled "GB2312" but actually conforms to GBK (and this is a reply
> in Japanese to a Chinese applicant writing in Japanese encoded as GBK).

Since I work at a tech company and use Linux almost exclusively for
server-side programs, I don't live in such a situation.

But when I see non-UTF-8 text, I don't change my locale to read it.
(Actually, changing the locale doesn't solve mojibake, because it doesn't
change my terminal emulator's encoding.)
And I don't change my terminal emulator's settings just to read such text.
What I do is convert it to UTF-8 with a command like
`view text-from-windows.txt ++enc=cp932`.

So there is no problem if Python always uses UTF-8 for fsencoding
and the stdio encoding.
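
The same explicit conversion can be done in Python without touching the
locale at all. This is only a minimal sketch: the helper name `to_utf8` is
made up for illustration, and the source encoding (cp932 here) is assumed
to be known in advance.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp932") -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8.

    The source encoding is named explicitly, so the result does not
    depend on the current locale.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Illustrative input: Japanese text as a Windows program might save it.
sample = "日本語テキスト".encode("cp932")
converted = to_utf8(sample)
print(converted.decode("utf-8"))
```

For a file on disk, `open(path, encoding="cp932")` gives the same result
when reading.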

>
> I agree that people around me mostly know only two encodings: "works
> for me" and "mojibake", but they also use locales configured for them
> by technical staff.  On top of that, international students (the most
> likely victims of "UTF-8 by default" because students are the biggest
> Python users) typically have non-Japanese locales set on their
> imported computers.

Hmm, which OS do they use?  There is no problem on macOS or Windows.
Do they use Linux with a locale whose encoding is not UTF-8, and does
their terminal emulator use a non-UTF-8 encoding?
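
One quick way to check this on a given machine is to ask Python itself.
A minimal sketch using only standard-library calls (the helper name
`describe_encodings` is made up for illustration); it reports what Python
picked up, not what the terminal emulator actually displays:

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings Python derives from the environment."""
    return {
        # Encoding implied by the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Encoding used for file names (the "fsencoding").
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output; may be None if stdout is replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```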

My feeling is that UTF-8 started to dominate about 10 years ago, and
ja_JP.EUC_JP (the most common locale for Japanese before UTF-8) is
completely legacy.

There is only one machine I can ssh into that has the ja_JP.eucjp locale
(it is on a LAN, has been running for 10+ years, and its
/usr/bin/python is Python 1.5!).