[Python-ideas] PEP 540: Add a new UTF-8 mode

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Wed Jan 11 04:36:08 EST 2017


INADA Naoki writes:

 > Of course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL.

You can also use LC_COLLATE=C.

 > (I wonder if we can use LC_CTYPE=UTF-8...)

Syntactically incorrect: that would name a language called "UTF-8".
"LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required,
while the region and encoding are optional.  Thus ja_JP and ja.UTF-8
are OK, but .UTF-8 is not.
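
To make the naming rule above concrete, here is a minimal sketch that
splits a locale name of the form language[_territory][.codeset][@modifier]
and checks that the language part is present.  The helper names
split_locale_name and has_language are hypothetical, and the check only
mirrors the rule as recalled above; whether a given libc actually
accepts a name is a separate question.

```python
def split_locale_name(name):
    """Split 'language[_territory][.codeset][@modifier]' into its parts.

    Purely syntactic: no check is made that the locale exists anywhere.
    """
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,
        "modifier": modifier or None,
    }


def has_language(name):
    """True if the name carries a (non-empty) language tag."""
    return bool(split_locale_name(name)["language"])


if __name__ == "__main__":
    for candidate in ("ja_JP", "ja.UTF-8", "UTF-8", ".UTF-8"):
        print(candidate, split_locale_name(candidate), has_language(candidate))
```

Under this reading, "UTF-8" parses as a language named "UTF-8" with no
codeset, and ".UTF-8" has an empty language tag.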

Rant follows:

 > But I dislike the current situation that "people should learn how
 > to configure locales properly, and the pitfalls of non-C locales,
 > only for using UTF-8 on Python".

You can use a distro that implements and defaults to the C.utf-8
locale, and presumably you'll be OK tomorrow, well before 3.7 gets
released.  (If there are no leftover mines in the field, I don't see
a good reason to wait for 3.8 given the known deficiencies of the C
locale and the precedent of PEPs 528/529.)
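
One way to see whether a given system already provides such a locale
is to ask the C library directly.  The following is only a sketch: the
function name first_available_ctype and the candidate list are
assumptions for illustration, and the spelling of the locale name
(C.UTF-8, C.utf8, ...) varies between platforms.

```python
import locale


def first_available_ctype(candidates=("C.UTF-8", "C.utf8", "en_US.UTF-8")):
    """Return the first candidate accepted for LC_CTYPE, or None.

    The process's LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this system does not provide that locale
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print(first_available_ctype())
```

The result depends entirely on which locales are installed, so no
particular output should be expected.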

Really, we're catering to users who won't set their locales properly
and insist on old distros.  For Debian, C.utf-8 was suggested in
2009[1], and that RFE refers to other distros that had already
implemented it.  I have all the sympathy in the world for those users
-- systems *should* Just Work -- but I'm going to lean against kludges
if they mean punishing people who actually learn about and conform to
applicable standards (and that includes well-motivated, properly-
documented, and carefully-implemented platform-specific extensions),
or use systems designed by developers who do.[2]

Footnotes: 
[1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609306

[2]  I know how bad standards can suck -- I'm a Mailman developer,
looking at you RFC 561, er, 5322.  While I'm all for nonconformism if
you take responsibility for any disasters that result, developers who
conform on behalf of their users are heroes.

