[Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

Nick Coghlan ncoghlan at gmail.com
Sat Dec 9 02:54:43 EST 2017


On 9 December 2017 at 01:22, Victor Stinner <victor.stinner at gmail.com> wrote:
> I updated my PEP: in the 4th version, locale.getpreferredencoding()
> now returns 'UTF-8' in the UTF-8 Mode.

+1, that's a good change, since it brings the "locale coercion failed"
case even closer to the "locale coercion succeeded" behaviour.
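
A quick way to see this, as a rough sketch rather than anything from
the PEP itself: it assumes an interpreter that implements PEP 540
(Python 3.7 or later), where "-X utf8" turns the mode on. The helper
name is made up for illustration.

```python
import codecs
import subprocess
import sys


def preferred_encoding_in_utf8_mode():
    """Ask a child interpreter running in UTF-8 mode for its preferred encoding."""
    code = "import locale; print(locale.getpreferredencoding(False))"
    # "-X utf8" enables the UTF-8 mode for the child process only.
    out = subprocess.check_output([sys.executable, "-X", "utf8", "-c", code])
    return out.decode("ascii").strip()


name = preferred_encoding_in_utf8_mode()
# Normalise through the codec registry so the check does not depend on
# how the encoding name happens to be spelled.
print(name, codecs.lookup(name).name == "utf-8")
```

The child process is used so the check doesn't depend on how the
current interpreter was started.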

To continue with the CentOS 7 example: that actually does use a UTF-8
based locale by default, it's just en_US.UTF-8 rather than C.UTF-8.

Earlier versions of PEP 538 thus included "en_US.UTF-8" on the
candidate target locale list, but that turned out to cause assorted
problems due to the "C -> en_US" part of the coercion.

Cheers,
Nick.

P.S. Thinking back on the history of the changes though, it may be
worth revisiting the idea of "en_US.UTF-8" as a potential coercion
locale: it was dropped as a potential coercion target back when the
PEP still set both LANG & LC_ALL, whereas it now changes only
LC_CTYPE. That means setting it won't mess with LC_COLLATE, or any of
the other locale categories. That said, I'm not sure if there are
behavioural differences between "LC_CTYPE=C.UTF-8" and
"LC_CTYPE=en_US.UTF-8", so I'm inclined to leave that alone for now.
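
If anyone wants to poke at that question, something like the following
could be a starting point. It's only a sketch: the probe() helper and
the fields it reports are my own illustration, and either locale may
simply not be installed on a given system, in which case "lc_ctype"
comes back as None.

```python
import json
import os
import subprocess
import sys

# Code run in the child interpreter; it reports what it sees as JSON.
CHILD = r"""
import json, locale, sys
try:
    active = locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    active = None  # the requested locale is not installed on this system
print(json.dumps({
    "lc_ctype": active,
    "fs_encoding": sys.getfilesystemencoding(),
    "preferred": locale.getpreferredencoding(False),
}))
"""


def probe(lc_ctype):
    """Run a child interpreter with only LC_CTYPE set and report its view."""
    env = dict(os.environ)
    # Clear settings that would override or mask LC_CTYPE.
    for name in ("LC_ALL", "LANG", "PYTHONUTF8", "PYTHONCOERCECLOCALE"):
        env.pop(name, None)
    env["LC_CTYPE"] = lc_ctype
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return json.loads(out.decode("ascii"))


for candidate in ("C.UTF-8", "en_US.UTF-8"):
    print(candidate, probe(candidate))
```

This only compares what the interpreter reports at startup; it says
nothing about differences in character classification or case mapping
between the two locales.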

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

