[Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
Victor Stinner
victor.stinner at gmail.com
Fri Dec 8 07:58:33 EST 2017
2017-12-08 6:11 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:
> Or should we change locale.getpreferredencoding() to return UTF-8
> instead of ASCII always, regardless of PEP 538 and 540?
On the POSIX locale, if the locale coercion works (PEP 538),
locale.getpreferredencoding() returns UTF-8. We are good.
The open question concerns platforms like CentOS 7, where locale coercion
(PEP 538) doesn't work, so Python enables the UTF-8 mode (PEP 540) while
the locale probably uses ASCII (or maybe Latin-1).
My current implementation of PEP 540 cheats for open(): if
sys.flags.utf8_mode is non-zero, it uses the UTF-8 encoding rather than
calling locale.getpreferredencoding().
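A minimal sketch of that shortcut, assuming it behaves as described above. The helper name default_open_encoding() is hypothetical (it is not the actual C code in io); sys.flags.utf8_mode only exists on Python 3.7 and newer, so the sketch treats a missing flag as "off".

```python
import locale
import sys


def default_open_encoding(utf8_mode=None):
    """Hypothetical helper: pick open()'s default text encoding.

    Mirrors the prototype described in the message: UTF-8 mode wins,
    otherwise ask the locale.
    """
    if utf8_mode is None:
        # sys.flags.utf8_mode exists on Python 3.7+; treat it as off otherwise
        utf8_mode = getattr(sys.flags, "utf8_mode", 0)
    if utf8_mode:
        # UTF-8 mode: bypass the locale entirely
        return "UTF-8"
    # Normal mode: ask the locale without calling setlocale()
    return locale.getpreferredencoding(False)
```

The point of the shortcut is that open() never consults the locale in UTF-8 mode, which is exactly why other callers of locale.getpreferredencoding() can end up disagreeing with open().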
I checked the stdlib and found many places where
locale.getpreferredencoding() is used to get the user's preferred
encoding:
* builtin open(): default encoding
* cgi.FieldStorage: encode the query string
* encodings._alias_mbcs(): check if the requested encoding is the ANSI code page
* gettext.GNUTranslations: lgettext() and lngettext() methods
* xml.etree.ElementTree: ElementTree.write(encoding='unicode')
In UTF-8 mode, I would expect cgi, gettext and xml.etree all to use
the UTF-8 encoding by default, so locale.getpreferredencoding()
should return UTF-8 when the UTF-8 mode is enabled.
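To illustrate the inconsistency, here is a small simulation of the CentOS 7 case: an ASCII locale with UTF-8 mode enabled. All function names are hypothetical; the locale encoding and the mode are passed in explicitly so the behaviour does not depend on the machine running it. encode_query_string() only mimics the kind of encoding cgi.FieldStorage does with the preferred encoding.

```python
def open_default_encoding(locale_encoding, utf8_mode):
    """open() in the prototype: already honours UTF-8 mode."""
    return "UTF-8" if utf8_mode else locale_encoding


def preferred_encoding_current(locale_encoding, utf8_mode):
    """getpreferredencoding() as described: ignores UTF-8 mode."""
    return locale_encoding


def preferred_encoding_proposed(locale_encoding, utf8_mode):
    """Proposed: report UTF-8 whenever UTF-8 mode is enabled."""
    return "UTF-8" if utf8_mode else locale_encoding


def encode_query_string(qs, encoding):
    """cgi-style: encode a str query string to bytes."""
    return qs.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    # Simulated CentOS 7: ASCII locale, but Python runs in UTF-8 mode
    locale_enc, utf8_mode = "ascii", True
    qs = "q=caf\u00e9"
    print(open_default_encoding(locale_enc, utf8_mode))  # UTF-8
    print(encode_query_string(qs, preferred_encoding_proposed(locale_enc, utf8_mode)))
    try:
        encode_query_string(qs, preferred_encoding_current(locale_enc, utf8_mode))
    except UnicodeEncodeError as exc:
        # surrogateescape only covers lone surrogates, so U+00E9 still fails
        print("current behaviour fails:", exc.reason)
```

With the current behaviour, open() writes UTF-8 while a cgi-like caller still tries ASCII and fails on non-ASCII input; with the proposed behaviour, both agree on UTF-8.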
The private _alias_mbcs() function can be modified to call
_locale._getdefaultlocale()[1] directly to get the ANSI code page.
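A rough sketch of that change, assuming _locale._getdefaultlocale() is the private, Windows-only helper returning a (language, "cpNNNN") tuple. The names ansi_code_page() and is_ansi_code_page() are hypothetical and not the real body of encodings._alias_mbcs(). The key property is that the check no longer goes through getpreferredencoding(), so it is unaffected by UTF-8 mode.

```python
import codecs
import sys


def ansi_code_page():
    """Return the Windows ANSI code page (e.g. 'cp1252'), or None elsewhere.

    Assumption: _locale._getdefaultlocale() is private and Windows-only.
    """
    if sys.platform != "win32":
        return None
    try:
        import _locale
        return _locale._getdefaultlocale()[1]
    except (ImportError, AttributeError):
        return None


def is_ansi_code_page(encoding, acp=None):
    """True if `encoding` names the ANSI code page, ignoring UTF-8 mode."""
    if acp is None:
        acp = ansi_code_page()
    if acp is None:
        return False
    try:
        # Compare canonical codec names so aliases like 'windows-1252' match
        return codecs.lookup(encoding).name == codecs.lookup(acp).name
    except LookupError:
        return False
```

Passing acp explicitly makes the comparison logic testable on any platform; on Windows it would be read from the system.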
Question: do we need to add an option to getpreferredencoding() to
return the locale encoding even when the UTF-8 mode is enabled? If yes,
what should the API be? locale.getpreferredencoding(utf8_mode=False)?
Victor