[Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

Victor Stinner victor.stinner at gmail.com
Fri Dec 8 10:22:35 EST 2017


I updated my PEP: in the 4th version, locale.getpreferredencoding()
now returns 'UTF-8' in the UTF-8 Mode.

https://www.python.org/dev/peps/pep-0540/

I also clarified the direct effects of the UTF-8 Mode, but also listed
the most user visible changes as "Side effects".

"""
Effects of the UTF-8 Mode:

* ``sys.getfilesystemencoding()`` returns ``'UTF-8'``.
* ``locale.getpreferredencoding()`` returns ``UTF-8``, its
  *do_setlocale* argument and the locale encoding are ignored.
* ``sys.stdin`` and ``sys.stdout`` error handler is set to
  ``surrogateescape``

Side effects:

* ``open()`` uses the UTF-8 encoding by default.
* ``os.fsdecode()`` and ``os.fsencode()`` use the UTF-8 encoding.
* Command line arguments, environment variables and filenames use the
  UTF-8 encoding.
"""

Thank you Naokia INADA for your quick feedback, it was very helpful
and I really like how the PEP evolves!

IMHO the PEP 540 version 4 is just perfect and ready for
pronouncement! (... until someone finds another flaw, obviously!)

Victor


2017-12-08 13:58 GMT+01:00 Victor Stinner <victor.stinner at gmail.com>:
> 2017-12-08 6:11 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:
>> Or should we change loale.getpreferredencoding() to return UTF-8
>> instead of ASCII always, regardless of PEP 538 and 540?
>
> On the POSIX locale, if the locale coercion works (PEP 538),
> locale.getpreferredencoding() returns UTF-8. We are good.
>
> The question is for platforms like Centos 7 where the locale coercion
> (PEP 538) doesn't work and so Python uses UTF-8 (PEP 540), whereas the
> locale probably uses ASCII (or maybe Latin1).
>
> My current implementation of the PEP 540 is cheating for open(): if
> sys.flags.utf8_mode is non-zero, use the UTF-8 encoding rather than
> calling locale.getpreferredencoding().
>
> I checked the stdlib, and I found many places where
> locale.getpreferredencoding() is used to get the user preferred
> encoding:
>
> * builtin open(): default encoding
> * cgi.FieldStorage: encode the query string
> * encoding._alias_mbcs(): check if the requested encoding is the ANSI code page
> * gettext.GNUTranslations: lgettext() and lngettext() methods
> * xml.etree.ElementTree: ElementTree.write(encoding='unicode')
>
> In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all
> use the UTF-8 encoding by default. So locale.getpreferredencoding()
> should return UTF-8 if the UTF-8 mode is enabled.
>
> The private _alias_mbcs() method can be modified to call directly
> _locale._getdefaultlocale()[1] to get the ANSI code page.
>
> Question: do we need to add an option to getpreferredencoding() to
> return the locale encoding even if the UTF-8 mode is enabled. If yes,
> what should be the API? locale.getpreferredencoding(utf8_mode=False)?
>
> Victor


More information about the Python-Dev mailing list