[Python-ideas] PEP 540: Add a new UTF-8 mode

INADA Naoki songofacandy at gmail.com
Fri Jan 6 02:21:21 EST 2017


LGTM.

Some comments:

I want UTF-8 mode to be enabled by default (with an opt-out option)
even if the locale is not POSIX,
like `PYTHONLEGACYWINDOWSFSENCODING`.

Users who depend on the locale know what a locale is and how to configure it.
They can understand the difference between locale mode and UTF-8 mode,
and they can opt out of UTF-8 mode.
But many people live in a "UTF-8 everywhere" world and don't know about locales.
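
For readers in the second group, here is a minimal sketch of how to see
which encodings a running interpreter actually picked up. The helper name
`describe_encodings` is made up for illustration; the calls inside it are
standard library functions. The values it reports depend on the platform
and on the environment the interpreter was started in.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings the interpreter is currently using.

    `describe_encodings` is a hypothetical helper for illustration only.
    """
    return {
        # Encoding derived from the locale settings (LC_ALL, LC_CTYPE, LANG).
        "locale": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from str.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output; may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

Running the script once normally and once with `LC_ALL=C` set on a Unix
system is one way to compare the two situations discussed here.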


The `-X utf8` option should be parsed before the command-line
arguments are converted to wchar_t*.
How about adding Py_UnixMain(int argc, char** argv), which would be
available only on Unix?
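
The ordering problem can be sketched in Python, under stated assumptions:
the arguments arrive as raw bytes, the option scan is simplified, and the
names `wants_utf8_mode` and `decode_argv` are hypothetical. This is not how
CPython implements it. It only shows why the option has to be found before
any decoding takes place.

```python
def wants_utf8_mode(raw_argv):
    """Look for -X utf8 in undecoded (bytes) arguments.

    Simplified, hypothetical scan: only the -X and -W options are treated
    as taking a separate value, and scanning stops at the first argument
    that belongs to the program rather than to the interpreter.
    """
    args = raw_argv[1:]
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in (b"--", b"-", b"-c", b"-m") or not arg.startswith(b"-"):
            break  # everything from here on belongs to the program
        if arg == b"-Xutf8":
            return True
        if arg in (b"-X", b"-W") and i + 1 < len(args):
            if arg == b"-X" and args[i + 1] == b"utf8":
                return True
            i += 2  # skip the option and its value
            continue
        i += 1
    return False


def decode_argv(raw_argv, locale_encoding="ascii"):
    """Decode arguments only after the encoding has been chosen."""
    encoding = "utf-8" if wants_utf8_mode(raw_argv) else locale_encoding
    # surrogateescape keeps undecodable bytes round-trippable.
    return [arg.decode(encoding, "surrogateescape") for arg in raw_argv]
```

With `-X utf8` present, the bytes `b"caf\xc3\xa9.py"` decode to `café.py`.
Without it, and with an ASCII locale encoding, the same bytes become escaped
surrogates. That is the case a byte-oriented entry point would avoid.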

I dislike the wchar_t type and the mbstowcs functions on Unix. (I love
wchar_t on Windows, of course.)
I hope we can remove `wchar_t *wstr` from PyASCIIObject and deprecate
all wchar_t APIs
on Unix in the future.


On Fri, Jan 6, 2017 at 10:43 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Ok, I modified my PEP: the POSIX locale now enables the UTF-8 mode.
>
> 2017-01-05 18:10 GMT+01:00 Victor Stinner <victor.stinner at gmail.com>:
>> A common request is that "Python just works" without having to pass a
>> command line option or set an environment variable. Maybe the default
>> behaviour should be left unchanged, but the behaviour with the POSIX
>> locale should change.
>
> http://bugs.python.org/issue28180 asks to "change the default" to get
> a Python which "just works" without any kind of configuration, in the
> context of a Docker image (I don't have any details about the image yet).
>
>
>> Maybe we can enable the UTF-8 mode (or "UNIX mode") of the PEP 540
>> when the POSIX locale is used?
>
> I read again other issues and I confirm that users are looking for a
> Python 3 which behaves like Python 2: simply don't bother them with
> encodings. I see the UTF-8 mode as an opportunity to answer this
> request.
>
> Moreover, the most common cause of encoding issues is a program run
> with no locale variable set and so using the POSIX locale.
>
> So I modified my PEP 540: the POSIX locale now enables the UTF-8 mode.
> I had to update the "Backward Compatibility" section since the PEP now
> introduces a backward incompatible change (POSIX locale), but my bet
> is that the new behaviour is the one expected by users and that it
> cannot break applications.
>
> I moved my initial proposition as an alternative.
>
> I added a "Use Cases" section to explain in depth the "always work"
> behaviour, which I called the "UNIX mode" in my previous email.
>
> Latest version of the PEP:
> https://github.com/python/peps/blob/master/pep-0540.txt
>
> https://www.python.org/dev/peps/pep-0540/ will be updated shortly.
>
> Victor
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

