[Python-ideas] PEP 540: Add a new UTF-8 mode

Stephan Houben stephanh42 at gmail.com
Fri Jan 6 09:28:58 EST 2017


Hi Victor,

2017-01-06 13:01 GMT+01:00 Victor Stinner <victor.stinner at gmail.com>:
>
> What do you mean by "eating mojibake"?

OK, I had mistakenly assumed that the failure mode was mojibake being
produced.

> Users complain because their
> application is stopped by a Python exception.

Got it.

> Currently, most Python 3
> applications doesn't produce or display mojibake, since Python is
> strict on outputs. (One exception: stdout with the POSIX locale since
> Python 3.5).

OK, I have now tried it myself, and it does indeed produce the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xfe' in position 0: ordinal not in range(128)
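
For reference, the same exception can be triggered directly, independent of
the locale, by encoding the character with the ASCII codec. This is only a
minimal sketch of the failure itself, not of the exact command I ran:

```python
# Encode a non-ASCII character with the ASCII codec; this raises the same
# exception type that Python raises when the output encoding is ASCII.
try:
    "\xfe".encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # kept so the exception attributes can be inspected
    print(f"{type(exc).__name__}: {exc}")
```

The exception object carries the codec name and the offending position
(`error.encoding`, `error.start`), which is what a more specific message
could build on.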

My suggestion would be to make this error message more specific.
In particular, if LC_CTYPE/LANG is set to C or is unset,
we could print something like the following information
(on Linux only):

"""
You are attempting to use non-ASCII Unicode characters while your system
has been configured (possibly erroneously) to operate in the legacy "C"
locale, which is pure ASCII.
It is strongly recommended that you configure your system to allow
arbitrary non-ASCII Unicode characters. This can be done by configuring
a UTF-8 locale, for example:

    export LANG=en_US.UTF-8

Use:
    locale -a | grep -i utf

to get a list of all valid UTF-8 locales on your system.
"""
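
A rough sketch of how the check behind such a message might look. The names
`effective_ctype_locale` and `c_locale_hint` are hypothetical, not existing
CPython functions, and the hint text is abbreviated. The sketch assumes the
usual POSIX precedence of LC_ALL over LC_CTYPE over LANG, and treats "POSIX"
as a synonym for "C":

```python
import os
import sys

# Abbreviated version of the proposed message (illustrative wording only).
HINT = (
    'Your system appears to be configured for the legacy "C" locale, '
    "which is pure ASCII. Consider configuring a UTF-8 locale, for example:\n"
    "\n"
    "    export LANG=en_US.UTF-8\n"
)


def effective_ctype_locale(environ):
    """Return the locale name that applies to LC_CTYPE, or "" if unset."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return value
    return ""


def c_locale_hint(environ=None, platform=None):
    """Return the hint text if it applies, else None (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return None  # the suggestion is for Linux only
    if effective_ctype_locale(environ) in ("", "C", "POSIX"):
        return HINT
    return None
```

The environment and platform are passed as parameters only so that the check
can be exercised without changing the real process environment.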

Stephan