[Python-ideas] PEP 540: Add a new UTF-8 mode
Victor Stinner
victor.stinner at gmail.com
Thu Jan 12 10:12:07 EST 2017
2017-01-12 1:23 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:
> I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin.
The use case is to be able to write a Python 3 program which works
with UNIX pipes without failing with encoding errors:
https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes
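To make the pipe use case concrete, here is a minimal sketch of what
surrogateescape does with input that is not valid UTF-8. The sample
bytes are my own illustration, not taken from the PEP:

```python
# Bytes that are not valid UTF-8 (0xE9 is "é" in Latin-1).
raw = b"caf\xe9\n"

# The strict error handler rejects the input outright.
try:
    raw.decode("utf-8", errors="strict")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF, so the data can be carried as str.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
roundtrip = text.encode("utf-8", errors="surrogateescape")
```

A consumer that reads with surrogateescape and writes with
surrogateescape passes the bytes through unchanged, which is the
property the pipe use case relies on.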
If you want something stricter, there is the UTF-8 Strict mode, which
prevents mojibake everywhere. I'm not sure that the UTF-8 Strict mode
is really useful. When I implemented it, I quickly understood that
using strict *everywhere* is just a dead end: it would fail in too many
places.
https://www.python.org/dev/peps/pep-0540/#use-the-strict-error-handler-for-operating-system-data
I'm not even sure yet that a Python 3 with stdin using strict is "usable".
> In output case, surrogateescape is weaker than strict, but it only allows
> surrgateescaped binary. If program carefully use surrogateescaped decode,
> surrogateescape on stdout is safe enough.
What do you mean by "carefully use surrogateescaped decode"?
The rationale for using surrogateescape on stdout is to support this use case:
https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout
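A rough sketch of that use case, assuming a filename containing a byte
that is not valid UTF-8. The helper write_name() and the filename are
hypothetical, and an in-memory buffer stands in for the real stdout:

```python
import io


def write_name(name: str, errors: str) -> bytes:
    """Write one filename to a UTF-8 text stream, return the bytes written."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(
        buffer, encoding="utf-8", errors=errors, newline="\n"
    )
    stream.write(name + "\n")
    stream.flush()
    return buffer.getvalue()


# A filename as Python would see it after decoding the OS bytes
# with surrogateescape: the 0xFF byte becomes U+DCFF.
name = b"report-\xff.txt".decode("utf-8", "surrogateescape")

# A strict stdout cannot encode the lone surrogate.
try:
    write_name(name, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A surrogateescape stdout writes the original filename bytes.
escaped = write_name(name, "surrogateescape")
```

With strict, listing the directory fails on the first such filename;
with surrogateescape, the output contains the same bytes as the
filename on disk.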
> On the other hand, surrogateescape is very weak for input. It accepts
> arbitrary bytes.
> It should be used carefully.
In my experience with the Python bug tracker, almost nobody
understands Unicode and locales. For the "Producer-consumer model
using pipes" use case, the encoding issues of Python 3.6 can be a
blocker. Some developers may prefer a different programming language
which doesn't bother them with Unicode: basically, *all* other
programming languages, no?
> But I agree different encoding handler between stdin/stdout is not beautiful.
> That's why I'm ±0.
That's why there are two modes: UTF-8 and UTF-8 Strict. But I'm not
100% sure yet which encodings and error handlers should be used
;-) I started to play with my PEP 540 implementation. I already had to
update PEP 540 and its implementation for Windows. On Windows,
os.fsdecode/fsencode now use surrogatepass, not surrogateescape
(Python 3.5 uses strict on Windows).
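For reference, a small sketch of how the two handlers differ on a
lone surrogate such as U+D800, which can appear in a Windows filename.
This only exercises the codec error handlers directly, not
os.fsencode itself:

```python
# A lone high surrogate, outside the U+DC80..U+DCFF range that
# surrogateescape uses for escaped bytes.
lone = "\ud800"

# surrogatepass encodes the surrogate as if it were a normal code
# point and decodes it back again.
passed = lone.encode("utf-8", "surrogatepass")
restored = passed.decode("utf-8", "surrogatepass")

# surrogateescape only handles U+DC80..U+DCFF, so it refuses U+D800.
try:
    lone.encode("utf-8", "surrogateescape")
    escape_ok = True
except UnicodeEncodeError:
    escape_ok = False
```

So surrogatepass can round-trip any lone surrogate, while
surrogateescape can only round-trip the ones it created itself from
undecodable bytes.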
Victor