[Python-ideas] PEP 540: Add a new UTF-8 mode

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Thu Jan 5 21:10:45 EST 2017


Victor Stinner writes:

 > Python 3.6 is not exactly in the first or the later category: "it
 > depends".
 > 
 > To read data from the operating system, Python 3.6 behaves in "UNIX
 > mode": os.listdir() *does* return invalid filenames, it uses a funny
 > encoding using surrogates.
 > 
 > To write data back to the operating system, Python 3.6 wears its
 > "Unicode nazi" hat and becomes strict. It's no longer possible to
 > write data from the operating system back to the operating system.
 > Writing a filename read from os.listdir() into stdout or into a text
 > file fails with an encode error.
 > 
 > Subtle behaviour: since Python 3.6, with the POSIX locale, Python
 > uses the "UNIX mode", but only to write into stdout. It's possible to
 > write a filename into stdout, but not into a text file.

The point of this, I suppose, is that piping to xargs works by
default.

I haven't read the PEPs (don't have time, mea culpa), but my ideal
would be three options:

    --transparent ->  errors=surrogateescape on input and output
    --postel ->  errors=surrogateescape on input, =strict on output
    --unicode-me-harder ->  errors=strict on input and output

with --postel being the default.  Unix aficionados who use xargs
heavily can use --transparent.  Since people have different
preferences, I guess there should be an envvar for this.
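
To make the three options concrete, here is a rough sketch of how each
one would treat a filename that is not valid UTF-8.  The mode names are
just the hypothetical options above, not real CPython flags, and
round_trip() is an illustrative helper; only the "surrogateescape" and
"strict" error handlers are actual Python.

```python
# Hypothetical mode names taken from the options above; these are NOT
# real CPython command-line flags.  Each maps to (input, output) handlers.
MODES = {
    "transparent": ("surrogateescape", "surrogateescape"),
    "postel": ("surrogateescape", "strict"),
    "unicode-me-harder": ("strict", "strict"),
}


def round_trip(raw: bytes, mode: str) -> str:
    """Decode bytes from the OS, then try to encode them back out."""
    in_errors, out_errors = MODES[mode]
    try:
        text = raw.decode("utf-8", errors=in_errors)
    except UnicodeDecodeError:
        return "fails on input"
    try:
        text.encode("utf-8", errors=out_errors)
    except UnicodeEncodeError:
        return "fails on output"
    return "round-trips"


bad_name = b"caf\xe9.txt"  # Latin-1 bytes, invalid as UTF-8
for mode in MODES:
    print(mode, "->", round_trip(bad_name, mode))
```

Valid UTF-8 input round-trips in all three modes; the modes only differ
in where an undecodable byte gets rejected, if at all.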

Others should probably configure this per file, in each open() call.
I'll try to get to the PEPs over the weekend but can't promise.
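
For the per-file case, something like the following should work.  It is
only a sketch: the filename and the temporary path are made up for the
example, and it assumes UTF-8 is the encoding you want.  The errors=
argument to open() is the real per-file knob.

```python
import os
import tempfile

# A name as it might come back from os.listdir() on a UTF-8 system:
# the undecodable byte 0xE9 is smuggled through as a lone surrogate.
name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "listing.txt")  # illustrative path

    # Lenient output for this one file: the original byte is restored.
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as f:
        f.write(name)
    with open(path, "rb") as f:
        raw = f.read()

    # Default (strict) output: the same write is rejected.
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(name)
        strict_ok = True
    except UnicodeEncodeError:
        strict_ok = False

print(raw, strict_ok)
```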

Steve

