[Python-ideas] Fix default encodings on Windows

Random832 random832 at fastmail.com
Fri Aug 12 11:33:35 EDT 2016


On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote:
> That's the hope, though that module approaches the solution differently 
> and may still have uses. An alternative way for us to fix this whole thing 
> would be to bring win_unicode_console into the standard library and use 
> it by default (or probably whenever PYTHONIOENCODING is not specified).

I have concerns about win_unicode_console:
- For the "text_transcoded" streams, stdout.encoding is utf-8. For the
"text" streams, it is utf-16.
- There is no object, as far as I can find, which can be used as an
unbuffered unicode I/O object.
- The raw output streams silently drop the last byte if an odd number of
bytes are written.
- The sys.stdout obtained via streams.enable does not support .buffer /
.buffer.raw / .detach
- All of these objects provide a fileno() interface, yet text passed
through os.read/os.write on that descriptor still has to be encoded in
the console encoding and not in utf-8 or utf-16 (see the sketch after
this list).
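
For reference, here is a minimal diagnostic sketch, written against plain
CPython with nothing from win_unicode_console assumed, that probes the
attributes these bullets are about; which of them exist, and whether bytes
written to the descriptor match stdout.encoding, is exactly what changes
once a replacement stream is installed:

    import os
    import sys

    print("encoding:", sys.stdout.encoding)          # 'utf-8', 'utf-16', a codepage, ...
    print("has .buffer:", hasattr(sys.stdout, "buffer"))
    print("has .detach:", hasattr(sys.stdout, "detach"))

    try:
        fd = sys.stdout.fileno()
    except (OSError, AttributeError):
        print("no usable fileno()")
    else:
        sys.stdout.flush()
        # Bytes written here bypass the text layer entirely; on Windows the
        # console interprets them in its own codepage, regardless of what
        # sys.stdout.encoding claims.
        os.write(fd, b"written via os.write\n")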

How important is it to preserve the validity of the conventional advice
for "putting stdin/stdout in binary mode" using .buffer or .detach? I
suspect this is mainly used for programs intended to have their output
redirected, but today it 'kind of works' to run such a program on the
console and inspect its output. How important is it for
os.read/write(stdxxx.fileno()) to be consistent with stdxxx.encoding?
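
For concreteness, the conventional advice looks roughly like the sketch
below; it assumes sys.stdout exposes .buffer and .detach, which is
precisely what is in question here:

    import sys

    # Variant 1: keep the text stream, write bytes through its buffered
    # binary layer.
    sys.stdout.buffer.write(b"raw bytes, no text encoding applied\n")
    sys.stdout.buffer.flush()

    # Variant 2: replace the text wrapper with the underlying binary stream.
    # After detach() the old text object must no longer be used.
    binary_stdout = sys.stdout.detach()
    binary_stdout.write(b"more raw bytes\n")
    binary_stdout.flush()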

Should errors='surrogatepass' be used? It's unlikely, but not
impossible, to paste an invalid surrogate into the console. With
win_unicode_console, this results in a UnicodeDecodeError and, if it
happens during a readline, disables the readline hook.
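
To show what is at stake, here is a small self-contained example (not tied
to win_unicode_console): a lone surrogate such as U+D800 round-trips under
errors='surrogatepass' but raises under the default 'strict' handler, which
is the failure mode described above:

    lone = "\ud800"   # an unpaired surrogate, e.g. half of a pasted pair

    data = lone.encode("utf-8", errors="surrogatepass")
    print(data)                                                  # b'\xed\xa0\x80'
    print(repr(data.decode("utf-8", errors="surrogatepass")))    # '\ud800' again

    try:
        lone.encode("utf-8")    # default 'strict' handler
    except UnicodeEncodeError as exc:
        print("strict encode failed:", exc)

    try:
        data.decode("utf-8")    # default 'strict' handler
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc)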

Is it possible to break this by typing a valid surrogate pair that falls
across a buffer boundary?
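
As a sketch of that boundary case, using an incremental UTF-16 decoder
rather than whatever win_unicode_console actually does internally: a pair
split across two chunks only decodes correctly if the first half is
buffered until the second arrives:

    import codecs

    encoded = "\U0001F600".encode("utf-16-le")    # two UTF-16 code units, 4 bytes
    first, second = encoded[:2], encoded[2:]      # split between the surrogates

    dec = codecs.getincrementaldecoder("utf-16-le")()
    print(ascii(dec.decode(first)))               # '' -- high surrogate is held back
    print(ascii(dec.decode(second, final=True)))  # '\U0001f600'

    # Decoding a chunk on its own, by contrast, fails on the lone surrogate:
    try:
        first.decode("utf-16-le")
    except UnicodeDecodeError as exc:
        print("chunk-by-chunk decode failed:", exc)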

