
On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote:
> That's the hope, though that module approaches the solution differently and may still be useful. An alternative way for us to fix this whole thing would be to bring win_unicode_console into the standard library and use it by default (or probably whenever PYTHONIOENCODING is not specified).
I have concerns about win_unicode_console (a quick probe of several of these follows the list):

- For the "text_transcoded" streams, stdout.encoding is utf-8. For the "text" streams, it is utf-16.
- There is no object, as far as I can find, which can be used as an unbuffered unicode I/O object.
- Raw output streams silently drop the last byte if an odd number of bytes are written.
- The sys.stdout obtained via streams.enable does not support .buffer / .buffer.raw / .detach.
- All of these objects provide a fileno() interface.
- When using os.read/write for data that represents text, the data should still be encoded in the console encoding, not in utf-8 or utf-16.
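For concreteness, here is a minimal probe of the first, fourth, and fifth points. This is only a sketch: it assumes win_unicode_console is installed and run in a Windows console session, it uses the streams.enable entry point mentioned above, and the comments restate my observations rather than documented behavior.

    import sys
    import win_unicode_console

    win_unicode_console.streams.enable()

    # First point: the replacement stream reports utf-8 (text_transcoded)
    # or utf-16 (text), not the console code page.
    print(sys.stdout.encoding)

    # Fourth point: the usual escape hatch to the byte layer is missing.
    print(getattr(sys.stdout, "buffer", None))   # expected: no usable .buffer

    # Fifth point: fileno() is still provided, so os.write() on the
    # descriptor is reachable, which is what raises the last point.
    print(sys.stdout.fileno())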
How important is it to preserve the validity of the conventional advice for "putting stdin/stdout in binary mode" using .buffer or .detach? I suspect this is mainly used by programs intended to have their output redirected, but today it 'kind of works' to run such a program on the console and inspect its output. How important is it for os.read/write(stdxxx.fileno()) to be consistent with stdxxx.encoding?
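For reference, the "binary mode" advice in question is the pattern below; nothing here is win_unicode_console-specific, it is just the idiom whose validity is at stake:

    import os
    import sys

    # The conventional way to get at the raw bytes of stdin/stdout.
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(data)

    # The descriptor-level variant from the second question: should the
    # bytes passed here match stdxxx.encoding, or the console encoding?
    os.write(sys.stdout.fileno(), data)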
Should errors='surrogatepass' be used? It's unlikely, but not impossible, to paste an invalid surrogate into the console. With win_unicode_console, this results in a UnicodeDecodeError and, if it happens during a readline, disables the readline hook.
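To illustrate with the codec alone (no console involved): a lone surrogate in UTF-16-LE data is rejected under the default strict handler but passes with surrogatepass:

    # A lone high surrogate (U+D800) followed by 'A', as UTF-16-LE bytes.
    raw = b"\x00\xd8A\x00"

    try:
        raw.decode("utf-16-le")
    except UnicodeDecodeError as exc:
        print("strict:", exc)

    # surrogatepass lets the lone surrogate through as U+D800.
    print("surrogatepass:", ascii(raw.decode("utf-16-le", "surrogatepass")))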
Is it possible to break this by typing a valid surrogate pair that falls across a buffer boundary?
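Whether that breaks depends on whether decoding is done per chunk or incrementally: an incremental decoder buffers the first half of the pair across the boundary, while a naive per-chunk decode fails on it. A sketch with the codecs machinery (the actual console read path in win_unicode_console may of course differ):

    import codecs

    # U+1F600 in UTF-16-LE is the surrogate pair D83D DE00; split it
    # across two "reads" so the pair straddles the buffer boundary.
    chunks = [b"\x3d\xd8", b"\x00\xde"]

    dec = codecs.getincrementaldecoder("utf-16-le")()
    out = "".join(dec.decode(chunk) for chunk in chunks)
    print(ascii(out))   # '\U0001f600' -- the pair survives the split

    # A per-chunk decode fails on the first half alone:
    # chunks[0].decode("utf-16-le")  -> UnicodeDecodeError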