<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;">My original plan was to bypass the UTF-8 encoding step, but that was going to cause major issues with code that blindly assumes it can do things like sys.stdout.buffer.write(b"\n") (rather than b"\n\0" - and who'd imagine you needed to do that). I didn't want to set up secret handshakes either, at least until there's a proven performance issue.<br><br>I'd need to test to be sure, but writing an incomplete code point should just truncate the write to before that point. It may currently raise OSError if that truncates the write to zero length, as I believe a zero-length write is not currently distinguished from an error. What behavior would you propose?<br><br>Reads of fewer than four bytes fail immediately, because in the worst case we need four bytes of UTF-8 to represent one Unicode character. This is an unfortunate reality of trying to limit it to one system call - you'll never get a full buffer from a single read, as there is no simple mapping between length-as-UTF-8 and length-as-UTF-16 for an arbitrary string.</div></div><div dir="ltr"><hr><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">From: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:random832@fastmail.com">Random832</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Sent: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">9/1/2016 16:31</span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">To: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:python-dev@python.org">python-dev@python.org</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Subject: </span><span style="font-family: Calibri,sans-serif; 
font-size: 11pt;">Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8</span><br><br></div>On Thu, Sep 1, 2016, at 18:28, Steve Dower wrote:<br>> This is a raw (bytes) IO class that requires text to be passed encoded<br>> with utf-8, which will be decoded to utf-16-le and passed to the Windows APIs.<br>> Similarly, bytes read from the class will be provided by the operating <br>> system as utf-16-le and converted into utf-8 when returned to Python.<br><br>What happens if a character is broken across a buffer boundary? e.g. if<br>someone tries to read or write one byte at a time (you can't do a<br>partial read of zero bytes, since there's no way to distinguish that from<br>an EOF).<br><br>Is there going to be a higher-level text I/O class that bypasses the<br>UTF-8 encoding step when the underlying bytes stream is a console? What<br>if we did that but left the encoding as mbcs? I.e. the console is a text<br>stream that can magically handle characters that aren't representable in<br>its encoding. Note that if anything does os.read/write to the console's<br>file descriptors, they're gonna get MBCS and there's nothing we can do<br>about it.<br></body></html>