<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;">The best fix is to use a buffered reader, which will read all the available bytes and then let you call .read(1), even if the result happens to be an incomplete character.<br><br>We could theoretically add buffering to the raw reader to handle one character, which would allow very small reads from raw, but that severely complicates things, and using a buffered reader is good advice anyway.</div></div><div dir="ltr"><hr><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">From: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:p.f.moore@gmail.com">Paul Moore</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Sent: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">9/5/2016 3:23</span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">To: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:vadmium+py@gmail.com">Martin Panter</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Cc: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:python-dev@python.org">Python Dev</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Subject: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8</span><br><br></div>On 5 September 2016 at 10:37, Martin Panter &lt;vadmium+py@gmail.com&gt; wrote:<br>> On 5 September 2016 at 09:10, Paul Moore &lt;p.f.moore@gmail.com&gt; wrote:<br>>> On 5 September 2016 at 06:54, Steve Dower &lt;steve.dower@python.org&gt; wrote:<br>>>> +Using the raw object with small 
buffers<br>>>> +---------------------------------------<br>>>> +<br>>>> +Code that uses the raw IO object and attempts to read less than four characters<br>>>> +will now receive an error. Because it's possible that any single character may<br>>>> +require up to four bytes when represented in utf-8, requests must fail.<br>>><br>>> I'm very concerned about this statement. It's clearly not true that<br>>> the request *must* fail, as reading 1 byte from a UTF-8 enabled Linux<br>>> console stream currently works (at least I believe it does). And there<br>>> is code in the wild that works by doing a test that "there's input<br>>> available" (using kbhit on Windows and select on Unix) and then doing<br>>> read(1) to ensure a non-blocking read (the pyinvoke code I referenced<br>>> earlier). If we're going to break this behaviour, I'd argue that we<br>>> need to provide a working alternative.<br>>><br>>> At a minimum, can the PEP include a recommended cross-platform means<br>>> of implementing a non-blocking read from standard input, to replace<br>>> the current approach? (If the recommendation is to read a larger<br>>> 4-byte buffer and manage the process of retaining unused bytes<br>>> yourself, then that's quite a major change to at least the code I'm<br>>> thinking of in invoke, and I'm not sure read(4) guarantees that it<br>>> *won't* block if only 1 byte is available without blocking...)<br>><br>> FWIW, on Linux and Unix in general, if select() or similar indicates<br>> that some read data is available, calling raw read() with any buffer<br>> size should return at least one byte, whatever is available, without<br>> blocking. If the user has only typed one byte, read(4) would return<br>> that one byte immediately.<br>><br>> But if you are using a BufferedReader (stdin.buffer rather than<br>> stdin.buffer.raw), then this guarantee is off and read(4) will block<br>> until it gets 4 bytes, or until EOF.<br><br>OK. 
So a correct non-blocking approach would be:<br><br>import select<br>import sys<br><br>def ready_for_reading():<br>    # True if input is waiting on stdin.<br>    if sys.platform == "win32":<br>        import msvcrt  # Windows-only module<br>        return bool(msvcrt.kbhit())<br>    else:<br>        reads, _, _ = select.select([sys.stdin], [], [], 0.0)<br>        return bool(reads and reads[0] is sys.stdin)<br><br>def read_available():<br>    # Returns None when nothing is waiting.<br>    if ready_for_reading():<br>        return sys.stdin.buffer.raw.read(4)<br><br>And using a buffer shorter than 4 bytes risks an error on input<br>(specifically, if a character that encodes to multiple UTF-8 bytes is<br>returned).<br><br>OK. That's viable, I guess, although the *actual* code in question is<br>written to be valid on Python back to 2.7, and to work for general<br>file-like objects, so it'll still be some work to get the incantation<br>correct.<br><br>It would be nice to explain this explicitly in the docs, though, as<br>read(1) is pretty common, and code that calls it doesn't typically<br>expect to get an error because of this.<br><br>Thanks,<br>Paul<br></body></html>