[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Steve Dower steve.dower at python.org
Mon Sep 5 09:36:23 EDT 2016


The best fix is to use a buffered reader, which will read all the available bytes and then let you .read(1), even if it happens to be an incomplete character.

We could theoretically add buffering to the raw reader to handle one character, which would allow very small reads from raw, but that severely complicates things and the advice to use a buffered reader is good advice anyway.
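
As a rough illustration of the buffered-reader advice, here is a minimal sketch. It uses an in-memory io.BytesIO as a stand-in for the console's raw stream (an assumption made so the example runs anywhere; real code would use sys.stdin.buffer), and the helper name read_one_char is hypothetical. It reads one byte at a time and feeds an incremental UTF-8 decoder until a whole character is available:

```python
import codecs
import io


def read_one_char(buffered):
    """Read single bytes from a buffered binary stream until one full
    UTF-8 character has been decoded. Returns '' at EOF."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        byte = buffered.read(1)
        if not byte:
            # EOF: final=True raises if a character was left incomplete.
            return decoder.decode(b"", final=True)
        char = decoder.decode(byte)
        if char:
            return char


# Stand-in for the console: "a", then U+20AC (3 bytes), then U+1F600 (4 bytes).
raw = io.BytesIO("a\u20ac\U0001f600".encode("utf-8"))
stream = io.BufferedReader(raw)
chars = [read_one_char(stream) for _ in range(4)]
```

The incremental decoder holds on to partial sequences between calls, so the caller never sees half a character even though each read(1) may return only part of one.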

Top-posted from my Windows Phone

-----Original Message-----
From: "Paul Moore" <p.f.moore at gmail.com>
Sent: 9/5/2016 3:23
To: "Martin Panter" <vadmium+py at gmail.com>
Cc: "Python Dev" <python-dev at python.org>
Subject: Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8

On 5 September 2016 at 10:37, Martin Panter <vadmium+py at gmail.com> wrote:
> On 5 September 2016 at 09:10, Paul Moore <p.f.moore at gmail.com> wrote:
>> On 5 September 2016 at 06:54, Steve Dower <steve.dower at python.org> wrote:
>>> +Using the raw object with small buffers
>>> +---------------------------------------
>>> +
>>> +Code that uses the raw IO object and attempts to read less than four characters
>>> +will now receive an error. Because it's possible that any single character may
>>> +require up to four bytes when represented in utf-8, requests must fail.
>>
>> I'm very concerned about this statement. It's clearly not true that
>> the request *must* fail, as reading 1 byte from a UTF-8 enabled Linux
>> console stream currently works (at least I believe it does). And there
>> is code in the wild that works by doing a test that "there's input
>> available" (using kbhit on Windows and select on Unix) and then doing
>> read(1) to ensure a non-blocking read (the pyinvoke code I referenced
>> earlier). If we're going to break this behaviour, I'd argue that we
>> need to provide a working alternative.
>>
>> At a minimum, can the PEP include a recommended cross-platform means
>> of implementing a non-blocking read from standard input, to replace
>> the current approach? (If the recommendation is to read a larger
>> 4-byte buffer and manage the process of retaining unused bytes
>> yourself, then that's quite a major change to at least the code I'm
>> thinking of in invoke, and I'm not sure read(4) guarantees that it
>> *won't* block if only 1 byte is available without blocking...)
>
> FWIW, on Linux and Unix in general, if select() or similar indicates
> that some read data is available, calling raw read() with any buffer
> size should return at least one byte, whatever is available, without
> blocking. If the user has only typed one byte, read(4) would return
> that one byte immediately.
>
> But if you are using a BufferedReader (stdin.buffer rather than
> stdin.buffer.raw), then this guarantee is off and read(4) will block
> until it gets 4 bytes, or until EOF.

OK. So a correct non-blocking approach would be:

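import select
import sys

if sys.platform == "win32":
    import msvcrt

def ready_for_reading():
    if sys.platform == "win32":
        return msvcrt.kbhit()
    else:
        reads, _, _ = select.select([sys.stdin], [], [], 0.0)
        return bool(reads and reads[0] is sys.stdin)

def read_available():
    # Returns None when no input is ready.
    if ready_for_reading():
        return sys.stdin.buffer.raw.read(4)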

And using a buffer any less than 4 bytes long risks an error on input
(specifically, if a character that encodes to multiple UTF-8 bytes is
returned).
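
The Unix behaviour described above (select() reporting readiness, then a raw read returning whatever is available) can be checked with a small sketch. It uses an os.pipe() as a stand-in for standard input, which is an assumption for illustration only; a real terminal may behave differently, and select() on pipes does not work on Windows:

```python
import os
import select

# A pipe stands in for stdin so the example needs no interactive input.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"x")  # only one byte is available

# Zero timeout: poll without blocking.
ready, _, _ = select.select([read_fd], [], [], 0.0)

# Ask for 4 bytes; the raw read returns the single byte that is there.
data = os.read(read_fd, 4) if ready else b""

os.close(read_fd)
os.close(write_fd)
```

Here the read asks for 4 bytes but returns as soon as the one available byte has been delivered, rather than waiting for three more.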

OK. That's viable, I guess, although the *actual* code in question is
written to be valid on Python back to 2.7, and to work for general
file-like objects, so it'll still be some work to get the incantation
correct.

It would be nice to explain this explicitly in the docs, though, as
read(1) is pretty common, and callers don't typically expect it to
raise an error for this reason.

Thanks,
Paul