[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Steve Dower steve.dower at python.org
Mon Sep 5 13:38:01 EDT 2016


On 05Sep2016 0941, Paul Moore wrote:
> On 5 September 2016 at 14:36, Steve Dower <steve.dower at python.org> wrote:
>> The best fix is to use a buffered reader, which will read all the available
>> bytes and then let you .read(1), even if it happens to be an incomplete
>> character.
>
> But this is sys.stdin.buffer.raw we're talking about. People can't
> really layer anything on top of that, it's precisely because they are
> trying to *bypass* the existing layering (that doesn't work the way
> that they need it to, because it blocks) that is the problem here.

This layer also blocks, and always has. You need to go to
platform-specific functions anyway to get non-blocking functionality
(which is also wrapped up in getc I believe, but that isn't used by
FileIO or the new WinConsoleIO classes).

>> We could theoretically add buffering to the raw reader to handle one character,
>> which would allow very small reads from raw, but that severely complicates
>> things and the advice to use a buffered reader is good advice anyway.
>
> Can you provide an example of how I'd rewrite the code that I quoted
> previously to follow this advice? Note - this is not theoretical, I
> expect to have to provide a PR to fix exactly this code should this
> change go in. At the moment I can't find a way that doesn't impact the
> (currently working and not expected to need any change) Unix version
> of the code, most likely I'll have to add buffering of 4-byte reads
> (which as you say is complex).

The easiest way to follow it is to use "sys.stdin.buffer.read(1)" rather 
than "sys.stdin.buffer.raw.read(1)".
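
A minimal sketch of the difference, assuming io.BytesIO is an acceptable
stand-in for the console and that io.BufferedReader plays the role of
sys.stdin.buffer. The names raw, buffered, first, second and char are made
up for illustration only.

```python
import io

# Stand-in for the console input: "é" is two bytes in UTF-8 (0xC3 0xA9).
raw = io.BytesIO("é\n".encode("utf-8"))

# Stand-in for sys.stdin.buffer: a buffered reader layered over the raw stream.
buffered = io.BufferedReader(raw)

# The buffered layer reads what is available from the raw stream and then
# hands it out one byte at a time, even in the middle of a character.
first = buffered.read(1)    # first byte of the character, incomplete on its own
second = buffered.read(1)   # the remaining byte

# The caller is still responsible for reassembling the character.
char = (first + second).decode("utf-8")
```

The point is only that read(1) on the buffered layer succeeds with a partial
character; decoding is left to the caller, as it is on POSIX.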

> PS I'm not 100% sure that under POSIX read() will return partial UTF-8
> byte sequences. I think it must, because otherwise a lot of code I've
> seen would be broken, but if a POSIX expert can confirm or deny my
> assumption, that would be great.

I just tested, and yes it returns partial characters. That's a good
reason to do the single-character buffering ourselves. It shouldn't be
too hard to deal with.
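
A rough sketch of what single-character buffering in a raw reader could look
like. This is not the WinConsoleIO implementation; the class name
OneCharBufferedRaw and its chars argument are hypothetical, and the input is
simulated with an ordinary string rather than a real console.

```python
import io


class OneCharBufferedRaw(io.RawIOBase):
    """Hypothetical raw reader that keeps the unread bytes of one character."""

    def __init__(self, chars):
        # Stand-in for the console: yields one decoded character at a time.
        self._chars = iter(chars)
        # UTF-8 bytes of the current character that have not been returned yet.
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, b):
        if not self._pending:
            ch = next(self._chars, None)
            if ch is None:
                return 0  # end of input
            self._pending = ch.encode("utf-8")
        # Hand out as many pending bytes as fit; keep the rest for next time.
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n
```

With this in place, OneCharBufferedRaw("é").read(1) returns the first byte of
the character and the next read(1) returns the second, so very small reads
from the raw layer no longer have to fail.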

Cheers,
Steve


