[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Steve Dower steve.dower at python.org
Mon Sep 5 15:54:06 EDT 2016


On 05Sep2016 1234, eryk sun wrote:
> Also, the console is UCS-2, which can't be transcoded between UTF-16
> and UTF-8. Supporting UCS-2 in the console would integrate nicely with
> the filesystem PEP. It makes it always possible to print
> os.listdir('.'), copy and paste, and read it back without data loss.

Supporting UTF-8 actually works better for this. We already use 
surrogatepass explicitly (on the filesystem side, with PEP 529) and 
implicitly (on the console side, using the Windows conversion API).
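
As a rough illustration of why surrogatepass keeps the round trip lossless, here is a minimal sketch using Python's own codecs. The filename is made up, and this stands in for the behaviour only; it is not the console or filesystem code itself.

```python
# Hypothetical filename containing a lone surrogate, which Windows permits.
name = "file_\udc80.txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogatepass the surrogate is written as an ordinary 3-byte sequence
# and read back unchanged.
encoded = name.encode("utf-8", "surrogatepass")
decoded = encoded.decode("utf-8", "surrogatepass")

# The same string also survives a trip through UTF-16 code units.
wide = name.encode("utf-16-le", "surrogatepass")
from_wide = wide.decode("utf-16-le", "surrogatepass")

print(strict_ok, encoded, decoded == name, from_wide == name)
```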

> It would probably be simpler to use UTF-16 in the main pipeline and
> implement Martin's suggestion to mix in a UTF-8 buffer. The UTF-16
> buffer could be renamed as "wbuffer", for expert use. However, if
> you're fully committed to transcoding in the raw layer, I'm certain
> that these problems can be addressed with small buffers and using
> Python's codec machinery for a flexible mix of "surrogatepass" and
> "replace" error handling.

I don't think it actually makes things simpler. Having two buffers is 
generally a bad idea unless they are perfectly synced, which would be 
impossible here without data corruption (if you read half a UTF-8 
character sequence and then read the wide buffer, do you get that 
character or not?).
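
To make the sync problem concrete, here is a small sketch using the stdlib incremental decoder as a stand-in for a byte-oriented reader (an assumption for illustration, not the proposed console implementation). After half a character has been consumed, it sits in pending state: it has not been delivered to the byte reader, and it is no longer available to a second, wide-character view of the same input.

```python
import codecs

data = "\u00e9".encode("utf-8")          # two bytes: b'\xc3\xa9'
decoder = codecs.getincrementaldecoder("utf-8")()

first = decoder.decode(data[:1])         # only half the sequence so far
pending, _flags = decoder.getstate()     # bytes held back inside the decoder
second = decoder.decode(data[1:])        # the rest arrives, character completes

print(repr(first), pending, repr(second))
```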

Writing a partial character is easily avoidable by the user. We can 
either fail with an error or print garbage, and currently printing 
garbage is the most compatible behaviour. (This also occurs on Linux; I 
have a VM running this week for testing this stuff.)
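
The two options look roughly like this when sketched with Python's codec error handlers. This is an analogy only: the console path goes through the Windows conversion API, which may not produce exactly the same output.

```python
# First two bytes of a three-byte UTF-8 sequence (U+20AC), i.e. a partial write.
chunk = "\u20ac".encode("utf-8")[:2]

# Option 1: fail with an error.
try:
    chunk.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# Option 2: print garbage (here, a replacement character).
garbage = chunk.decode("utf-8", "replace")

print(failed, ascii(garbage))
```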

Cheers,
Steve

