I plan to use only Unicode to interact with the OS, and then UTF-8 within Python if the caller wants bytes.
Currently we effectively use Unicode to interact with the OS, and then CP_ACP if the caller wants bytes.
All the *A APIs just decode strings and call the *W APIs, and encode the return values. I'm proposing that we move the decoding and encoding into Python and make it (nearly) lossless.
In practice, this means all *A APIs are banned within the CPython source, and if we get or need bytes we have to convert them to text first using the FS encoding, which will be UTF-8.
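A minimal sketch of what that boundary could look like on the Python side. It assumes the *W call is represented by a hypothetical `_listdir_w` function that takes and returns `str` (here backed by a fake in-memory directory), and `FS_ENCODING`, `FS_ERRORS` and `listdir` are illustrative names, not CPython's actual code. The `surrogatepass` error handler is also an assumption, picked so that lone surrogates, which Windows permits in filenames, survive the conversion.

```python
from typing import List, Union

# Assumed filesystem encoding and error handler for the proposal.
FS_ENCODING = "utf-8"
FS_ERRORS = "surrogatepass"


def _listdir_w(path: str) -> List[str]:
    """Hypothetical stand-in for a *W API call: text in, text out."""
    fake_fs = {"C:\\data": ["report.txt", "caf\u00e9.txt", "\u65e5\u672c.txt"]}
    return fake_fs[path]


def listdir(path: Union[str, bytes]) -> Union[List[str], List[bytes]]:
    """Only text reaches the OS layer; bytes are converted at the Python boundary."""
    if isinstance(path, bytes):
        # bytes argument: decode with the FS encoding instead of calling an *A API
        names = _listdir_w(path.decode(FS_ENCODING, FS_ERRORS))
        # a bytes caller gets bytes back, encoded with the same FS encoding
        return [n.encode(FS_ENCODING, FS_ERRORS) for n in names]
    return _listdir_w(path)
```

With this shape, `listdir(b"C:\\data")` returns the same names as `listdir("C:\\data")`, just UTF-8 encoded, and no code-page conversion happens anywhere.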
-----Original Message-----
From: "Victor Stinner" email@example.com
Sent: 8/14/2016 9:20
To: "Steve Dower" firstname.lastname@example.org
Cc: "Stephen J. Turnbull" email@example.com; "python-ideas" firstname.lastname@example.org; "Random832" email@example.com
Subject: Re: [Python-ideas] Fix default encodings on Windows
The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes.
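To illustrate why CP_ACP cannot round-trip arbitrary names, the sketch below uses Python's `cp1252` codec as a stand-in for the ANSI code page of a Western-European Windows install. That is an assumption: the real ACP depends on the system locale, and the Win32 conversion may apply best-fit mappings, so `errors="replace"` only approximates the `?` substitution the *A APIs perform for unmappable characters. The `roundtrip` helper is hypothetical.

```python
def roundtrip(name: str, encoding: str, errors: str) -> str:
    """Encode a filename to bytes and decode it back, as a bytes-API caller would."""
    return name.encode(encoding, errors).decode(encoding, errors)


name = "\u65e5\u672c\u8a9e.txt"  # a Japanese filename

# Stand-in for CP_ACP: each unmappable character becomes b"?".
acp_result = roundtrip(name, "cp1252", "replace")  # "???.txt"

# Proposed FS encoding: UTF-8, with surrogatepass for lone surrogates.
utf8_result = roundtrip(name, "utf-8", "surrogatepass")  # identical to name

# A lone surrogate (legal in a Windows filename) also survives UTF-8 + surrogatepass.
lone = roundtrip("bad\udc80.txt", "utf-8", "surrogatepass")
```

Once the name has been collapsed to `???.txt`, passing those bytes back in can no longer find the original file, which is exactly the information loss described above.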
You get into trouble when the filename comes from a file, another application, a registry key, ... which is encoded in CP_ACP. Do you plan to transcode all this data? (decode from CP_ACP, encode back to UTF-8)
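For concreteness, transcoding such data is a decode from the ANSI code page followed by an encode to UTF-8. The sketch below assumes the legacy source used cp1252 (in practice the code page would have to be known or queried), and `acp_to_fs_bytes` is a hypothetical helper name. It also shows that handing the raw CP_ACP bytes straight to a UTF-8 decoder fails.

```python
ACP = "cp1252"  # assumption: the ANSI code page of the machine that wrote the data


def acp_to_fs_bytes(raw: bytes, acp: str = ACP) -> bytes:
    """Hypothetical helper: re-encode CP_ACP bytes as UTF-8 filesystem bytes."""
    return raw.decode(acp).encode("utf-8", "surrogatepass")


# Filename bytes as they might be read from a registry value or a legacy file.
legacy = b"caf\xe9.txt"  # "café.txt" in cp1252

raw_is_valid_utf8 = True
try:
    legacy.decode("utf-8")  # untranscoded: \xe9 starts an invalid UTF-8 sequence here
except UnicodeDecodeError:
    raw_is_valid_utf8 = False

fs_bytes = acp_to_fs_bytes(legacy)  # b"caf\xc3\xa9.txt"
```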