[Python-ideas] Fix default encodings on Windows

Steve Dower steve.dower at python.org
Sun Aug 14 13:49:21 EDT 2016


I plan to use only Unicode to interact with the OS and then utf8 within Python if the caller wants bytes.

Currently we effectively use Unicode to interact with the OS and then CP_ACP if the caller wants bytes.

All the *A APIs just decode strings and call the *W APIs, and encode the return values. I'm proposing that we move the decoding and encoding into Python and make it (nearly) lossless.

In practice, this means all *A APIs are banned within the CPython source, and if we get/need bytes we have to convert to text first using the FS encoding, which will be utf8.
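
A minimal sketch of why the choice of bytes encoding matters, assuming cp1252 as a stand-in for CP_ACP (the real value depends on the machine) and a made-up filename. It only illustrates the round-trip property, not the actual CPython change:

```python
# Hypothetical filename with characters outside a typical ANSI code page.
name = "résumé_日本語.txt"

# cp1252 stands in for CP_ACP here (an assumption for illustration).
# Characters the code page cannot represent are replaced with "?".
acp_bytes = name.encode("cp1252", errors="replace")
acp_roundtrip = acp_bytes.decode("cp1252")

# UTF-8 can represent every character in this name.
utf8_bytes = name.encode("utf-8")
utf8_roundtrip = utf8_bytes.decode("utf-8")

print(acp_roundtrip)           # résumé_???.txt
print(acp_roundtrip == name)   # False: information was lost
print(utf8_roundtrip == name)  # True: the bytes map back to the same name
```

With the code page, the bytes handed to the caller no longer identify the original file; with utf8 they do.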

Top-posted from my Windows Phone

-----Original Message-----
From: "Victor Stinner" <victor.stinner at gmail.com>
Sent: 8/14/2016 9:20
To: "Steve Dower" <steve.dower at python.org>
Cc: "Stephen J. Turnbull" <turnbull.stephen.fw at u.tsukuba.ac.jp>; "python-ideas" <python-ideas at python.org>; "Random832" <random832 at fastmail.com>
Subject: Re: [Python-ideas] Fix default encodings on Windows

> The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes.
You run into trouble when the filename comes from a file, another application, a registry key, ... and is encoded in CP_ACP.
Do you plan to transcode all of this data? (decode from CP_ACP, encode back to UTF-8)
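
The transcoding step in question could be sketched roughly as follows. The function name and the cp1252 default are assumptions for illustration; on a real system the source encoding would have to match the code page the data was actually written in:

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from an ANSI code page and re-encode them as UTF-8.

    Hypothetical helper; cp1252 is only an assumed stand-in for CP_ACP.
    """
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on invalid bytes
    return text.encode("utf-8")


# A filename as it might be stored by another application in cp1252.
acp_name = b"caf\xe9.txt"
utf8_name = transcode_to_utf8(acp_name)
print(utf8_name)  # b'caf\xc3\xa9.txt'
```

This only works when the caller knows which code page the bytes came from, which is exactly the information that is often missing for data read from files or other applications.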