<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;">I plan to use only Unicode to interact with the OS and then utf8 within Python if the caller wants bytes.<br><br>Currently we effectively use Unicode to interact with the OS and then CP_ACP if the caller wants bytes.<br><br>All the *A APIs just decode strings and call the *W APIs, and encode the return values. I'm proposing that we move the decoding and encoding into Python and make it (nearly) lossless.<br><br>In practice, this means all *A APIs are banned within the CPython source, and if we get/need bytes we have to convert to text first using the FS encoding, which will be utf8.<br><br>Top-posted from my Windows Phone</div></div><div dir="ltr"><hr><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">From: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:victor.stinner@gmail.com">Victor Stinner</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Sent: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">8/14/2016 9:20</span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">To: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:steve.dower@python.org">Steve Dower</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Cc: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:turnbull.stephen.fw@u.tsukuba.ac.jp">Stephen J. 
Turnbull</a>; <a href="mailto:python-ideas@python.org">python-ideas</a>; <a href="mailto:random832@fastmail.com">Random832</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Subject: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">Re: [Python-ideas] Fix default encodings on Windows</span><br><br></div><p dir="ltr">> The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes.</p>
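<p dir="ltr">A minimal sketch of the round-trip problem described in the quoted paragraph. It assumes cp1252 as a stand-in for CP_ACP (the real ANSI code page depends on the system locale) and a hypothetical Cyrillic file name. errors="replace" only approximates what the *A APIs do with characters they cannot represent.</p>

```python
# Hypothetical file name with four Cyrillic letters, written with escapes
# so the source stays ASCII.
name = "\u0444\u0430\u0439\u043b.txt"

# Assumption: cp1252 stands in for CP_ACP; the real code page varies per machine.
ACP = "cp1252"

# *A-style path: characters outside the code page are replaced, so the
# original name cannot be recovered from the bytes.
via_acp = name.encode(ACP, errors="replace").decode(ACP)

# Proposed path: UTF-8 can represent every character in the name.
via_utf8 = name.encode("utf-8").decode("utf-8")

print(via_acp == name)   # False
print(via_utf8 == name)  # True
```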
<p dir="ltr">You run into trouble when the filename comes from a file, another application, a registry key, ... that is encoded in CP_ACP.</p>
<p dir="ltr">Do you plan to transcode all of this data (decode from CP_ACP, then encode to UTF-8)?</p>
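<p dir="ltr">The transcoding step asked about here could look roughly like the following. This is only a sketch: transcode_acp_to_utf8 is a hypothetical helper, and cp1252 is assumed in place of CP_ACP. On Windows, Python's "mbcs" codec would select the actual ANSI code page instead.</p>

```python
def transcode_acp_to_utf8(raw, acp="cp1252"):
    """Decode bytes from the (assumed) ANSI code page and re-encode as UTF-8."""
    text = raw.decode(acp)       # bytes in the ANSI code page to str
    return text.encode("utf-8")  # str to UTF-8 bytes


# A filename as another application might have stored it: the letter e with
# an acute accent is the single byte 0xE9 in cp1252.
acp_bytes = b"caf\xe9.txt"
utf8_bytes = transcode_acp_to_utf8(acp_bytes)
print(utf8_bytes)
```

<p dir="ltr">The caller has to know that the incoming bytes really are in the ANSI code page. Bytes that are already UTF-8 would be decoded incorrectly by this helper.</p>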
</body></html>