[Python-ideas] Fix default encodings on Windows

Tue Aug 16 02:06:10 EDT 2016

>> On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower <steve.dower at python.org>
>> wrote:
>
> and using the *W APIs exclusively is the right way to go.

My proposal was to use the wide-character APIs, but to transcode CP_ACP
without best-fit characters and raise a warning whenever the default
character is used (e.g. substituting Katakana middle dot when creating
a file using a bytes path that has an invalid sequence in CP932). This
proposal was in response to the case made by Stephen Turnbull. If
using UTF-8 is getting such heavy pushback, I thought half a solution
was better than nothing, and it also sets up the infrastructure to
easily switch to UTF-8 if that idea eventually gains acceptance. It
could raise exceptions instead of warnings if that's preferred, since
bytes paths on Windows are already deprecated.
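
As a rough sketch of the warn-on-substitution idea: the real change would
presumably live in C around MultiByteToWideChar, so this only models the
behavior with Python's own cp932 codec. The helper name decode_bytes_path
and the DEFAULT_CHAR constant are hypothetical, made up for illustration.

```python
import warnings

# Assumption: U+30FB (Katakana middle dot) stands in for the code page's
# default character, as in the CP932 example above.
DEFAULT_CHAR = "\u30fb"


def decode_bytes_path(path, encoding="cp932"):
    """Decode a bytes path, warning if any byte sequence is invalid."""
    try:
        # Strict decoding: no silent substitution on the happy path.
        return path.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        warnings.warn(
            f"invalid {encoding} sequence in bytes path at offset {exc.start}; "
            "substituting the default character",
            UnicodeWarning,
            stacklevel=2,
        )
        # Fall back to a lossy decode and swap in the default character.
        lossy = path.decode(encoding, errors="replace")
        return lossy.replace("\ufffd", DEFAULT_CHAR)
```

Switching the warning to an exception would just mean re-raising in the
except branch instead of falling back to the lossy decode.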

> *Any* encoding that may silently lose data is a problem, which basically
> leaves utf-16 as the only option. However, as that causes other problems,
> maybe we can accept the tradeoff of returning utf-8 and failing when a
> path contains invalid surrogate pairs

Are there any common sources of illegal UTF-16 surrogates in Windows
filenames? I see that WTF-8 (Wobbly Transformation Format) was developed
to handle this problem. A WTF-8 path would round-trip back to the
filesystem, but it should only be used internally in a program.
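
To illustrate the round-trip point, here is a small sketch using Python's
existing 'surrogatepass' error handler. For a lone surrogate it produces
the same bytes WTF-8 would, but it is not a full WTF-8 implementation (for
one, it would not merge a surrogate pair held as two separate code points).
The filename is made up.

```python
# A hypothetical filename containing a lone low surrogate (U+DC00).
name = "bad\udc00name.txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' encodes the surrogate as a 3-byte sequence (ED B0 80),
# which decodes back to the identical string.
wobbly = name.encode("utf-8", "surrogatepass")
roundtrip = wobbly.decode("utf-8", "surrogatepass")

# The resulting bytes are not valid UTF-8, so a strict decoder rejects them.
try:
    wobbly.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

The last check is why such bytes should stay internal: anything expecting
well-formed UTF-8 will choke on them.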