[Python-ideas] Fix default encodings on Windows

Nick Coghlan ncoghlan at gmail.com
Mon Aug 15 22:00:00 EDT 2016


On 16 August 2016 at 11:34, Chris Barker - NOAA Federal
<chris.barker at noaa.gov> wrote:
>> Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:
>>
>> if os.name == 'nt':
>>    f = os.stat(os.listdir('.')[-1])
>> else:
>>    f = os.stat(os.listdir(b'.')[-1])
>
> REALLY? Do we really want to encourage using bytes as paths? IIUC,
> anyone that wants to platform-independentify that code just needs to
> use proper strings (or pathlib) for paths everywhere, yes?

The problem is that bytes-as-paths actually *does* work for Mac OS X
and systemd-based Linux distros properly configured to use UTF-8 for
OS interactions. This means that a lot of backend network service code
makes that assumption, especially when it was originally written for
Python 2, and rather than making it work properly on Windows, folks
just drop Windows support as part of migrating to Python 3.
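A minimal sketch of why bytes-as-paths "just works" on such systems, assuming CPython 3.6 or later: `os.fsencode()` and `os.fsdecode()` convert between str and bytes paths using `sys.getfilesystemencoding()`, which is UTF-8 on a properly configured macOS or Linux system. With str paths, the quoted `os.stat(os.listdir('.')[-1])` line already runs unchanged on every platform; conversion only matters at boundaries where bytes are required. The helper names `as_bytes_path` and `as_str_path` are hypothetical, for illustration only.

```python
import os
import sys


def as_bytes_path(path: str) -> bytes:
    """Hypothetical helper: encode a str path for bytes-based OS APIs.

    os.fsencode() uses the filesystem encoding and error handler
    ('surrogateescape' on POSIX), so on a UTF-8 configured system
    this amounts to plain UTF-8 encoding.
    """
    return os.fsencode(path)


def as_str_path(path: bytes) -> str:
    """Hypothetical helper: decode a bytes path back to str."""
    return os.fsdecode(path)


name = "café.txt"
encoded = as_bytes_path(name)
print(sys.getfilesystemencoding(), encoded)
# Round-tripping through bytes gives back the original str path.
print(as_str_path(encoded) == name)
```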

At an ecosystem level, that means we're faced with a choice between
implicitly encouraging folks to make their code *nix only, and finding
a way to provide a more *nix like experience when running on Windows
(where UTF-8 encoded binary data just works, while other encodings
either lead to mojibake or require something like chardet to figure
things out).
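To make the mojibake point concrete, here is a small sketch: the same UTF-8 bytes decoded with a legacy Windows ANSI code page (cp1252 is assumed here, as used on Western European systems) silently produce garbled text rather than an error, while decoding as UTF-8 recovers the original.

```python
# UTF-8 bytes for a file name containing a non-ASCII character.
raw = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Decoding with the matching encoding recovers the text.
correct = raw.decode("utf-8")

# Decoding the same bytes with a legacy code page (cp1252 assumed)
# does not fail; it just yields mojibake.
garbled = raw.decode("cp1252")

print(correct)  # café
print(garbled)  # cafÃ©
```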

Steve is suggesting that the latter option is preferable, a view I
agree with since it lowers barriers to entry for Windows-based
developers to contribute to primarily *nix focused projects.

> I understand that pre-surrogate-escape, there was a need for bytes
> paths, but those days are gone, yes?

No, UTF-8 encoded bytes are still the native language of network
service development: http://utf8everywhere.org/

It also helps with cases where folks are switching back and forth
between Python and other environments like JavaScript and Go where the
UTF-8 assumption is more prevalent.
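For context on the surrogate-escape mechanism raised in the quoted question (PEP 383), a minimal sketch: the `surrogateescape` error handler maps each undecodable byte to a lone surrogate so that arbitrary bytes survive a round trip through str. This shows what str paths can carry; it does not change the fact that much network code still passes UTF-8 bytes around directly. The example file name is made up.

```python
# Bytes that are not valid UTF-8 (0xFF never appears in UTF-8).
raw = b"report-\xff.txt"

# Strict decoding fails outright.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# surrogateescape maps each undecodable byte to a lone surrogate
# (U+DC80..U+DCFF), so the data survives as a str...
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))  # 'report-\udcff.txt'

# ...and encoding with the same handler restores the exact bytes.
restored = text.encode("utf-8", "surrogateescape")
print(restored == raw)  # True
```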

> So why, at this late date, kludge what should be a deprecated pattern
> into the Windows build???

Promoting cross-platform consistency often leads to enabling patterns
that are considered a bad idea from a native platform perspective, and
this strikes me as an example of that (just as the binary/text
separation itself is a case where Python 3 diverged from the POSIX
text model to improve consistency across *nix, Windows, JVM and CLR
environments).

Cheers,
Nick.


More information about the Python-ideas mailing list