On 16 August 2016 at 11:34, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:
    if os.name == 'nt':
        f = os.stat(os.listdir('.')[-1])
    else:
        f = os.stat(os.listdir(b'.')[-1])
REALLY? Do we really want to encourage using bytes as paths? IIUC, anyone that wants to platform-independentify that code just needs to use proper strings (or pathlib) for paths everywhere, yes?
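For illustration, a minimal sketch of what "proper strings (or pathlib) everywhere" might look like for the snippet above. The helper name stat_last_entry and the temporary files are hypothetical, and the entries are sorted here only to make "last" deterministic, which the original snippet does not do:

```python
import tempfile
from pathlib import Path


def stat_last_entry(directory):
    """Stat the last entry (sorted by name) of a directory, using str-based paths only."""
    entries = sorted(Path(directory).iterdir())
    if not entries:
        return None  # nothing to stat in an empty directory
    return entries[-1].stat()


# Hypothetical demo data: two small files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("hello", encoding="utf-8")
    Path(tmp, "b.txt").write_text("hi", encoding="utf-8")
    size = stat_last_entry(tmp).st_size  # size of b.txt, the last name in sorted order
```

No os.name check is needed in this form, since pathlib picks the platform-appropriate path flavour itself.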
The problem is that bytes-as-paths actually *does* work for Mac OS X and systemd-based Linux distros properly configured to use UTF-8 for OS interactions. This means that a lot of backend network service code makes that assumption, especially when it was originally written for Python 2, and rather than making it work properly on Windows, folks just drop Windows support as part of migrating to Python 3.

At an ecosystem level, that means we're faced with a choice between implicitly encouraging folks to make their code *nix only, and finding a way to provide a more *nix-like experience when running on Windows (where UTF-8 encoded binary data just works, and either other encodings lead to mojibake or else you use chardet to figure things out). Steve is suggesting that the latter option is preferable, a view I agree with since it lowers barriers to entry for Windows-based developers to contribute to primarily *nix-focused projects.
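As a rough sketch of the bytes-as-paths pattern being described (this assumes a *nix system configured for UTF-8; the file name and variable names are made up for the example):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    raw_dir = os.fsencode(tmp)                  # directory path as bytes
    name_bytes = "日本語.txt".encode("utf-8")    # UTF-8 encoded file name (illustrative)

    # Create the file through a bytes path, as network-facing code often does.
    with open(os.path.join(raw_dir, name_bytes), "wb") as fh:
        fh.write(b"x")

    # bytes in, bytes out: listing a bytes directory yields bytes names.
    raw_entries = os.listdir(raw_dir)
    size = os.stat(os.path.join(raw_dir, raw_entries[0])).st_size

    # The names only become text when the code explicitly decodes them.
    decoded_name = raw_entries[0].decode("utf-8")
```

Code written this way never converts to str at the file system boundary, which is why it depends on the platform treating those bytes as UTF-8.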
I understand that pre-surrogate-escape, there was a need for bytes paths, but those days are gone, yes?
No, UTF-8 encoded bytes are still the native language of network service development: http://utf8everywhere.org/

It also helps with cases where folks are switching back and forth between Python and other environments like JavaScript and Go where the UTF-8 assumption is more prevalent.
So why, at this late date, kludge what should be a deprecated pattern into the Windows build???
Promoting cross-platform consistency often leads to enabling patterns that are considered a bad idea from a native platform perspective, and this strikes me as an example of that (just as the binary/text separation itself is a case where Python 3 diverged from the POSIX text model to improve consistency across *nix, Windows, JVM and CLR environments).

Cheers,
Nick.