Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:
    if os.name == 'nt':
        f = os.stat(os.listdir('.')[-1])
    else:
        f = os.stat(os.listdir(b'.')[-1])
REALLY? Do we really want to encourage using bytes as paths? IIUC, anyone that wants to platform-independentify that code just needs to use proper strings (or pathlib) for paths everywhere, yes? I understand that pre-surrogate-escape, there was a need for bytes paths, but those days are gone, yes? So why, at this late date, kludge what should be a deprecated pattern into the Windows build??? -CHB
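For reference, the surrogate-escape round trip mentioned above can be sketched as follows. This is a minimal illustration, not anything from the proposal: the byte string is a made-up example, and the codec is named explicitly (instead of going through os.fsdecode()) so the result does not depend on the platform it runs on.

```python
# A made-up file name whose bytes are Latin-1, i.e. NOT valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 becomes U+DCE9), so the name can be carried around as a str.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
restored = name.encode("utf-8", errors="surrogateescape")
```

With that in place, a str path can represent arbitrary POSIX file names losslessly, which is the basis for the "just use strings everywhere" argument.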
My proposal is to remove all use of the *A APIs and only use the *W APIs. That completely removes the (already deprecated) use of bytes as paths.
Yes, this is good.
I then propose to change the (unused on Windows) sys.getfilesystemencoding() to 'utf-8' and handle bytes being passed into filesystem functions by transcoding into UTF-16 and calling the *W APIs.
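The transcoding step described here could look roughly like the sketch below. The helper names are hypothetical, and strict decoding is an assumption: the quoted text does not say how invalid UTF-8 would be handled. A real call into a *W API would also need a terminating NUL, which is left out.

```python
def bytes_path_to_wide(path: bytes) -> bytes:
    """Hypothetical helper: UTF-8 byte path -> UTF-16-LE bytes for a *W API."""
    # Assumption: strict decoding, so invalid UTF-8 raises UnicodeDecodeError.
    text = path.decode("utf-8")
    return text.encode("utf-16-le")


def wide_to_bytes_path(wide: bytes) -> bytes:
    """Hypothetical helper: UTF-16-LE bytes from a *W API -> UTF-8 byte path."""
    return wide.decode("utf-16-le").encode("utf-8")
```

Under these assumptions, any path that is valid UTF-8 survives the trip in both directions, e.g. wide_to_bytes_path(bytes_path_to_wide(b"C:\\tmp")) gives back b"C:\\tmp".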
I'm really not sure utf-8 is magic enough to do this. Where do you imagine that utf-8 is coming from as bytes??? AIUI, while utf-8 is almost universal in *nix for file system names, folks do not want to count on it -- hence the use of bytes. And it is far less prevalent in the Windows world...
, allows paths returned from the filesystem to correctly roundtrip via bytes in Python,
That you could do with native bytes (UTF-16, yes?)
. But that would prevent basic manipulation which seems to be a higher priority.)
Still think Unicode is the answer to that...
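To make the "basic manipulation" point concrete, here is a small sketch of what happens when byte-level code written for ASCII-compatible paths meets UTF-16 bytes. The path is a made-up example.

```python
path = "dir\\file.txt"

# UTF-8 is ASCII-compatible, so splitting on the separator byte just works.
utf8_parts = path.encode("utf-8").split(b"\\")

# In UTF-16-LE the separator is the two bytes 5C 00. Splitting on the single
# byte 5C leaves the stray 00 at the front of the next piece, so that piece
# is misaligned and no longer valid UTF-16.
utf16_parts = path.encode("utf-16-le").split(b"\\")

try:
    utf16_parts[1].decode("utf-16-le")
    second_piece_ok = True
except UnicodeDecodeError:
    second_piece_ok = False
```

The same breakage does not arise with str paths, where the separator is a character and not a byte sequence.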
At this stage, it's time for us to either make byte paths an error,
+1. :-) CHB