[Python-ideas] Fix default encodings on Windows
Chris Barker - NOAA Federal
chris.barker at noaa.gov
Mon Aug 15 21:34:59 EDT 2016
> Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:
>
> if os.name == 'nt':
>     f = os.stat(os.listdir('.')[-1])
> else:
>     f = os.stat(os.listdir(b'.')[-1])
REALLY? Do we really want to encourage using bytes as paths? IIUC,
anyone that wants to platform-independentify that code just needs to
use proper strings (or pathlib) for paths everywhere, yes?
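Something like the following is what I have in mind: a minimal sketch, with a made-up helper name (stat_last_entry), that uses only str paths and so needs no os.name switch. Sorting the listing and joining the directory are my additions, there only to make the example deterministic.

```python
import os
from pathlib import Path


def stat_last_entry(directory="."):
    """Stat the last (sorted) entry of `directory` using str paths only.

    Hypothetical helper for illustration; returns None for an empty directory.
    """
    # A str argument makes os.listdir return str names on every platform.
    names = sorted(os.listdir(directory))
    if not names:
        return None
    # Path objects are accepted by os.stat, so no bytes are involved anywhere.
    return os.stat(Path(directory) / names[-1])
```

The same call works unchanged on Windows and on *nix, since the str API is the one both platforms support.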
I understand that pre-surrogate-escape, there was a need for bytes
paths, but those days are gone, yes?
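To spell out why surrogateescape removes that need, here is a small sketch. The helper names are made up, and the encoding is hard-coded to UTF-8 for the example; on POSIX, os.fsdecode and os.fsencode apply the same error handler with the actual filesystem encoding.

```python
def bytes_to_str(raw, encoding="utf-8"):
    """Decode a byte file name, smuggling undecodable bytes as lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def str_to_bytes(name, encoding="utf-8"):
    """Encode a str file name, turning escaped surrogates back into raw bytes."""
    return name.encode(encoding, errors="surrogateescape")


# A Latin-1 encoded name: the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"
name = bytes_to_str(raw)

# The str form can be split, joined and compared like any other string,
# and it still converts back to exactly the original bytes.
assert str_to_bytes(name) == raw
```

So a name that is not valid in the expected encoding still survives the trip through str without loss.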
So why, at this late date, kludge what should be a deprecated pattern
into the Windows build???
-CHB
> My proposal is to remove all use of the *A APIs and only use the *W APIs. That completely removes the (already deprecated) use of bytes as paths.
Yes, this is good.
> I then propose to change the (unused on Windows) sys.getfilesystemencoding() to 'utf-8' and handle bytes being passed into filesystem functions by transcoding into UTF-16 and calling the *W APIs.
I'm really not sure utf-8 is magic enough to do this. Where do you
imagine that utf-8 is coming from as bytes???
AIUI, while utf-8 is almost universal in *nix for file system names,
folks do not want to count on it -- hence the use of bytes. And it is
far less prevalent in the Windows world...
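To make the concern concrete, here is a rough sketch of the transcoding step as I read the proposal. The function name is made up, and strict decoding is my assumption, not something the proposal specifies.

```python
def utf8_path_to_utf16(path_bytes):
    """Transcode a UTF-8 byte path to UTF-16-LE, as a *W API would expect.

    Hypothetical helper; strict decoding is an assumption of this sketch.
    """
    text = path_bytes.decode("utf-8")
    return text.encode("utf-16-le")


# Bytes that really are UTF-8 transcode cleanly.
ok = utf8_path_to_utf16("\u00e9.txt".encode("utf-8"))

# Bytes in a legacy Windows code page (cp1252 here) are not valid UTF-8,
# so the same call fails instead of producing a usable path.
try:
    utf8_path_to_utf16("\u00e9.txt".encode("cp1252"))
    legacy_ok = True
except UnicodeDecodeError:
    legacy_ok = False
```

The transcoding only helps when the bytes were UTF-8 to begin with, which is exactly the part I am unsure about on Windows.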
> , allows paths returned from the filesystem to correctly roundtrip via bytes in Python,
That you could do with native bytes (UTF-16, yes?)
> . But that would prevent basic manipulation which seems to be a higher priority.)
Still think Unicode is the answer to that...
> At this stage, it's time for us to either make byte paths an error,
+1. :-)
CHB