[Python-ideas] Fix default encodings on Windows

Chris Barker - NOAA Federal chris.barker at noaa.gov
Mon Aug 15 21:34:59 EDT 2016


> Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:
>
> if os.name == 'nt':
>    f = os.stat(os.listdir('.')[-1])
> else:
>    f = os.stat(os.listdir(b'.')[-1])

REALLY? Do we really want to encourage using bytes as paths? IIUC,
anyone who wants to platform-independentify that code just needs to
use proper strings (or pathlib) for paths everywhere, yes?
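
For illustration, here is a minimal sketch of the str-only version of the
quoted snippet. The helper names are made up for this example, and the
sorted() call is an addition so that "last entry" is deterministic; the
quoted code relies on whatever order os.listdir() returns.

```python
import os
from pathlib import Path


def last_entry_stat(directory="."):
    """Stat the last entry of `directory` using str paths only."""
    names = sorted(os.listdir(directory))  # str in, str out, on every platform
    return os.stat(os.path.join(directory, names[-1]))


def last_entry_stat_pathlib(directory="."):
    """Same idea with pathlib, which never exposes bytes paths."""
    entries = sorted(Path(directory).iterdir())
    return entries[-1].stat()
```

Neither function needs an os.name check, which is the point being argued.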

I understand that pre-surrogate-escape, there was a need for bytes
paths, but those days are gone, yes?
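
A small sketch of what surrogateescape buys you, assuming a file name
whose bytes are not valid UTF-8 (the sample bytes are invented for the
example). On POSIX, os.fsdecode() and os.fsencode() apply this same error
handler with the file system encoding.

```python
# Latin-1 encoded name: 0xE9 on its own is not valid UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte is smuggled through as a lone surrogate (U+DCE9).
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
```

So a str can carry an arbitrary byte-string file name and still round-trip.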

So why, at this late date, kludge what should be a deprecated pattern
into the Windows build???

-CHB

> My proposal is to remove all use of the *A APIs and only use the *W APIs. That completely removes the (already deprecated) use of bytes as paths.

Yes, this is good.

> I then propose to change the (unused on Windows) sys.getfilesystemencoding() to 'utf-8' and handle bytes being passed into filesystem functions by transcoding into UTF-16 and calling the *W APIs.

I'm really not sure utf-8 is magic enough to do this. Where do you
imagine that utf-8 is coming from as bytes???

AIUI, while utf-8 is almost universal in *nix for file system names,
folks do not want to count on it -- hence the use of bytes. And it is
far less prevalent in the Windows world...
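
To make the concern concrete, here is a rough sketch of the proposed
transcoding step. The helper name is hypothetical and this is not how
CPython implements it; it only shows what happens when the incoming bytes
came from a legacy code page (cp1252 is picked as an example) rather than
from UTF-8.

```python
def utf8_path_to_utf16(path_bytes):
    """Hypothetical helper: UTF-8 path bytes -> UTF-16-LE bytes for a *W API."""
    return path_bytes.decode("utf-8").encode("utf-16-le")


utf8_name = "caf\u00e9.txt".encode("utf-8")      # what the proposal expects
legacy_name = "caf\u00e9.txt".encode("cp1252")   # what a code-page source yields

ok = utf8_path_to_utf16(utf8_name)

try:
    utf8_path_to_utf16(legacy_name)
    legacy_failed = False
except UnicodeDecodeError:
    # The cp1252 byte 0xE9 is not a valid UTF-8 sequence here.
    legacy_failed = True
```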

> , allows paths returned from the filesystem to correctly roundtrip via bytes in Python,

That you could do with native bytes (UTF-16, yes?)

> . But that would prevent basic manipulation which seems to be a higher priority.)

Still think Unicode is the answer to that...
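
A quick sketch of both halves of that exchange, using an invented path:
UTF-16 bytes do round-trip, but naive byte-level manipulation breaks,
while the str version just works.

```python
name = "dir\\file.txt"

# Native-style bytes round-trip without loss.
raw = name.encode("utf-16-le")
roundtrip = raw.decode("utf-16-le")

# Splitting on a one-byte separator cuts a 2-byte code unit in half:
# the second piece starts with a stray NUL and has an odd length.
pieces = raw.split(b"\\")

# The same manipulation on str needs no encoding knowledge at all.
str_pieces = name.split("\\")
```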

> At this stage, it's time for us to either make byte paths an error,

+1.  :-)

CHB

