[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Steve Dower python at stevedower.id.au
Tue Feb 9 21:42:45 EST 2016


On 09Feb2016 1801, Andrew Barnert wrote:
> On Feb 9, 2016, at 17:37, Steve Dower <python at stevedower.id.au> wrote:
>
>> Could we perhaps redefine bytes paths on Windows as utf8 and use
>> Unicode everywhere internally?
>
> When you receive bytes from argv, stdin, a text file, a GUI, a named
> pipe, etc., and then use them as a path, Python treating them as UTF-8
> would break everything.

Sure, but that's already broken today if you're communicating bytes via 
some protocol without manually managing the encoding, in which case you 
should be decoding it (and potentially re-encoding to 
sys.getfilesystemencoding()).
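A minimal sketch of what "manually managing the encoding" looks like, assuming (purely for illustration) that the sending side of the protocol uses cp1252. The bytes are decoded with the protocol's own encoding, re-encoded with `os.fsencode` (which uses `sys.getfilesystemencoding()`) only if an API really needs bytes, and the same bytes are shown to be invalid when blindly treated as UTF-8:

```python
import os
import sys

# Assumption for illustration only: the protocol's sender uses cp1252.
PROTOCOL_ENCODING = "cp1252"
raw = b"caf\xe9.txt"  # bytes as received from a pipe, socket, file, etc.

# Decode explicitly with the encoding the protocol actually uses.
name = raw.decode(PROTOCOL_ENCODING)

# Only if an API really needs bytes: re-encode with the filesystem encoding.
# os.fsencode uses sys.getfilesystemencoding() and its error handler.
fs_bytes = os.fsencode(name)

# Treating the same bytes as UTF-8 fails, which is the breakage a blanket
# "bytes paths are UTF-8" rule would cause for such input.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(name, sys.getfilesystemencoding(), utf8_ok)
```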

The problem here is the protocol Python uses to return bytes paths: it is
inconsistent between APIs, and information is lost. Fixing it really
requires going through all the OS calls and either (a) making them
consistently decode bytes to str using the declared FS encoding
(currently 'mbcs', but I see no reason we can't make it 'utf_8'), or (b)
making them consistently use the user's current system locale setting by
always using the *A Win32 APIs rather than the *W ones.
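Option (a) amounts to "Unicode everywhere internally, convert at the boundary with one declared codec". A rough sketch of that shape, using a hypothetical wrapper `listdir_via_str` (not a real os function) around `os.listdir`: a bytes argument is decoded with `os.fsdecode`, the work is done in str, and results are converted back with `os.fsencode`, so the same codec is used in both directions:

```python
import os
import tempfile


def listdir_via_str(path):
    """Hypothetical helper illustrating option (a): work in str internally
    and convert at the boundary using the declared filesystem encoding."""
    want_bytes = isinstance(path, bytes)
    # os.fsdecode applies sys.getfilesystemencoding() and
    # sys.getfilesystemencodeerrors(); a str argument passes through as-is.
    names = os.listdir(os.fsdecode(path))
    if want_bytes:
        # Convert back with the same codec so bytes in gives bytes out.
        return [os.fsencode(n) for n in names]
    return names


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "report.txt"), "w").close()
    print(listdir_via_str(d))
    print(listdir_via_str(os.fsencode(d)))
```

The point of the design is that there is exactly one conversion rule, so a bytes path returned by one API can always be passed to another without loss.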

>> I really don't like the idea of not being able to use bytes in cross
>> platform code. Unless it's become feasible to use Unicode for lossless
>> filenames on Linux - last I heard it wasn't.
>
> It is, and has been for years. Surrogate escaping solved the linux
> problem. That doesn't help for Python 2, but again, it's too late for
> Python 2.
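For reference, "surrogate escaping" is the `surrogateescape` error handler (PEP 383), which `os.fsdecode`/`os.fsencode` use on POSIX systems. A small sketch with an assumed non-UTF-8 filename shows that each undecodable byte becomes a lone surrogate and the original bytes come back exactly:

```python
# Assumed example: a filename in a legacy encoding that is not valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte 0xXX to the lone surrogate U+DCXX.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
back = name.encode("utf-8", "surrogateescape")
print(back == raw)
```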

Okay, I guess I was operating on old information. Thanks (and thanks to
Chris for the same answer).

