[Python-Dev] File system path encoding on Windows

Steve Dower steve.dower at python.org
Tue Aug 30 14:04:45 EDT 2016


On 30Aug2016 0806, Victor Stinner wrote:
> 2016-08-30 16:31 GMT+02:00 Steve Dower <steve.dower at python.org>:
>> It's the
>> random user on Windows who installed their library that has the problem.
>> They don't know the fix, and may not know how to apply it (e.g. if it's
>> their Jupyter notebook that won't find one of their files - no obvious
>> command line options here).
>
> There is already a DeprecationWarning. Sadly, it's hidden by default:
> you need a debug build of Python or more simply to pass -Wd command
> line option.

It also only appears on Windows, so developers who do the right thing on 
POSIX never find out about it. Your average user isn't going to see it - 
they'll just see the OSError when their file is not found due to the 
lossy encoding.
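
To make "lossy" concrete, here is a rough sketch that only simulates the effect. It assumes cp1252 as a stand-in for the user's ANSI code page and uses a made-up file name; the real Windows conversion may behave differently in detail (best-fit mappings, for instance).

```python
# Simulation only: cp1252 stands in for the active ANSI code page, and
# the file name below is hypothetical.
name = "report_\u4e2d.txt"  # contains a character cp1252 cannot represent

# Encoding with errors="replace" substitutes "?" for the unrepresentable
# character, so the original name cannot be recovered from the bytes.
lossy = name.encode("cp1252", errors="replace")
round_tripped = lossy.decode("cp1252")

print(lossy)
print(round_tripped == name)
```

Once the name has been turned into bytes this way, passing those bytes back to open() refers to a file that does not exist, which is the OSError the user ends up seeing.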

> Maybe we should make this warning (Deprecation warning on bytes paths)
> visible by default, or add a new warning suggesting to enable -X utf8
> the first time a Python function gets a byte string (like a filename)?

The more important thing in my opinion is to make it visible on all 
platforms, regardless of whether bytes paths are suitable or not. But 
this will probably be seen as hostile by the majority of open-source 
Python developers, which is why I'd rather just quietly fix the 
incompatibility.

>> Any system that requires communication between two different versions of
>> Python must have install instructions (if it's public) or someone who
>> maintains it. It won't magically break without an upgrade, and it should not
>> get an upgrade without testing. The environment variable is available for
>> this kind of scenario, though I'd hope the testing occurs during beta and it
>> gets fixed by the time we release.
>
> I disagree that breaking backward compatibility is worth it. Most
> users don't care about Unicode since their application already "just
> works well" for their use case.

Again, the problem is libraries (code written by someone else that you 
want to reuse), not applications (code written by you to solve your 
business problem in your environment). Code that assumes the default 
encodings are sufficient is already broken in the general case, and 
libraries nearly always need to cover the general case while 
applications do not. The stdlib needs to cover the general case, which 
is why I keep using open(os.listdir(b'.')[-1]) as an example of 
something that should never fail because of encoding issues.
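
Spelled out a little more, the round trip I have in mind looks roughly like the sketch below. The helper name and the temporary directory are only there for illustration; the point is that bytes returned by os.listdir() get passed straight back to open().

```python
import os
import tempfile


def open_via_bytes_listing(filename, payload=b"hello"):
    """Create `filename` in a temp dir, list that dir as bytes, reopen it.

    Hypothetical helper for illustration: returns the bytes read back
    through the path that os.listdir() produced.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Create the file using a str path.
        with open(os.path.join(tmp, filename), "wb") as f:
            f.write(payload)

        # Bytes in, bytes out: the listing is a list of bytes names.
        tmp_bytes = os.fsencode(tmp)
        entries = os.listdir(tmp_bytes)

        # Reopen the last entry using the bytes name exactly as returned.
        with open(os.path.join(tmp_bytes, entries[-1]), "rb") as f:
            return f.read()
```

Calling open_via_bytes_listing() with a non-ASCII name should return the payload on any platform. On Windows that only holds if the bytes name survives the conversion, which is exactly what is at issue here.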

In theory, we should encourage library developers to support Windows 
properly by using str for paths, probably by disabling bytes paths 
everywhere. Alternatively, we make it so that bytes paths work fine 
everywhere and stop telling people that their code is wrong for a 
platform they're already not hugely concerned about.
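
For the first option, "using str for paths" in a library mostly means converting once at the boundary. A minimal sketch, assuming a hypothetical helper name (os.fsdecode() itself is the existing stdlib function):

```python
import os


def to_str_path(path):
    """Return `path` as str, decoding bytes with the filesystem encoding.

    Hypothetical helper: a library would call this once on every path it
    accepts and use str internally from then on.
    """
    # os.fsdecode() returns str unchanged and decodes bytes using the
    # filesystem encoding and its error handler.
    return os.fsdecode(path)


print(to_str_path(b"data/file.txt"))
print(to_str_path("data/file.txt"))
```

This keeps the rest of the library on str, so the Windows wide-character APIs get used and no lossy conversion takes place.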

> Having to set an env var to "repair" their app to be able to upgrade
> Python is not really convenient.

Upgrading Python in an already running system isn't going to be really 
convenient anyway. Going from x.y.z to x.y.z+1 should be convenient, but 
from x.y to x.y+1 deserves testing and possibly code or environment 
changes. I don't understand why changing Python at the same time we 
change the version number is suddenly controversial.

Cheers,
Steve
