On 30Aug2016 0806, Victor Stinner wrote:
2016-08-30 16:31 GMT+02:00 Steve Dower <steve.dower@python.org>:
It's the random user on Windows who installed their library that has the problem. They don't know the fix, and may not know how to apply it (e.g. if it's their Jupyter notebook that won't find one of their files - no obvious command line options here).
There is already a DeprecationWarning. Sadly, it's hidden by default: you need a debug build of Python or, more simply, to pass the -Wd command line option.
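A minimal sketch of what "hidden by default" means in practice, assuming only the standard `warnings` module: a DeprecationWarning is ignored unless a filter such as "default" (what -Wd installs process-wide) is active. `legacy_api` is a hypothetical stand-in for a function that deprecates bytes paths, not a real CPython API, and `call_with_visible_deprecations` is likewise a hypothetical helper.

```python
import warnings


def legacy_api(path):
    # Hypothetical stand-in for an API that deprecates bytes paths.
    if isinstance(path, bytes):
        warnings.warn("bytes paths are deprecated on Windows",
                      DeprecationWarning, stacklevel=2)
    return path


def call_with_visible_deprecations(func, *args, **kwargs):
    """Run func with DeprecationWarnings enabled, roughly like -Wd.

    Returns (result, messages) where messages lists the text of every
    DeprecationWarning raised during the call.
    """
    with warnings.catch_warnings(record=True) as caught:
        # "default" shows the first occurrence per location instead of ignoring it
        warnings.simplefilter("default", DeprecationWarning)
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught
                if issubclass(w.category, DeprecationWarning)]
    return result, messages
```

Outside such a filter (and without -Wd), the same call to `legacy_api(b"x")` would typically produce no visible output, which is the problem being described.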
It also only appears on Windows, so developers who do the right thing on POSIX never find out about it. Your average user isn't going to see it - they'll just see the OSError when their file is not found due to the lossy encoding.
Maybe we should make this warning (the DeprecationWarning on bytes paths) visible by default, or add a new warning suggesting that users enable -X utf8 the first time a Python function receives a byte string (like a filename)?
The more important thing in my opinion is to make it visible on all platforms, regardless of whether bytes paths are suitable or not. But this will probably be seen as hostile by the majority of open-source Python developers, which is why I'd rather just quietly fix the incompatibility.
Any system that requires communication between two different versions of Python must have install instructions (if it's public) or someone who maintains it. It won't magically break without an upgrade, and it should not get an upgrade without testing. The environment variable is available for this kind of scenario, though I'd hope the testing occurs during beta and it gets fixed by the time we release.
I disagree that breaking backward compatibility is worth it. Most users don't care about Unicode since their application already "just works well" for their use case.
Again, the problem is libraries (code written by someone else that you want to reuse), not applications (code written by you to solve your business problem in your environment). Code that assumes the default encodings are sufficient is already broken in the general case, and libraries nearly always need to cover the general case while applications do not. The stdlib needs to cover the general case, which is why I keep using open(os.listdir(b'.')[-1]) as an example of something that should never fail because of encoding issues. In theory, we should encourage library developers to support Windows properly by using str for paths, probably by disabling bytes paths everywhere. Alternatively, we make it so that bytes paths work fine everywhere and stop telling people that their code is wrong for a platform they're already not hugely concerned about.
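A minimal sketch of the str-based style suggested for library code, as a portable counterpart to open(os.listdir(b'.')[-1]). `read_last_entry` is a hypothetical helper; the `sorted()` call is an added assumption that makes "last entry" deterministic, since os.listdir order is arbitrary. Passing a str directory makes os.listdir return str names, which on Windows are the native Unicode names rather than a lossy bytes encoding.

```python
import os


def read_last_entry(directory):
    """Read the last (sorted) entry of a directory, keeping paths as str."""
    if isinstance(directory, bytes):
        # Convert bytes input to str using the filesystem encoding.
        directory = os.fsdecode(directory)
    names = sorted(os.listdir(directory))  # sorted for a deterministic "last"
    with open(os.path.join(directory, names[-1]), "rb") as f:
        return f.read()
```

Callers that already hold bytes paths can still use it; the conversion to str happens once at the boundary instead of bytes flowing through every file-system call.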
Having to set an env var to "repair" their app to be able to upgrade Python is not really convenient.
Upgrading Python in an already running system isn't going to be really convenient anyway. Going from x.y.z to x.y.z+1 should be convenient, but going from x.y to x.y+1 deserves testing and possibly code or environment changes. I don't understand why changing Python at the same time we change the version number is suddenly controversial.

Cheers,
Steve