On 30Aug2016 1611, Victor Stinner wrote:
2016-08-30 23:51 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
As I already wrote once, my problem is also that I simply have no idea how much Python 3 code uses bytes filenames. For example, does it concern more than 25% of py3 modules on PyPI, or less than 5%?
I made a very quick test on Windows using a modified Python that raises an exception on bytes paths.
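Victor's test used a modified interpreter. A rough pure-Python approximation, which is an assumption and not what he actually ran, is to monkeypatch a handful of os functions so that bytes path arguments raise. Everything below (the guarded names, BytesPathError, and the install/uninstall helpers) is hypothetical, and it will miss callers that reach the OS through C code such as io.open().

```python
import functools
import os

# Hypothetical list of os functions to guard; a patched interpreter would
# cover every path-accepting API, this sketch only covers a few.
GUARDED_NAMES = ("listdir", "stat", "open", "remove", "mkdir")


class BytesPathError(TypeError):
    """Raised when a bytes path reaches a guarded os function."""


def _guard(name, func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # All guarded functions take the path as their first argument.
        path = args[0] if args else kwargs.get("path")
        if isinstance(path, os.PathLike):
            path = os.fspath(path)  # unwrap pathlib objects and friends
        if isinstance(path, bytes):
            raise BytesPathError(f"bytes path passed to os.{name}(): {path!r}")
        return func(*args, **kwargs)
    return wrapper


def install_bytes_path_guard(names=GUARDED_NAMES):
    """Replace each named os function with a guarded wrapper.

    Returns the original functions so the patch can be undone."""
    originals = {}
    for name in names:
        originals[name] = getattr(os, name)
        setattr(os, name, _guard(name, originals[name]))
    return originals


def uninstall_bytes_path_guard(originals):
    """Restore the functions saved by install_bytes_path_guard()."""
    for name, func in originals.items():
        setattr(os, name, func)
```

Installing the guard before running a test suite (for example, from a conftest or a sitecustomize module) surfaces the call sites that pass bytes, much like the setuptools and Twisted failures reported here, though with far less coverage than a patched interpreter.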
First of all, setuptools fails. It's a kind of blocker issue :-) I quickly fixed it (only one line needs to be modified).
I tried to run the Twisted unit tests (python -m twisted.trial twisted) of Twisted 16.4. I got a lot of exceptions on bytes paths from the twisted/python/filepath.py module, but also from twisted/trial/util.py. It looks like these modules are doing their best to convert all paths to... bytes. I had to modify more than 5 methods just to be able to start running the unit tests.
Quick result: setuptools and Twisted rely on bytes paths. Dropping bytes path support on Windows breaks these modules.
It also means that these modules don't support the full Unicode range on Windows on Python 3.5.
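The reason bytes paths limit the usable character range: on Python 3.5 for Windows, bytes paths are interpreted in the ANSI code page ("mbcs"), which covers only a small subset of Unicode. A minimal illustration follows, using cp1252 as a stand-in for a typical ANSI code page (an assumption, since the real "mbcs" codec exists only on Windows); the encodable() helper is hypothetical.

```python
# Assumption: cp1252 stands in for a typical Windows ANSI code page.
# UTF-8, by contrast, can encode any str without lone surrogates.
name = "\u4e2d\u6587.txt"  # a filename containing CJK characters


def encodable(s, encoding):
    """Return True if s can be encoded with encoding under strict errors."""
    try:
        s.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(encodable(name, "cp1252"))  # False
print(encodable(name, "utf-8"))   # True
```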
Thanks. That's a good idea (certainly better than mine, which was to go reading code...)

I haven't looked into setuptools, but Twisted appears to be correctly using sys.getfilesystemencoding() when it coerces to bytes, which means the proposed change will simply allow the full Unicode range when paths are encoded. However, if there are places where bytes are not transcoded when they should be, *then* there will be new issues. I wonder if we can quickly test whether that happens (e.g. use the file system encoding to "taint" the path somehow - special prefix? - so we can raise if bytes that haven't been correctly encoded at some point are passed in).

Some of my other searching revealed occasional correct use of sys.getfilesystemencoding(), a decent number of uses as a fallback when other encodings are not available, and it's very hard to search for code that uses the os module with bytes that were never checked to be in the right encoding.

This is why I argue that the beta period is the best opportunity to check, and why we're better off flipping the switch now and flipping it back if it all goes horribly wrong - the alternative is a *very* labour-intensive exercise that I doubt we can muster.
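A minimal sketch of the two patterns discussed above, under stated assumptions. coerce_to_bytes() shows the "correct" coercion, deferring to os.fsencode() (which uses sys.getfilesystemencoding() and its error handler) instead of a hard-coded codec. check_bytes_path() swaps the prefix-based "taint" idea for a simpler heuristic: bytes that fail strict decoding with the target encoding cannot have come from encoding a str with it. Both helper names and TARGET_ENCODING are hypothetical, and UTF-8 is assumed because that is what the proposal makes the Windows file system encoding.

```python
import os

# Hypothetical target: the proposed Windows file system encoding. Checking
# against it explicitly avoids depending on this machine's actual setting.
TARGET_ENCODING = "utf-8"


def coerce_to_bytes(path, encoding=None):
    """Coerce a str path to bytes; pass bytes through unchanged."""
    if isinstance(path, bytes):
        return path
    if encoding is None:
        # Uses sys.getfilesystemencoding() and its error handler.
        return os.fsencode(path)
    return path.encode(encoding)


def check_bytes_path(path, encoding=TARGET_ENCODING):
    """Raise ValueError if bytes could not have been produced by `encoding`.

    Heuristic only: pure-ASCII bytes always pass, and some legacy-encoded
    bytes happen to be valid UTF-8 as well."""
    try:
        path.decode(encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(f"bytes path {path!r} is not valid {encoding}") from exc
    return path
```

For example, "caf\u00e9" encoded as UTF-8 passes the check, while the same name encoded as cp1252 (b"caf\xe9") raises, flagging a spot where bytes were produced with the wrong codec.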