[Python-Dev] File system path encoding on Windows

Mon Aug 29 23:29:21 EDT 2016

On 29Aug2016 1810, Nick Coghlan wrote:
> On 30 August 2016 at 08:38, Victor Stinner <victor.stinner at gmail.com> wrote:
>> Hi,
>>
>> tl;dr: just drop byte support and help developers to use Unicode in
>> their application!
>
> My view (and Steve's) is that this approach is likely to result in
> Linux-centric projects just dropping even nominal native Windows
> support, rather than more Python software that handles Unicode on
> Windows (/the CLR/the JVM) correctly.

Yeah, this basically sums it up. If I could be sure that the Python 
developers who are 99% Linux/1% Windows (i.e. run unit tests once and 
then release) weren't going to see dropping byte support completely as a 
hostile action, I'd much rather go that way.

But let's definitely take note that platform-specific deprecation 
warnings are probably not a good idea for cross-platform functionality.

> What Steve is proposing here is essentially a way of providing more
> *nix like CPython behaviour on Windows

Yep. What actually spurred me into action on this was a Twitter rant 
from one of Twisted's developers about paths on Windows. So I presume 
that Twisted is probably okay *now* (and hopefully because they 
explicitly decode from network traffic into str before accessing the 
file system...)
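
As a minimal sketch of that "decode before touching the file system"
pattern (the function name `path_from_wire` and the assumption that the
wire format is UTF-8 are mine for illustration, not anything taken from
Twisted):

```python
import os


def path_from_wire(raw_name: bytes, root: str, encoding: str = "utf-8") -> str:
    """Decode a file name received as bytes and join it onto a str root."""
    # Decode with the protocol's declared encoding, never an implicit one.
    # A malformed name raises UnicodeDecodeError here, before any file
    # system call is made.
    name = raw_name.decode(encoding)
    return os.path.join(root, name)
```

From that point on the path is a str, so the platform's bytes handling
never comes into play.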

On Windows, using bytes for paths has essentially always meant using an 
arbitrarily-encoded str. Bytes encoded with the active code page are not 
the equivalent of POSIX's "give me the path as raw bytes", but my change 
will make bytes paths behave that way. There'll be a performance 
penalty, but otherwise using bytes for paths will become reliable.
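
To see why an active code page cannot stand in for raw bytes, here is a
rough sketch. It uses cp1252 purely as a stand-in for whatever code page
happens to be active; the real Windows conversion differs in detail, so
treat this as an illustration of the data loss, not of the actual API
behaviour.

```python
name = "日本語.txt"

# UTF-8 can represent every str, so the bytes round-trip exactly.
as_utf8 = name.encode("utf-8")
utf8_round_trip = as_utf8.decode("utf-8")

# A legacy code page cannot represent these characters; with
# errors="replace" they collapse to "?", and the original name is lost.
as_cp1252 = name.encode("cp1252", errors="replace")
cp1252_round_trip = as_cp1252.decode("cp1252")

print(utf8_round_trip == name)    # True
print(cp1252_round_trip == name)  # False
```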

Unfortunately, any cross-version interoperability that relies on the 
implicit encoding will be broken by such a change. There's just no way 
around it. 
But I've seen no evidence that it's common, and there are two 
workarounds available (set the environment variable, or change your code 
to specify the encoding used).
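
The second workaround might look something like the following sketch.
The helper names and the choice of UTF-8 as the stored encoding are
assumptions for illustration; the point is only that the encoding is
written down in the code rather than inherited from the platform.

```python
STORED_ENCODING = "utf-8"  # assumption: agreed once, used by every version


def save_paths(paths, index_file):
    """Write one str path per line, encoded explicitly."""
    with open(index_file, "wb") as f:
        for path in paths:
            f.write(path.encode(STORED_ENCODING) + b"\n")


def load_paths(index_file):
    """Read the paths back as str, using the same explicit encoding."""
    with open(index_file, "rb") as f:
        return [line.rstrip(b"\n").decode(STORED_ENCODING) for line in f]
```

Data written this way reads back identically regardless of which
default the interpreter uses for bytes paths. (The sketch assumes no
path contains a newline.)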

> However, this view is also why I don't agree with being aggressive in
> making this behaviour the default on Windows - I think we should make
> it readily available as a provisional feature through a single
> cross-platform command line switch and environment setting (e.g. "-X
> utf8" and "PYTHONASSUMEUTF8") so folks that need it can readily opt in
> to it, but we can defer making it the default until 3.7 after folks
> have had a full release cycle's worth of experience with it in the
> wild.

Given that the people who would need to opt in to the behaviour are 
merely the recipients of a library written by someone else, I don't 
think this is the right approach. Stephen Turnbull in an earlier post 
referred to 
organisations that fully control their systems in order to ensure that 
the implicit encodings all match. These are also the people who can 
apply an environment variable to avoid a behaviour change.

However, someone who just installed an HTTP library that was developed 
on POSIX and perhaps not even tested on Windows should not have to flick 
the switch themselves. In contrast, if it is known that 3.6 *definitely* 
changed something here, we will certainly see more effort applied to 
making sure libraries are updated. (Compare these two bug reports: "your 
library breaks on Python 3.6" vs "your library breaks on Python 3.6 when 
I set this environment variable". The fix for the latter is quite 
reasonably going to be "don't do that".)

The other discussion about OpenSSL and LTS systems is also interesting. 
Do we really expect users to take their fully functioning systems and 
blindly upgrade to a new major version of Python expecting everything to 
just work? That seems very unlikely to me, and also doesn't match my 
experience (but I can't quantify that in any useful way, so take it as 
you wish).

Cheers,
Steve