[Python-Dev] File system path encoding on Windows
Steve Dower
steve.dower at python.org
Mon Aug 29 23:29:21 EDT 2016

On 29Aug2016 1810, Nick Coghlan wrote:
> On 30 August 2016 at 08:38, Victor Stinner <victor.stinner at gmail.com> wrote:
>> Hi,
>>
>> tl; dr: just drop byte support and help developers to use Unicode in
>> their application!
>
> My view (and Steve's) is that this approach is likely to result in
> Linux-centric projects just dropping even nominal native Windows
> support, rather than more Python software that handles Unicode on
> Windows (/the CLR/the JVM) correctly.

Yeah, this basically sums it up. If I could be sure that the Python
developers who are 99% Linux/1% Windows (i.e. run unit tests once and
then release) weren't going to see dropping byte support completely as a
hostile action, I'd much rather go that way.

But let's definitely take note that platform-specific deprecation
warnings are probably not a good idea for cross-platform functionality.

> What Steve is proposing here is essentially a way of providing more
> *nix like CPython behaviour on Windows

Yep. What actually spurred me into action on this was a Twitter rant
from one of Twisted's developers about paths on Windows. So I presume
that Twisted is probably okay *now* (and hopefully because they
explicitly decode from network traffic into str before accessing the
file system...)

On Windows, using bytes has essentially always meant using an
arbitrarily-encoded str. Unlike on POSIX, asking for a path in the
active code page is not equivalent to "give me the path as raw bytes",
but my change will make bytes paths behave that way. There'll be a
performance penalty, but otherwise using bytes for paths will become
reliable.
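A minimal sketch of why a code page is not a "raw bytes" view of a
path. It assumes cp1252 as a stand-in for a typical active code page
and uses a made-up file name; neither comes from this thread, and the
helper round_trips is purely illustrative.

```python
# Hypothetical file name mixing Latin and Japanese characters.
name = "r\u00e9sum\u00e9_\u65e5\u672c\u8a9e.txt"


def round_trips(text, encoding):
    """Return True if text survives an encode/decode cycle unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The codec cannot represent (or cannot decode) some character.
        return False


# Assumption: cp1252 plays the role of the Windows active code page.
legacy_ok = round_trips(name, "cp1252")
# UTF-8 can represent every valid Unicode string.
utf8_ok = round_trips(name, "utf-8")

print("cp1252 round trip:", legacy_ok)
print("utf-8 round trip:", utf8_ok)
```

A name that cannot be expressed in the code page has no faithful bytes
form at all, which is the sense in which bytes paths were unreliable.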
Unfortunately, any implicitly-encoded cross-version interoperability
will have to be broken by such a change. There's just no way around it.
But I've seen no evidence that it's common, and there are two
workarounds available (set the environment variable, or change your code
to specify the encoding used).
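The second workaround might look roughly like the sketch below: decode
bytes with an explicitly named encoding at the boundary, then pass str
to the file system APIs. The helper name to_str_path, the UTF-8
default, and the sample names are assumptions for illustration only.

```python
def to_str_path(raw, encoding="utf-8"):
    """Decode a bytes path with an explicit encoding; pass str through.

    Hypothetical helper: the right encoding depends on where the bytes
    came from (network protocol, config file, older Python version).
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Bytes as they might arrive from the network, known to be UTF-8.
wire_name = "caf\u00e9.txt".encode("utf-8")
decoded = to_str_path(wire_name)

# str input is returned unchanged.
already_str = to_str_path("notes.txt")

print(type(decoded).__name__, type(already_str).__name__)
```

Because the encoding is stated in the code, the result does not depend
on whatever implicit encoding a given Python version or machine uses.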
> However, this view is also why I don't agree with being aggressive in
> making this behaviour the default on Windows - I think we should make
> it readily available as a provisional feature through a single
> cross-platform command line switch and environment setting (e.g. "-X
> utf8" and "PYTHONASSUMEUTF8") so folks that need it can readily opt in
> to it, but we can defer making it the default until 3.7 after folks
> have had a full release cycle's worth of experience with it in the
> wild.

Given the people who would need to opt in to the behaviour are merely
the recipients of a library written by someone else, I don't think this
is the right approach. Stephen Turnbull in an earlier post referred to
organisations that fully control their systems in order to ensure that
the implicit encodings all match. These are also the people who can
apply an environment variable to avoid a behaviour change.

However, someone who just installed an HTTP library that was developed
on POSIX and perhaps not even tested on Windows should not have to flick
the switch themselves. In contrast, if it is known that 3.6 *definitely*
changed something here, we will certainly see more effort applied to
making sure libraries are updated. (Compare these two bug reports: "your
library breaks on Python 3.6" vs "your library breaks on Python 3.6 when
I set this environment variable". The fix for the latter is quite
reasonably going to be "don't do that".)

The other discussion about OpenSSL and LTS systems is also interesting.
Do we really expect users to take their fully functioning systems and
blindly upgrade to a new major version of Python expecting everything to
just work? That seems very unlikely to me, and also doesn't match my
experience (but I can't quantify that in any useful way, so take it as
you wish).

Cheers,
Steve