[Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

Nick Coghlan ncoghlan at gmail.com
Sat Sep 3 10:49:10 EDT 2016


On 2 September 2016 at 08:31, Steve Dower <steve.dower at python.org> wrote:
> This proposal would remove all use of the *A APIs and only ever call the *W
> APIs. When Windows returns paths to Python as str, they will be decoded from
> utf-16-le and returned as text (in whatever the minimal representation is).
> When Windows returns paths to Python as bytes, they will be decoded from
> utf-16-le to utf-8 using surrogatepass (Windows does not validate surrogate
> pairs, so it is possible to have invalid surrogates in filenames). Equally,
> when paths are provided as bytes, they are decoded from utf-8 into utf-16-le
> and passed to the *W APIs.

The overall proposal looks good to me, there's just a terminology
glitch here: utf-8 <-> utf-16-le should either be described as
transcoding, or else as decoding and then re-encoding. As they're both
text codecs, there's no "decoding" operation that switches between
them.
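A minimal sketch of the distinction, assuming only Python 3's built-in "utf-16-le" and "utf-8" codecs and the "surrogatepass" error handler. No single codec step turns UTF-16-LE bytes into UTF-8 bytes. The str object is the pivot, so the conversion is a decode followed by a re-encode (i.e. transcoding). The helper names utf16le_to_utf8 and utf8_to_utf16le are hypothetical and are not how CPython's os module actually implements this.

```python
# Hypothetical helpers, not CPython's os module (which does this in C):
# transcode between UTF-16-LE and UTF-8 by decoding to str, then re-encoding.

def utf16le_to_utf8(raw: bytes) -> bytes:
    """Transcode UTF-16-LE bytes (as returned by a *W API) to UTF-8 bytes."""
    # Decode: bytes -> str. surrogatepass keeps unpaired surrogates,
    # which Windows permits in filenames, instead of raising.
    text = raw.decode("utf-16-le", errors="surrogatepass")
    # Re-encode: str -> bytes, again letting lone surrogates through.
    return text.encode("utf-8", errors="surrogatepass")


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Transcode UTF-8 bytes (a bytes path from the caller) to UTF-16-LE."""
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# A filename containing an unpaired high surrogate (U+D800).
name = "caf\u00e9\ud800.txt"
wide = name.encode("utf-16-le", errors="surrogatepass")

narrow = utf16le_to_utf8(wide)
print(narrow)                             # b'caf\xc3\xa9\xed\xa0\x80.txt'
print(utf8_to_utf16le(narrow) == wide)    # True
```

Without surrogatepass, both the decode and the encode steps would raise on the lone surrogate, which is why the quoted proposal names that error handler.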

As far as the timing of this particular change goes, I think you make
a good case that all of the cases that will see a behaviour change
with this PEP have already been receiving deprecation warnings since
3.3, which would make it acceptable to change the default behaviour in
3.6.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

