[Python-ideas] Fix default encodings on Windows
Steve Dower
steve.dower at python.org
Tue Aug 16 19:27:43 EDT 2016
On 16Aug2016 1603, Victor Stinner wrote:
> 2016-08-16 17:56 GMT+02:00 Steve Dower <steve.dower at python.org>:
>> 2. Windows file system encoding is *always* UTF-16. There's no "assuming
>> mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding
>> it is". We know exactly what the encoding is on every supported version of
>> Windows. UTF-16.
>
> I think that you missed an important issue (or "use case") which is
> called the "Makefile problem" by Mercurial developers:
> https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22
>
> I already explained it before, but maybe you misunderstood or just
> missed it, so here is a more concrete example.
I guess I misunderstood. The concrete example really helps, thank you.
The problem here is that there is an application boundary without a
defined encoding, right where you put the comment.
> filenameb = os.listdir(b'.')[0]
> # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page
> # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8?
> print("filename bytes: %a" % filenameb)
>
> proc = subprocess.Popen(['py', '-2', script],
>                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
> stdout = proc.communicate(filenameb)[0]
> print("File content: %a" % stdout)
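To make the boundary concrete, here is a minimal sketch of one filename turned into bytes under two different encodings. It assumes cp1252 as a typical Western ANSI code page. The real ACP depends on the system locale, and the 'mbcs' codec only exists on Windows, so cp1252 is used as a portable stand-in.

```python
# The filename as the OS stores it: Unicode text (UTF-16 inside Windows).
name = "café.txt"

# Assumption: cp1252 stands in for the ANSI code page on a Western-locale system.
ansi_bytes = name.encode("cp1252")  # bytes under an ANSI code page file system encoding
utf8_bytes = name.encode("utf-8")   # bytes under a UTF-8 file system encoding

print("ANSI: %a" % ansi_bytes)   # ANSI: b'caf\xe9.txt'
print("UTF-8: %a" % utf8_bytes)  # UTF-8: b'caf\xc3\xa9.txt'

# A receiver (e.g. the Python 2 child process) that still decodes with the
# ANSI code page silently misreads the UTF-8 bytes as 'cafÃ©.txt'.
misread = utf8_bytes.decode("cp1252")
print("Misread: %a" % misread)   # Misread: 'caf\xc3\xa9.txt'
```

The mismatch produces mojibake rather than an exception. That is why the encoding used at the pipe has to be stated explicitly instead of inherited from whatever the bytes API happens to return.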
If you are defining the encoding as 'mbcs', then you need to check that
sys.getfilesystemencoding() == 'mbcs', and if it isn't, re-encode.
Alternatively, since this script is the "new" code, you would use
`os.listdir('.')[0].encode('mbcs')`, given that you have explicitly
determined that mbcs is the encoding for the later transfer.
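Both options can be sketched as follows. The helper names are hypothetical. cp1252 again stands in for 'mbcs', which is only available on Windows. Option 1 keeps the bytes API but re-encodes whenever the file system encoding differs from the agreed one. Option 2 lists str names and encodes them explicitly.

```python
import codecs
import os
import sys

# Encoding both sides of the pipe have agreed on. The post uses 'mbcs', which
# only exists on Windows; cp1252 is a portable stand-in here (assumption).
TRANSFER_ENCODING = "cp1252"


def same_codec(a: str, b: str) -> bool:
    """Compare encoding names after normalising aliases (e.g. 'UTF8' vs 'utf-8')."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def reencode_for_transfer(name: bytes, encoding: str = TRANSFER_ENCODING) -> bytes:
    """Option 1: keep the bytes API, but re-encode when the file system
    encoding differs from the agreed transfer encoding."""
    if same_codec(sys.getfilesystemencoding(), encoding):
        return name
    # os.fsdecode undoes the file system encoding used for bytes names.
    return os.fsdecode(name).encode(encoding)


def listdir_for_transfer(path: str = ".", encoding: str = TRANSFER_ENCODING) -> list:
    """Option 2: use the str API and encode explicitly with the agreed encoding."""
    return [entry.encode(encoding) for entry in os.listdir(path)]
```

Both functions raise UnicodeEncodeError for names that the transfer encoding cannot represent. That is arguably better than passing along bytes the receiver will misread.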
Essentially, the problem is that this code is relying on a certain
non-guaranteed behaviour of a deprecated API, where using
sys.getfilesystemencoding() as documented would have prevented any issue
(see
https://docs.python.org/3/library/os.html#file-names-command-line-arguments-and-environment-variables).
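The documented rule is that bytes file names use the file system encoding, which os.fsencode() and os.fsdecode() apply. Here is a small sketch that checks that contract for one directory. The helper name is hypothetical. Code that decodes bytes names this way, rather than assuming a fixed code page, does not depend on which encoding is in effect.

```python
import os


def listdir_round_trips(path: str = ".") -> bool:
    """Return True if the bytes names from os.listdir(bytes) decode, via the
    file system encoding, to the same str names os.listdir(str) returns."""
    str_names = sorted(os.listdir(path))
    # os.fsencode/os.fsdecode apply sys.getfilesystemencoding() and its error
    # handler, i.e. the documented way to convert bytes file names.
    decoded = sorted(os.fsdecode(raw) for raw in os.listdir(os.fsencode(path)))
    return str_names == decoded


if __name__ == "__main__":
    print(listdir_round_trips("."))
```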
In one of the emails I think you missed, I called this out as the only
case where code will break with a change to sys.getfilesystemencoding().
So yes, breaking existing code is something I would never do lightly.
However, I'm very much of the opinion that the only code that will break
is code that is already broken (or at least fragile). Nobody is forced
to take a major upgrade to Python, nor should anyone necessarily expect
100% compatibility between major versions.
Cheers,
Steve