Re: [Python-ideas] Fix default encodings on Windows

Steve Dower

16 Aug 2016

6:27 p.m.

On 16Aug2016 1603, Victor Stinner wrote:

2016-08-16 17:56 GMT+02:00 Steve Dower <steve.dower@python.org>:

...
2. Windows file system encoding is *always* UTF-16. There's no "assuming mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding it is". We know exactly what the encoding is on every supported version of Windows. UTF-16.

I think that you missed an important issue (or "use case") which is called the "Makefile problem" by Mercurial developers: https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem....

I already explained it before, but maybe you misunderstood or just missed it, so here is a more concrete example.

I guess I misunderstood. The concrete example really helps, thank you. The problem here is that there is an application boundary without a defined encoding, right where you put the comment.

...

filenameb = os.listdir(b'.')[0]
# Python 3.5 encodes Unicode (UTF-16) to the ANSI code page
# what if Python 3.7 encodes Unicode (UTF-16) to UTF-8?
print("filename bytes: %a" % filenameb)

proc = subprocess.Popen(['py', '-2', script],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
stdout = proc.communicate(filenameb)[0]
print("File content: %a" % stdout)

If you are defining the encoding as 'mbcs', then you need to check that sys.getfilesystemencoding() == 'mbcs', and if it isn't, re-encode. Alternatively, since this script is the "new" code, you would use `os.listdir('.')[0].encode('mbcs')`, given that you have explicitly determined that mbcs is the encoding for the later transfer.

Essentially, the problem is that this code is relying on a certain non-guaranteed behaviour of a deprecated API, where using sys.getfilesystemencoding() as documented would have prevented any issue (see https://docs.python.org/3/library/os.html#file-names-command-line-arguments-...). In one of the emails I think you missed, I called this out as the only case where code will break with a change to sys.getfilesystemencoding().

So yes, breaking existing code is something I would never do lightly. However, I'm very much of the opinion that the only code that will break is code that is already broken (or at least fragile), that nobody is forced to take a major upgrade to Python, and that nobody should necessarily expect 100% compatibility between major versions.

Cheers,
Steve
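The two options described above (check sys.getfilesystemencoding() and re-encode, or start from a str name and encode explicitly) could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names reencode_filename and encode_for_child are hypothetical, and the target encoding is passed as a parameter because the 'mbcs' codec exists only on Windows. On Windows you would pass 'mbcs' for the scenario discussed here.

```python
import codecs
import os
import sys


def reencode_filename(name_bytes, target_encoding):
    """Return name_bytes re-encoded for a boundary that expects target_encoding.

    Hypothetical helper: name_bytes is assumed to come from a bytes API such
    as os.listdir(b'.'), i.e. it is in the file system encoding.
    """
    fs_encoding = sys.getfilesystemencoding()
    # Compare normalized codec names so that e.g. 'UTF8' and 'utf-8' match.
    if codecs.lookup(fs_encoding).name == codecs.lookup(target_encoding).name:
        return name_bytes
    # Decode with the file system encoding, then encode for the boundary.
    # encode() is strict here, so it raises UnicodeEncodeError if the name
    # cannot be represented in target_encoding.
    return os.fsdecode(name_bytes).encode(target_encoding)


def encode_for_child(name, target_encoding):
    """Hypothetical helper: start from a str name and encode it explicitly."""
    return name.encode(target_encoding)


if __name__ == "__main__":
    # 'utf-8' is used only so the demo runs on any platform.
    print(reencode_filename(os.fsencode("example.txt"), "utf-8"))
    print(encode_for_child("example.txt", "utf-8"))
```

Either way, the encoding used at the application boundary is stated in the code instead of being inherited from whatever the bytes API happens to return.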
