[Python-Dev] File system path encoding on Windows

Victor Stinner victor.stinner at gmail.com
Mon Aug 29 19:14:55 EDT 2016


2016-08-20 21:31 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:
> Reading your summary meant this finally clicked with something Victor
> has been considering for a while: a "Force UTF-8" switch that told
> Python to ignore the locale encoding on Linux, and instead assume
> UTF-8 everywhere (command line parameter parsing, environment variable
> processing, filesystem encoding, standard streams, etc)
>
> It's essentially the same problem you have on Windows, just with
> slightly different symptoms and consequences.

Yes and no, but more no than yes :-)

On Linux, the issue is quite simple: most major Linux distributions
switched to UTF-8 by default, network shares use UTF-8, filenames are
stored as UTF-8, applications expect UTF-8, etc. I once proposed a "-X
utf8" switch, but more as a convenient workaround for badly configured
systems where data is encoded to UTF-8 but the locale encoding is not
properly configured *in some cases*. The switch does a single thing:
ignore the locale encoding, and force UTF-8 as the locale encoding.
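
To make the effect of such a switch concrete, here is a minimal sketch.
The helper pick_encoding() and its parameters are hypothetical names
used only for illustration; this is not how the switch would be
implemented in the interpreter. It only shows what "ignore the locale
encoding" means for data that is really UTF-8.

```python
import locale


def pick_encoding(force_utf8, locale_encoding=None):
    """Hypothetical helper: choose the encoding used to decode OS data."""
    if force_utf8:
        # The switch: ignore whatever the locale says.
        return "utf-8"
    if locale_encoding is None:
        # Default behaviour: trust the configured locale.
        locale_encoding = locale.getpreferredencoding(False)
    return locale_encoding


# A filename stored as UTF-8 on disk.
raw_name = "café".encode("utf-8")

# Simulate a misconfigured system whose locale claims Latin-1.
mojibake = raw_name.decode(pick_encoding(False, "latin-1"))
correct = raw_name.decode(pick_encoding(True, "latin-1"))

print(repr(mojibake), repr(correct))
```

Passing the locale encoding explicitly keeps the example independent of
the machine it runs on; on a correctly configured UTF-8 system both
calls would give the same result.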

Steve's proposal is specific to Windows, and Windows is a different
world. On Windows, there is a single distribution: the Microsoft
flavor, and UTF-8 has *never* been used as the ANSI code page (which
is more or less the same thing as the UNIX locale encoding). Using
UTF-8 is something new, not really common in the Windows world. Steve
said that UTF-8 is common in the .NET world (but I don't know the
Windows community/universe well).

I suggested to Steve that we work on a unified "-X utf8" option to
explicitly force UTF-8 on Linux and Windows. But Steve seems to prefer
forcing UTF-8 *by default*, and adding a new option to revert to the
old behaviour.

I proposed the idea, but I'm not sure that we can have a single option
for Linux and Windows. Moreover, I never really tried to implement
"-X utf8" on Linux, because "misconfigured systems" seem to be less
and less common nowadays. I see very few user requests in this
direction.

By the way, apart from Steve, has anyone complained about the ANSI
code page being used for bytes on Windows in Python? I recall one or
two issues in the last 5 years about os.listdir(bytes), but these
issues were specific to Python 2, if I recall correctly?
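
For readers who have not hit this class of issue: the problem with
bytes filenames comes from encoding a Unicode name to a code page that
cannot represent it. The sketch below assumes cp1252 as an example
ANSI code page, and survives_round_trip() is a hypothetical helper. It
only simulates the lossy conversion with Python codecs and does not
call any Windows API.

```python
def survives_round_trip(name, encoding):
    """Hypothetical helper: does name survive str -> bytes -> str?"""
    # errors="replace" substitutes '?' for characters the encoding
    # cannot represent, so the original name is lost.
    encoded = name.encode(encoding, errors="replace")
    return encoded.decode(encoding) == name


names = ["café.txt", "Ελληνικά.txt"]

for name in names:
    print(
        name,
        "cp1252:", survives_round_trip(name, "cp1252"),
        "utf-8:", survives_round_trip(name, "utf-8"),
    )
```

A name that only uses characters from the code page round-trips fine,
which is why the problem stays invisible for many users; UTF-8 can
represent both names.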

Victor

