[Python-ideas] Fix default encodings on Windows
Nick Coghlan
ncoghlan at gmail.com
Wed Aug 17 12:01:20 EDT 2016
On 17 August 2016 at 02:06, Chris Barker <chris.barker at noaa.gov> wrote:
> Just to make sure this is clear, the Pragmatic logic is thus:
>
> * There are more *nix-centric developers in the Python ecosystem than
> Windows-centric (or even Windows-agnostic) developers.
>
> * The bytes path approach works fine on *nix systems.
For a value of "works fine" that means "works fine, except when it
doesn't, and then you end up with mojibake".
> * Whatever might be Right and Just -- the reality is that a number of
> projects, including important and widely used libraries and frameworks, use
> the bytes API for working with filenames and paths, etc.
>
> Therefore, there is a lot of code that does not work right on Windows.
>
> Currently, to get it to work right on Windows, you need to write Windows
> specific code, which many folks don't want or know how to do (or just can't
> support one way or the other).
>
> So the Solution is to either:
>
> (A) get everyone to use Unicode "properly", which will work on all
> platforms (but only on py3.5 and above?)
>
> or
>
> (B) kludge some *nix-compatible support for byte paths into Windows, that
> will work at least much of the time.
>
> It's clear (to me at least) that (A) is the "Right Thing", but real world
> experience has shown that it's unlikely to happen any time soon.
>
> Practicality beats Purity and all that -- this is a judgment call.
>
> Have I got that right?
Yep, pretty much. Based on Stephen Turnbull's concerns, I wonder if we
could make a whitelist of universal encodings that Python-on-Windows
will use in preference to UTF-8 if they're configured as the current
code page. If we accepted GB18030, GB2312, Shift-JIS, and ISO-2022-*
as overrides, then problems would be significantly less likely.
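As a rough illustration of the whitelist idea (a sketch only, not an
existing CPython mechanism): the helper name, the tuple names, and the
exact contents of the whitelist below are all hypothetical, and the
codec names are simply the standard library's spellings of the
encodings listed above.

```python
import codecs

# Hypothetical whitelist, spelled with the stdlib's canonical codec names.
WHITELISTED_CODECS = ("gb18030", "gb2312", "shift_jis")
# Hypothetical prefix match standing in for "ISO-2022-*".
WHITELISTED_PREFIXES = ("iso2022_",)


def choose_bytes_path_encoding(code_page_encoding):
    """Return the encoding to assume for bytes paths.

    The configured code page wins if it is on the whitelist;
    anything else (including unknown names) falls back to UTF-8.
    """
    try:
        # codecs.lookup() normalises aliases such as "Shift-JIS".
        name = codecs.lookup(code_page_encoding).name
    except LookupError:
        return "utf-8"
    if name in WHITELISTED_CODECS or name.startswith(WHITELISTED_PREFIXES):
        return name
    return "utf-8"
```

With this sketch, a system configured for Shift-JIS keeps Shift-JIS,
while one configured for cp1252 would be treated as UTF-8.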
Another alternative would be to apply a solution similar to the one we
use on Linux with regard to the "surrogateescape" error handler: there are
some interfaces (like the standard streams) where we only enable that
error handler specifically if the preferred encoding is reported as
ASCII. In 2016, we're *very* skeptical about any properly configured
system actually being ASCII-only (rather than that value showing up
because the POSIX standards mandate it as the default), so we don't
really believe the OS when it tells us that.
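A minimal sketch of that heuristic, assuming a hypothetical helper
(stream_error_handler is not a real CPython function, and the real
logic lives in the interpreter's startup code): pick
"surrogateescape" only when the reported encoding is ASCII, and
"strict" otherwise. The round trip at the end shows why the handler
helps: undecodable bytes survive decoding and re-encoding unchanged.

```python
import codecs


def stream_error_handler(preferred_encoding):
    """Hypothetical helper: distrust ASCII, believe everything else."""
    if codecs.lookup(preferred_encoding).name == "ascii":
        return "surrogateescape"
    return "strict"


# UTF-8 bytes arriving on a system that claims to be ASCII-only.
raw = "café".encode("utf-8")
handler = stream_error_handler("ascii")

# The non-ASCII bytes are smuggled through as lone surrogates...
text = raw.decode("ascii", errors=handler)

# ...and come back out byte-for-byte when encoded the same way.
round_tripped = text.encode("ascii", errors=handler)
```

With "strict" in place of "surrogateescape", the decode step above
would raise UnicodeDecodeError instead.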
The equivalent for Windows would be to disbelieve the configured code
page only when it was reported as "mbcs" - for folks that had
configured their system to use something other than the default,
Python would believe them, just as we do on Linux.
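One possible reading of that rule, as a sketch: the function name is
hypothetical, the reported encoding is passed in as a plain string so
the example runs on any platform, and the UTF-8 fallback is an
assumption taken from the proposal under discussion rather than
anything stated here.

```python
def effective_windows_encoding(reported_encoding):
    """Hypothetical helper: disbelieve only the generic "mbcs" default.

    Any other reported name is taken to be a deliberate
    configuration choice and is returned unchanged.
    """
    if reported_encoding.lower() == "mbcs":
        # Assumed fallback: UTF-8, as proposed in this thread.
        return "utf-8"
    return reported_encoding
```

So "mbcs" would map to "utf-8", while an explicitly configured
"cp932" would be left alone.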
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia