[Python-ideas] Fix default encodings on Windows
MRAB
python at mrabarnett.plus.com
Thu Aug 18 13:18:58 EDT 2016
On 2016-08-16 16:56, Steve Dower wrote:
> I just want to clearly address two points, since I feel like multiple
> posts have been unclear on them.
>
> 1. The bytes API was deprecated in 3.3 and it is listed in
> https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs
> is an unfortunate oversight, but it was certainly announced and the
> warning has been there for three released versions. We can freely change
> or remove the support now, IMHO.
>
> 2. Windows file system encoding is *always* UTF-16. There's no "assuming
> mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what
> encoding it is". We know exactly what the encoding is on every supported
> version of Windows. UTF-16.
>
> This discussion is for the developers who insist on using bytes for
> paths within Python, and the question is, "how do we best represent
> UTF-16 encoded paths in bytes?"
>
> The choices are:
>
> * don't represent them at all (remove bytes API)
> * convert and drop characters not in the (legacy) active code page
> * convert and fail on characters not in the (legacy) active code page
> * convert and fail on invalid surrogate pairs
> * represent them as UTF-16-LE in bytes (with embedded '\0' everywhere)
>
> Currently we have the second option.
>
> My preference is the fourth option, as it will cause the least breakage
> of existing code and enable the most amount of code to just work in the
> presence of non-ACP characters.
>
> The fifth option is the best for round-tripping within Windows APIs.
>
> The only code that will break with any change is code that was using an
> already deprecated API. Code that correctly uses str to represent
> "encoding agnostic text" is unaffected.
>
> If you see an alternative choice to those listed above, feel free to
> contribute it. Otherwise, can we focus the discussion on these (or any
> new) choices?
>
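For reference, the encoding choices listed above can be sketched with Python's own codecs. This is a minimal illustration, not a proposed implementation. It assumes cp1252 as a stand-in for the legacy active code page, because the 'mbcs' codec only exists on Windows. It also assumes that option 4 means converting to UTF-8, which the list does not spell out. The function names are hypothetical. Option 1 (removing the bytes API) has no code to show.

```python
# Minimal sketch of options 2-5 for turning a Windows path (held as str,
# i.e. already decoded from UTF-16) into bytes.
# Assumption: cp1252 stands in for the legacy active code page (ACP),
# because the real 'mbcs' codec is only available on Windows.

ACP = "cp1252"  # hypothetical stand-in for the user's active code page


def option_drop(path: str) -> bytes:
    # Option 2: lossy; characters outside the ACP are replaced with b"?"
    return path.encode(ACP, errors="replace")


def option_fail_acp(path: str) -> bytes:
    # Option 3: raise UnicodeEncodeError on any character outside the ACP
    return path.encode(ACP, errors="strict")


def option_utf8(path: str) -> bytes:
    # Option 4 (assumed to mean UTF-8): every valid code point encodes;
    # only lone (unpaired) surrogates raise UnicodeEncodeError
    return path.encode("utf-8", errors="strict")


def option_utf16(path: str) -> bytes:
    # Option 5: raw UTF-16-LE; ASCII characters get an embedded b"\x00",
    # and 'surrogatepass' keeps lone surrogates so the bytes round-trip
    return path.encode("utf-16-le", errors="surrogatepass")


if __name__ == "__main__":
    path = "C:\\\u03a9mega"  # contains U+03A9, which cp1252 cannot encode
    print(option_drop(path))   # b'C:\\?mega'
    print(option_utf8(path))   # b'C:\\\xce\xa9mega'
    print(option_utf16("C:"))  # b'C\x00:\x00'
```

In this sketch, option 3 raises on the same path that option 2 silently mangles, while options 4 and 5 both preserve it.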
Could we still call it 'mbcs', but use 'surrogateescape'?
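A minimal sketch of what the 'surrogateescape' error handler does when paired with a legacy code page follows. It again assumes cp1252 as a stand-in for 'mbcs' so that it runs on any platform, and the helper names are hypothetical. It does not claim to show how the real 'mbcs' codec behaves on Windows.

```python
# Minimal sketch of pairing a legacy code page with 'surrogateescape'.
# Assumption: cp1252 stands in for 'mbcs' so the example runs anywhere.

ACP = "cp1252"  # hypothetical stand-in for the active code page


def bytes_to_str(raw: bytes) -> str:
    # Bytes the code page cannot decode (0x81 is unassigned in cp1252)
    # become lone surrogates in U+DC80..U+DCFF instead of raising.
    return raw.decode(ACP, errors="surrogateescape")


def str_to_bytes(text: str) -> bytes:
    # Those escaped surrogates turn back into the original bytes; any
    # other character outside the code page still raises UnicodeEncodeError.
    return text.encode(ACP, errors="surrogateescape")


if __name__ == "__main__":
    raw = b"C:\\\x81.txt"
    text = bytes_to_str(raw)
    print(repr(text))                 # 'C:\\\udc81.txt'
    print(str_to_bytes(text) == raw)  # True
```

In this sketch the handler makes bytes -> str -> bytes lossless. However, a str holding a character outside the code page (such as U+03A9) still raises UnicodeEncodeError on the way back to bytes.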