[Python-ideas] Fix default encodings on Windows
Steve Dower
steve.dower at python.org
Thu Aug 18 11:54:26 EDT 2016
On 18Aug2016 0829, Chris Angelico wrote:
> The second call to glob doesn't have any Unicode characters at all,
> the way I see it - it's all bytes. Am I completely misunderstanding
> this?
You're not the only one - I think this has been the most common
misunderstanding.
On Windows, the paths as stored in the filesystem are actually all text;
more precisely, UTF-16-LE encoded bytes, represented as strings of 16-bit
characters.
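A minimal illustration of what "stored as UTF-16-LE" means, using a made-up filename (the name is hypothetical, and this only encodes a Python string the way the filesystem stores it; it does not call any Windows API):

```python
# Hypothetical filename mixing ASCII, accented Latin and CJK characters.
name = "résumé_数据.txt"

# Windows stores names as UTF-16-LE code units: 16 bits per unit, so
# every character in the Basic Multilingual Plane takes exactly 2 bytes.
stored = name.encode("utf-16-le")
print(len(name), len(stored))  # 13 26

# Decoding the stored units gives back exactly the same text: nothing is lost.
roundtrip = stored.decode("utf-16-le")
print(roundtrip == name)  # True
```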
Conversion to an 8-bit character representation exists only for
compatibility with code written for other platforms (either Linux, or
much older versions of Windows). The operating system has one way to do
the conversion to bytes, which Python currently uses, but since we
control that transformation I'm proposing an alternative conversion that
favours reliability over compatibility (with Windows 3.1...; it shouldn't
affect compatibility with code that properly handles multibyte encodings,
which should include anything developed for Linux in the last decade or
two).
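A rough sketch of why the choice of byte conversion matters. It assumes cp1252 as a stand-in for a typical Western European ANSI code page. Python's `errors="replace"` substitutes `?` for unencodable characters; the real Windows conversion may apply "best fit" mappings instead, but either way the original name cannot be recovered. UTF-8 is used here as the example of a multibyte encoding that round-trips every character:

```python
name = "résumé_数据.txt"  # hypothetical filename

# Legacy-style conversion through an 8-bit code page (cp1252 assumed).
# cp1252 has é but no CJK characters, so those become '?'.
legacy_bytes = name.encode("cp1252", errors="replace")
print(legacy_bytes)  # b'r\xe9sum\xe9_??.txt'
print(legacy_bytes.decode("cp1252") == name)  # False: information lost

# Multibyte conversion (UTF-8): variable length, but lossless.
utf8_bytes = name.encode("utf-8")
print(len(utf8_bytes))  # 19 (é takes 2 bytes, each CJK character takes 3)
print(utf8_bytes.decode("utf-8") == name)  # True
```

Code that already treats bytes paths as a multibyte encoding, as most Linux code does for UTF-8, handles the second form without changes.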
Does that help? I tried to keep the explanation short and focused :)
Cheers,
Steve