[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

Michael Urman murman at gmail.com
Wed Oct 1 02:16:19 CEST 2008


On Tue, Sep 30, 2008 at 7:04 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> I believe on disk it uses UTF-16.
>
> Which is made up of bytes. There may be byte sequences that are illegal
> UTF-16, but that's not what Martin said. I don't understand how there
> can be UTF-16 sequences which don't correspond to some sequence of
> bytes. How would they be represented in memory? Is this to do with the
> endianness of the UTF-16 sequence?

It has to do with the internal mapping between the ANSI and Unicode
functions. On NT systems, CreateFileA will map the ANSI bytestring to
a Unicode filename via the active code page, and call CreateFileW
accordingly. The active code page cannot be set to something as useful
as UTF-8, so given any actual code page (1252, 932, etc.) there are
Unicode strings that cannot be represented with a bytestring provided
to the ANSI function.
-- 
Michael Urman


More information about the Python-Dev mailing list