Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

1 Oct 2008


On Tue, Sep 30, 2008 at 7:04 PM, Steven D'Aprano wrote:
>> ...
>> ...
>> I believe on disk it uses UTF-16.
> Which is made up of bytes. There may be byte sequences that are illegal
> UTF-16, but that's not what Martin said. I don't understand how there
> can be UTF-16 sequences which don't correspond to some sequence of
> bytes. How would they be represented in memory? Is this to do with the
> endianness of the UTF-16 sequence?
It has to do with the internal mapping between the ANSI and Unicode
functions. On NT systems, CreateFileA will map the ANSI bytestring to
a Unicode filename via the active code page, and call CreateFileW
accordingly. The active code page cannot be set to something as useful
as UTF-8, so for any actual code page (1252, 932, etc.) there are
Unicode strings that cannot be represented by any bytestring passed
to the ANSI function.
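To make the code-page point concrete, here is a small Python sketch. It does not call CreateFileA or CreateFileW at all; it assumes Python's built-in `cp1252` and `cp932` codecs are a fair stand-in for those ANSI code pages. The helper `representable_in_code_page` and the sample filenames are hypothetical, chosen only for illustration.

```python
# Sketch only: Python's "cp1252" and "cp932" codecs stand in for the
# ANSI code pages; no Windows file APIs are involved.

def representable_in_code_page(name: str, code_page: str) -> bool:
    """Return True if `name` can be encoded as bytes in `code_page`."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical filenames: ASCII, Latin-1 accent, Japanese, Korean.
names = ["report.txt", "café.txt", "日本語.txt", "한국어.txt"]

for cp in ("cp1252", "cp932"):
    for name in names:
        ok = representable_in_code_page(name, cp)
        print(f"{cp}: {name!r} -> {'ok' if ok else 'not representable'}")

# The same names all round-trip through UTF-16, which is what the
# Unicode ("W") functions work with.
for name in names:
    assert name.encode("utf-16-le").decode("utf-16-le") == name
```

Each code page rejects at least one of these names, while UTF-16 handles all of them, so a bytes-only path through the ANSI functions cannot name every file an NT system can store.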
-- 
Michael Urman