Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

9 Feb 2016

On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker wrote:
...
> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes is it assuming
> some Windows ANSI-compatible encoding? (and what does it return?)

UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI
codepage. For a single-byte codepage, the ANSI API roundtrips, i.e. a
bytes path that's passed to CreateFileA matches the listing from
FindFirstFileA. But for a DBCS codepage, arbitrary bytes paths do not
roundtrip. Invalid byte sequences map to the default character. Note
that an ASCII question mark is not always the default character. It
depends on the codepage.
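
As a rough illustration of the difference, here is a sketch that uses
Python's own codecs rather than the Windows ANSI API. Two assumptions to
keep in mind: latin-1 only stands in for a single-byte codepage, and
Python's 'replace' handler substitutes U+FFFD or '?', not the codepage's
default character. The helper name roundtrips is made up for this example.

```python
def roundtrips(raw, encoding):
    """Return True if raw survives a decode/encode cycle in this codec."""
    text = raw.decode(encoding, errors='replace')
    return text.encode(encoding, errors='replace') == raw

# Every byte value is a valid character in latin-1, so nothing is lost.
single_byte_ok = roundtrips(bytes(range(256)), 'latin-1')

# In cp932, 0xE0 is a lead byte and 0x35 ('5') is not a valid trail byte,
# so the pair is replaced on decode and the original bytes are lost.
dbcs_ok = roundtrips(b'\xe05', 'cp932')
```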

For example, in codepage 932 (Japanese), it's an error if a lead byte
(i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
uncommon). In this case the ANSI API substitutes the default character
for Japanese, '・' (U+30FB, Katakana middle dot).

    >>> import locale, os
    >>> locale.getpreferredencoding()
    'cp932'
    >>> open(b'\xe05', 'w').close()
    >>> os.listdir('.')
    ['・']
    >>> os.listdir(b'.')
    [b'\x81E']

All invalid sequences get mapped to '・', which roundtrips as
b'\x81\x45', so you can't reliably create and open files with
arbitrary bytes paths in this locale.
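
The byte values above can be checked against Python's cp932 codec on any
platform. This is only a sketch of the encoding side: the codec is strict
and raises on the invalid pair, whereas the substitution of '・' is done
by the Windows ANSI API itself.

```python
MIDDLE_DOT = '\u30fb'  # KATAKANA MIDDLE DOT, the cp932 default character

# The default character encodes to lead byte 0x81, trail byte 0x45 ('E').
encoded = MIDDLE_DOT.encode('cp932')

# The name used to create the file is not valid cp932: 0xE0 is a lead
# byte and 0x35 ('5') is below the 0x40 minimum for a trail byte.
try:
    b'\xe05'.decode('cp932')
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
```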

eryk sun