
On Sep 30, 2008, at 5:40 PM, Martin v. Löwis wrote:
On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
Since I've seen no objections to this yet: please no. If we offer a "lower-level" bytes filename API, it should work for all platforms.
Unfortunately, it can't. You cannot represent all possible file names in a byte string on Windows (just as you can't do so in a Unicode string on Unix).
As you mention in the parenthetical below, of course it can.
So using byte strings on Windows would work for some files, but fail for others. In particular, listdir might give you a list of file names which you then can't open/stat/recurse into.
(of course, you could use UTF-8 as the file system encoding on Windows, but then you will have to rewrite a lot of C code first)
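A minimal Python sketch of the representability problem described above. It assumes cp1252 as a stand-in for a Windows ANSI code page, and both sample names are made up for illustration; it only exercises codecs, not any real file system call.

```python
# Made-up file name with characters outside Western European code pages
# (written with escapes so the snippet does not depend on source encoding).
name = "\u65e5\u672c\u8a9e.txt"

# Encoding to a single-byte ANSI code page (cp1252 assumed here) fails,
# so a bytes-only API built on that code page could not name this file.
try:
    name.encode("cp1252")
    ansi_ok = True
except UnicodeEncodeError:
    ansi_ok = False

# UTF-8 can encode this name, and the bytes decode back to the same string.
utf8_bytes = name.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# The reverse problem on Unix: arbitrary file name bytes need not be valid
# UTF-8. These are Latin-1 bytes for a made-up name.
raw = b"caf\xe9.txt"
try:
    raw.decode("utf-8")
    unix_ok = True
except UnicodeDecodeError:
    unix_ok = False

print(ansi_ok, round_trip == name, unix_ok)
```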
Yes! If there is a byte-string access method for Windows, pretty please make it decode from UTF-8 internally and call the Unicode version of the Windows APIs. The non-Unicode Windows APIs are pretty much just broken -- ideally, Python should never be calling those.

But I still don't like the idea of propagating the "sometimes a string, sometimes bytes" APIs. One or the other, please: either always strings (if and only if there is a method for ensuring that decoding always succeeds), or always bytes.

James
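A rough sketch of the "decode from UTF-8 internally" idea, written at the Python level rather than in the C code the thread is talking about. The helper names `to_text_path` and `open_any` are hypothetical and are not part of any Python API; the sketch assumes that str paths end up at the Unicode version of the platform calls.

```python
def to_text_path(path, encoding="utf-8"):
    """Hypothetical helper: return a str path, decoding bytes as UTF-8.

    Decoding is strict, so invalid UTF-8 raises UnicodeDecodeError
    instead of silently producing a different file name.
    """
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


def open_any(path, mode="r"):
    """Hypothetical wrapper: accept bytes or str, always call open() with str."""
    return open(to_text_path(path), mode)
```

With this shape, callers may pass either type, but only one code path (the str one) ever reaches the operating system. The strict decode is also where the "decoding must always succeed" condition would bite: a byte string that is not valid UTF-8 is rejected up front.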