
On Wed, 1 Oct 2008 09:21:37 am you wrote:
On Tue, Sep 30, 2008 at 4:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote:
On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
Since I've seen no objections to this yet: please no. If we offer a "lower-level" bytes filename API, it should work for all platforms.
Unfortunately, it can't. You cannot represent all possible file names in a byte string in Windows (just as you can't do so in a Unicode string on Unix).
Sorry, maybe I'm just being thick here, but I don't understand how that is possible. On the physical disk, each Windows file name must be represented by a byte string, yes? So how is it possible that there are Windows files with names that can't be represented as a byte string? What have I missed?
I believe on disk it uses UTF-16.
Which is made up of bytes. There may be byte sequences that are illegal UTF-16, but that's not what Martin said. I don't understand how there can be UTF-16 sequences which don't correspond to some sequence of bytes. How would they be represented in memory? Is this to do with the endianness of the UTF-16 sequence?

-- 
Steven
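
For reference, here is a minimal sketch of the two situations being discussed, assuming Python 3 semantics. The file name is hypothetical, and cp1252 only stands in for whatever locale code page a Windows byte-string API might use (that mapping is an assumption, not something stated in the thread). It shows that any UTF-16 name is trivially a sequence of bytes in either byte order, that the same name may still have no representation in a locale code page, and that an arbitrary Unix byte string may not decode to Unicode.

```python
# Hypothetical file name containing a Cyrillic letter (U+0416).
name = "\u0416.txt"

# The UTF-16 form is just bytes; endianness only changes the byte order.
utf16_le = name.encode("utf-16-le")
utf16_be = name.encode("utf-16-be")


def encodable(text, codec):
    """Return True if text can be encoded with codec without loss."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def decodable(data, codec):
    """Return True if data is a valid byte sequence for codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# Assumption: cp1252 stands in for a Windows locale ("ANSI") code page.
# It has no Cyrillic letters, so the name cannot be expressed in it.
ansi_ok = encodable(name, "cp1252")

# The reverse case on Unix: a byte string that is not valid UTF-8.
raw_unix_name = b"\xff.txt"
utf8_ok = decodable(raw_unix_name, "utf-8")

print(utf16_le, utf16_be, ansi_ok, utf8_ok)
```

If that reading is right, the limitation would come from the code page a byte-string API passes through, not from UTF-16 itself or from its endianness.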