[Python-Dev] Unicode filenames
Mark Hammond
mhammond@skippinet.com.au
Tue, 11 Feb 2003 08:45:45 +1100
I'm not sure I have followed this completely, but:
> (On platforms on which utf-8 is the file system encoding, yes.)
>
> > Passing 8bit
> > strings to that function should always go through that unicode API,
> > i.e. they should be treated as any other 8bit string in the unicode
> > context. This means it must be decoded from the default encoding.
The problem is that some file system related functions will return strings
*already in* the "file system encoding" - i.e., on Windows, some functions
will return mbcs encoded filenames. Thus, there is a round-trip problem -
if you get a filename from os.listdir(), you could not pass it to open()
without lots of head-scratching.
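To make the round-trip concrete, here is a minimal sketch in present-day Python 3 terms (not the Python of this thread), where the 8-bit strings discussed here correspond to bytes objects. The helper name roundtrip_listdir_bytes and the file name example.txt are made up for illustration; os.fsencode(), os.listdir() and open() are the real standard-library calls. The point is only that names returned by os.listdir() are handed straight back to open() without any decoding or re-encoding step in between.

```python
import os
import tempfile


def roundtrip_listdir_bytes(directory):
    """Hypothetical helper: list `directory` with byte-string names and
    open every regular file using the name exactly as it was returned."""
    contents = {}
    bdir = os.fsencode(directory)       # directory path as bytes
    for name in os.listdir(bdir):       # bytes in, bytes out
        path = os.path.join(bdir, name)
        if os.path.isfile(path):
            # The name goes back to open() untouched: no decode/encode.
            with open(path, "rb") as f:
                contents[name] = f.read()
    return contents


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "wb") as f:
        f.write(b"hello")
    result = roundtrip_listdir_bytes(tmp)
```

If the names were instead decoded with some unrelated default encoding before being passed back, any non-ASCII filename could fail to resolve, which is the head-scratching case described above.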
The default file system encoding allows you to assume that 8 bit strings
passed to open are pre-encoded strings - ie, are likely to have previously
come directly from another API function.
IIRC, the current rules on Windows are:
* Pass a Unicode filename, and Python calls the Unicode versions of the API.
* Pass a string, and it is assumed the string is *already* in the default
file system encoding, so the string is not re-encoded.
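The two rules can be sketched as a small dispatcher. This is only an illustration, not Python's actual implementation: classify_filename and its two labels are hypothetical names, and it is written for present-day Python 3, where the "string" of this thread corresponds to bytes and a Unicode filename to str.

```python
def classify_filename(name):
    """Hypothetical helper: report which of the two rules applies to `name`.

    Returns a (rule, value) pair; the value is always the input unchanged.
    """
    if isinstance(name, str):
        # Rule 1: a Unicode filename would go to the Unicode (wide) API.
        return ("unicode-api", name)
    if isinstance(name, bytes):
        # Rule 2: assumed to be already in the file system encoding,
        # so it is passed through without being re-encoded.
        return ("already-encoded", name)
    raise TypeError("filename must be str or bytes, not %s"
                    % type(name).__name__)
```

The design choice worth noticing is that the byte-string branch never touches the data, which is what keeps names obtained from other file system calls usable as-is.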
Mark.