[Python-Dev] Unicode and the Windows file system.
Guido van Rossum
guido@digicool.com
Mon, 19 Mar 2001 08:12:58 -0500
> > Also, what would os.listdir() return ? Unicode strings or 8-bit
> > strings ?
>
> This would not change.
>
> This is what my testing shows:
>
> * I can switch to a German locale, and create a file using the keystrokes
> "`atest`o". The "`" is the dead-char so I get an umlaut over the first and
> last characters.
(Actually, grave accents, but I'm sure that to Aussie eyes, as to
Americans, they're all Greek. :-)
> * os.listdir() returns '\xe0test\xf2' for this file.
I don't understand. This is a Latin-1 string. Can you explain again
how the MBCS encoding encodes characters outside the Latin-1 range?
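
To make the Latin-1 observation concrete, here is a minimal sketch. It
assumes modern Python 3 syntax (bytes vs. str), which is not what the
thread was written against, and the variable names are illustrative only.
It shows that the two non-ASCII bytes decode as Latin-1 to a-grave and
o-grave, that the "ascii" codec rejects the name, and that a character
outside the Latin-1 range cannot be encoded as Latin-1 at all.

```python
# The byte string reported by os.listdir() in the quoted test.
raw = b'\xe0test\xf2'

# Decoding as Latin-1 maps each byte to the code point of the same value.
name = raw.decode('latin-1')      # U+00E0 + "test" + U+00F2

# The round trip through Latin-1 is lossless for this name.
round_trip_ok = (name.encode('latin-1') == raw)

# The "ascii" codec cannot represent the accented characters.
try:
    name.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A character outside the Latin-1 range (here Cyrillic U+0436) cannot be
# encoded as Latin-1, which is why the question about MBCS matters.
try:
    '\u0436'.encode('latin-1')
    cyrillic_ok = True
except UnicodeEncodeError:
    cyrillic_ok = False
```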
> * That same string can be passed to "open" etc to open the file.
>
> * The only way to get that string to a Unicode object is to use the
> encodings "Latin1" or "mbcs". Of them, "mbcs" would have to be safer, as at
> least it has a hope of handling non-latin characters :)
>
> So - assume I am passed a Unicode object that represents this filename. At
> the moment we simply throw that exception if we pass that Unicode object to
> open(). I am proposing that "mbcs" be used in this case instead of the
> default "ascii"
>
> If nothing else, my idea could be considered a "short-term" solution. If
> ever it is found to be a problem, we can simply move to the unicode APIs,
> and nothing would break - just possibly more things _would_ work :)
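
A rough sketch of the idea as I read it, assuming modern Python 3 syntax.
The helper names encode_filename and default_filename_encoding are
hypothetical and not part of any existing API. The point is only that a
Unicode filename gets encoded with the platform's filesystem codec
("mbcs" on Windows) instead of the default "ascii" before it reaches
open() and friends.

```python
import sys


def default_filename_encoding():
    # Hypothetical helper: the "mbcs" codec exists only on Windows builds
    # of Python, so fall back to the interpreter's filesystem encoding
    # elsewhere.
    if sys.platform == "win32":
        return "mbcs"
    return sys.getfilesystemencoding()


def encode_filename(name, encoding=None):
    # Hypothetical helper: byte strings pass through unchanged, as they
    # already do today; only Unicode names are encoded.
    if isinstance(name, bytes):
        return name
    return name.encode(encoding or default_filename_encoding())


# Example with an explicit Western European code page, so the result does
# not depend on the machine running it.
encoded = encode_filename('\u00e0test\u00f2', 'cp1252')
```

With a Western European code page the encoded name comes out as the same
bytes that os.listdir() returned in the quoted test, so the result could
be handed straight to the existing 8-bit file APIs.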
I have one more question. The plan looks decent, but I don't know the
scope. Which calls do you plan to fix?
--Guido van Rossum (home page: http://www.python.org/~guido/)