[Python-Dev] Unicode and the Windows file system.

M.-A. Lemburg mal@lemburg.com
Mon, 19 Mar 2001 11:09:49 +0100


Mark Hammond wrote:
> 
> I understand the issue of "default Unicode encoding" is a loaded one,
> however I believe with the Windows' file system we may be able to use a
> default.
> 
> Windows provides 2 versions of many functions that accept "strings" - one
> that uses "char *" arguments, and another using "wchar_t *" for Unicode.
> Interestingly, the "char *" versions of these functions almost always
> support "mbcs" encoded strings.
> 
> To make Python work nicely with the file system, we really should handle
> Unicode characters somehow.  It is not too uncommon to find that the
> "program files" or the "user" directory has Unicode characters in
> non-English versions of Win2k.
> 
> The way I see it, to fix this we have 2 basic choices when a Unicode object
> is passed as a filename:
> * we call the Unicode versions of the CRTL.
> * we auto-encode using the "mbcs" encoding, and still call the non-Unicode
> versions of the CRTL.
> 
> The first option has a problem in that determining what Unicode support
> Windows 95/98 has may be more trouble than it is worth.  Sticking to purely
> ASCII versions of the functions means that the worst thing that can happen
> is we get a regular file-system error if an mbcs encoded string is passed on
> a non-Unicode platform.
> 
> Does anyone have any objections to this scheme or see any drawbacks in it?
> If not, I'll knock up a patch...

Hmm... the problem with MBCS is that it is not one encoding,
but can be many things. I don't know if this is an issue (can there
be more than one encoding per process? is the encoding a user or
system setting? does the CRT know which encoding to use/assume?),
but the Unicode approach sure sounds a lot safer.
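
To illustrate the "many things" point, here is a minimal sketch in
Python. It assumes that "mbcs" resolves to whichever ANSI code page the
machine is configured with, and it uses three explicit code pages
(cp1252, cp1251, cp1253) as stand-ins for three differently configured
machines. The byte string and the helper name decode_with are
illustrative only, not part of any proposed patch.

```python
# Sketch: the same raw bytes mean different text under different
# Windows ANSI code pages, so an "mbcs"-encoded filename is only
# meaningful relative to the code page that produced it.

raw = b"\xe4\xf6\xfc"  # hypothetical filename bytes

def decode_with(code_pages, data):
    """Decode the same bytes under each named code page."""
    return {cp: data.decode(cp) for cp in code_pages}

# cp1252 = Western European, cp1251 = Cyrillic, cp1253 = Greek
results = decode_with(["cp1252", "cp1251", "cp1253"], raw)

for code_page, text in results.items():
    print(code_page, ascii(text))
```

Each code page yields a different string for the identical bytes, which
is the ambiguity the Unicode versions of the functions avoid.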

Also, what would os.listdir() return? Unicode strings or 8-bit
strings?
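
For reference, one possible answer can be observed in current Python 3,
where os.listdir() returns names of the same type as the path it was
given. The sketch below only demonstrates that present-day behaviour; it
is not a statement of what this thread decided, and the helper name
listing_types and the file name example.txt are made up for the
illustration.

```python
import os
import tempfile

def listing_types(directory):
    """Return the element types os.listdir() yields for str and bytes paths."""
    as_text = os.listdir(directory)                 # path given as str
    as_bytes = os.listdir(os.fsencode(directory))   # same path given as bytes
    return {type(n) for n in as_text}, {type(n) for n in as_bytes}

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_types, byte_types = listing_types(tmp)

print(text_types, byte_types)
```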

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Pages:                           http://www.lemburg.com/python/