[Python-Dev] Unicode and the Windows file system.

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 20 Mar 2001 00:16:34 +0100


> The way I see it, to fix this we have 2 basic choices when a Unicode object
> is passed as a filename:
>
> * we call the Unicode versions of the CRTL.

That is the choice that I prefer. I understand that it won't work on
Win95, but I think that needs to be worked around.

By using "Unicode versions" of an API, you are making the code
Windows-specific anyway. So I wonder whether it might be better to use
the plain API instead of the CRTL; I also wonder how difficult it
actually is to do "the right thing all the time".

On NT, the file system is defined in terms of Unicode, so passing
Unicode in and out is definitely the right thing (*). On Win9x, the
file system uses some platform specific encoding, which means that
using that encoding is the right thing. On Unix, there is no
established convention, but UTF-8 was invented exactly to deal with
Unicode in Unix file systems, so that might be an appropriate choice
(**).
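
As a rough sketch of the per-platform rule just described (written in
present-day Python; the function name and the platform labels "nt",
"win9x" and "unix" are hypothetical, picked only for illustration and
not part of any Python API):

```python
def preferred_filename_encoding(platform):
    """Return the encoding suggested for file names on a platform.

    The platform labels are illustrative assumptions. A result of
    None means "no conversion": Unicode is passed to the system as is.
    """
    if platform == "nt":
        # The NT file system is defined in terms of Unicode.
        return None
    if platform == "win9x":
        # Win9x uses a platform-specific multi-byte encoding.
        return "mbcs"
    if platform == "unix":
        # No established convention; UTF-8 is the best bet.
        return "utf-8"
    raise ValueError("unknown platform: %r" % (platform,))
```

The Unix branch is only a guess, which is why footnote (**) matters.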

So I'm in favour of supporting Unicode on all file system APIs; that
does include os.listdir(). For 2.1, that may be a bit much given that
a beta release has already been seen; so only accepting Unicode on
input is what we can do now.

Regards,
Martin

(*) Converting to the current MBCS might be lossy, and it might not
support all file names. The "ASCII only" approach of 2.0 was precisely
taken to allow getting it right later; I strongly discourage any
approach that attempts to drop the restriction in a way that does not
allow getting it right later.
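
To illustrate the lossiness, here is a minimal sketch in present-day
Python. It assumes cp1252 as a stand-in for "the current MBCS" (the
real code page depends on the Windows installation), and the helper
name roundtrips is hypothetical:

```python
def roundtrips(name, encoding):
    """Return True if name survives encoding to bytes and back unchanged."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeEncodeError:
        # The code page has no representation for some character.
        return False


# A Cyrillic file name; cp1252 (Western European) cannot represent it.
cyrillic = "\u0444\u0430\u0439\u043b.txt"

# With errors="replace", each unrepresentable character becomes "?",
# so the original name can no longer be recovered.
lossy = cyrillic.encode("cp1252", errors="replace")
```

Here roundtrips("readme.txt", "cp1252") holds, roundtrips(cyrillic,
"cp1252") does not, and lossy no longer identifies the original file.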

(**) At least, that is the best bet. Many Unix installations use some
other encoding in their file names; if Unicode becomes more common,
most likely installations will also use UTF-8 on their file systems.
Unless it can be established what the file system encoding is,
returning Unicode from os.listdir is probably not the right thing.
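
One conservative way to act on this, sketched in present-day Python:
decode a name only when the encoding is known and the bytes are valid
in it, and otherwise hand back the raw bytes untouched. The function
decode_names and its parameters are hypothetical, not an existing API:

```python
def decode_names(raw_names, encoding):
    """Decode byte-string file names where that can be done safely.

    encoding is the file system encoding if it has been established,
    or None if it is unknown. Names that cannot be decoded strictly
    are returned unchanged as bytes instead of being guessed at.
    """
    if encoding is None:
        # Encoding unknown: do not pretend the names are Unicode.
        return list(raw_names)
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            result.append(raw)
    return result
```

For example, decode_names([b"abc", b"\xff\xfe"], "utf-8") decodes the
first name and leaves the second, which is not valid UTF-8, as bytes.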