[Python-Dev] Unicode strings as filenames

Martin v. Loewis martin@v.loewis.de
Sun, 6 Jan 2002 01:33:08 +0100


>    This change works for me on Windows 2000 and allows access to all files
> no matter what the current code page is set to. On Windows 9x (not yet
> tested), the _wfopen call should fail causing a fallback to fopen. Possibly
> the OS should be detected instead and _wfopen not attempted on 9x. 

Now that you have that change, please try to extend it to
posixmodule.c. This is where I gave up. Notice that, by changing
Py_FileSystemDefaultEncoding and open() alone, you have made the
situation worse: os.stat will now fail on files with non-ASCII names
for which it works under the mbcs encoding, because Windows won't find
the file (correct me if I'm wrong).
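
To illustrate the mismatch, here is a minimal sketch (not code from the
patch). The file name is made up, and cp1252 is only an assumed stand-in
for "mbcs" on a Western European system. The same name encodes to
different bytes under the two encodings, so a narrow-character API that
is handed UTF-8 bytes looks for a different file:

```python
# Hypothetical file name; cp1252 is an assumed stand-in for "mbcs".
name = "Gr\u00fc\u00dfe.txt"

utf8_bytes = name.encode("utf-8")
ansi_bytes = name.encode("cp1252")

# A narrow-character (ANSI) API interprets the bytes it receives in the
# active code page, so UTF-8 bytes are read back as a different name.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes != ansi_bytes)  # the two byte strings differ
print(misread != name)           # the ANSI API sees another name
```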

> On 9x, mbcs may be a better choice of encoding although it may also
> be possible to ask the file system to find the wide character file
> name and return the mangled short name that can then be used by
> fopen.

It is not just 9x: if you have ten (*) different APIs to open a file, ten
different APIs to stat a file, and so on, and have to select some of
them at compile time and some of them at run time, it gets messy very
quickly.

(*) I'd expect that other systems may also have proprietary system
calls to do these things, using either wchar_t* or a proprietary
Unicode type.
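
The run-time half of that selection might look roughly like the sketch
below. All names are hypothetical; open_wide and open_narrow merely
stand in for calls such as _wfopen and fopen, and cp1252 is again an
assumed code page. Every operation (open, stat, and so on) would need
its own chain of this kind, which is where the mess comes from:

```python
def open_wide(path):
    # Stand-in for a wide-character call; assume this platform lacks it.
    raise NotImplementedError("wide API unavailable")


def open_narrow(path):
    # Stand-in for a narrow-character call taking code-page bytes.
    return ("narrow", path.encode("cp1252"))


def call_with_fallback(candidates, path):
    """Try each candidate in order; return the first result that works."""
    last_error = LookupError("no candidate APIs given")
    for func in candidates:
        try:
            return func(path)
        except NotImplementedError as exc:
            last_error = exc  # remember the failure, try the next API
    raise last_error


result = call_with_fallback([open_wide, open_narrow], "a.txt")
```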

>    The best approach to me seems to be to make
> Py_FileSystemDefaultEncoding settable by the user, at least allowing
> the choice between 'utf-8' and 'mbcs' with a default of 'utf-8' on
> NT and 'mbcs' on 9x.

By the user, or by the application? How can the application make a
more educated guess than Python proper? Alternatively, how can the
user (or her administrator) know what value to put in there?

On Windows, probably neither is a good idea; if the file system
default encoding is used in the future, fixing it at mbcs is the best
I can think of.

>    Please criticise any stylistic or correctness issues in the code
> as it is my first modification to the Python sources.

The code looks fine. I'd encourage you to continue on that topic; just
expect that it will need many more rounds for completion.

Regards,
Martin