[Python-Dev] Unicode strings as filenames

Martin v. Loewis martin@v.loewis.de
Sun, 6 Jan 2002 13:14:55 +0100


> > Now that you have that change, please try to extend it to
> > posixmodule.c. This is where I gave up.
> 
>    OK. os.open, os.stat, and os.listdir now work. Placed temporarily at
> http://pythoncard.sourceforge.net/posixmodule.c

Looks good. The posix_do_stat changes contain an error: you have put
Python API calls inside the Py_BEGIN_ALLOW_THREADS block. That is
wrong; you must always hold the interpreter lock when calling the
Python API, and the same applies to the code in posix_open. Also, when
calling _wstati64, you might want to assert that the function pointer
is _stati64, since that is the only narrow function it can stand in for.
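Here is a minimal sketch of that assertion. The real _stati64, _wstati64 and posix_do_stat are Windows-only, so it uses hypothetical stand-ins (stat_narrow, stat_wide, struct fake_stat, do_stat). The wide function is substituted only when the caller passed the narrow function it corresponds to:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical result type; records which variant filled it in. */
struct fake_stat {
    int variant;                       /* 1 = narrow, 2 = wide */
};

/* Hypothetical stand-ins for _stati64 and _wstati64. */
static int stat_narrow(const char *path, struct fake_stat *st)
{
    (void)path;
    st->variant = 1;
    return 0;
}

static int stat_wide(const wchar_t *path, struct fake_stat *st)
{
    (void)path;
    st->variant = 2;
    return 0;
}

typedef int (*narrow_stat_fn)(const char *, struct fake_stat *);

/* Hypothetical do_stat: like posix_do_stat it receives a narrow function
   pointer. For a Unicode path it calls the wide function instead, which
   is only correct if the pointer really is its narrow counterpart. */
static int do_stat(narrow_stat_fn statfunc, const char *npath,
                   const wchar_t *wpath, struct fake_stat *st)
{
    if (wpath != NULL) {
        assert(statfunc == stat_narrow);   /* fails if another narrow function was passed */
        return stat_wide(wpath, st);
    }
    return statfunc(npath, st);
}
```

With NDEBUG undefined, passing any other narrow function together with a wide path aborts instead of silently calling the wrong wide API.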

> os.listdir returns Unicode objects rather than strings. This makes
> glob.glob work as well so my earlier script that finds the *.html
> files and opens them works. Unfortunately, I expect most callers of
> glob() will be expecting narrow strings.

That is not much of a problem; we could define an API that leaves the
choice to the caller.

However, the size of your changes is really disturbing. There were
already four versions of the directory-listing code; now you've added
a fifth. And it isn't even clear whether this code works on W9x, which
does not implement most of the wide-character APIs.

There must be a way to fold the different Windows versions into a
single one; perhaps it is acceptable to drop Win16 support. I think
three different versions should be offered to the end user:
- path is plain string, result is list of plain strings
- path is Unicode string, result is list of Unicode strings
- path is Unicode string, result is list of plain strings

Perhaps one could argue that the third version isn't really needed:
anybody passing Unicode strings to listdir should expect to get
Unicode strings back. That would leave us with two variants on
Windows. I envision a fragment like this:

#ifdef MS_WINDOWS
  if (PyUnicode_Check(arg)) {
#define strings wide
#include "listdir_win.h"
#undef strings
  } else {
    /* convert the argument to a plain string first */
#define strings narrow
#include "listdir_win.h"
#undef strings
  }
#endif

If you provide similar listdir_posix.h and listdir_os2.h files, it
should be possible to get a uniform implementation.
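The same idea of one source with two instantiations can be shown in a single self-contained C file. Instead of including a hypothetical listdir_win.h twice, this sketch uses a function-generating macro (DEFINE_MAKE_PATTERN, a hypothetical name) and instantiates it once for char and once for wchar_t. The generated functions build the DIR\*.* search pattern that a Windows directory listing would pass to FindFirstFile. Only that pattern-building step is shown, not the listing itself:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* One body, parameterized by character type and length function.
   Writes dir + separator (only if needed) + "*.*" into out.
   Returns 0 on success, -1 if out (cap elements) is too small. */
#define DEFINE_MAKE_PATTERN(NAME, CHAR, LEN)                             \
    static int NAME(const CHAR *dir, CHAR *out, size_t cap)              \
    {                                                                    \
        size_t n, i;                                                     \
        assert(dir != NULL && out != NULL);                              \
        n = LEN(dir);                                                    \
        /* room for dir, one separator, three pattern chars, NUL */      \
        if (n + 5 > cap)                                                 \
            return -1;                                                   \
        for (i = 0; i < n; i++)                                          \
            out[i] = dir[i];                                             \
        if (n > 0 && dir[n - 1] != (CHAR)'\\' && dir[n - 1] != (CHAR)'/' \
            && dir[n - 1] != (CHAR)':')                                  \
            out[n++] = (CHAR)'\\';                                       \
        out[n++] = (CHAR)'*';                                            \
        out[n++] = (CHAR)'.';                                            \
        out[n++] = (CHAR)'*';                                            \
        out[n] = (CHAR)0;                                                \
        return 0;                                                        \
    }

/* The two instantiations: narrow (char) and wide (wchar_t) strings. */
DEFINE_MAKE_PATTERN(make_pattern_narrow, char, strlen)
DEFINE_MAKE_PATTERN(make_pattern_wide, wchar_t, wcslen)
```

The macro keeps everything in one file. The separate-header approach scales better to a body as long as a full listdir, because the template stays ordinary C rather than a long run of line continuations.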

> > Notice that, with changing
> > Py_FileSystemDefaultEncoding and open() alone, you have worsened the
> > situation: os.stat will now fail on files with non-ASCII names on
> > which it works under the mbcs encoding, because windows won't find the
> > file (correct me if I'm wrong).
> 
>    If you give it a file name encoded in the current code page then it may
> fail where it did not before.

I was actually talking about stat as a function you hadn't touched
yet. Now, os.rename will fail if you pass it two Unicode strings
referring to non-ASCII file names. posix_1str and posix_2str (the
latter implements os.rename) need the same treatment as stat, except
that they cannot know statically which function pointer they were
given.
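One way around not knowing the function pointer statically is to have the caller pass both the narrow and the wide function. The sketch below uses hypothetical names (two_str, rename_narrow, rename_wide) as stand-ins for posix_2str, rename and _wrename:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

typedef int (*narrow2_fn)(const char *, const char *);
typedef int (*wide2_fn)(const wchar_t *, const wchar_t *);

/* Hypothetical stand-ins for rename and _wrename; they only report
   which variant was called (1 = narrow, 2 = wide). */
static int rename_narrow(const char *src, const char *dst)
{
    (void)src;
    (void)dst;
    return 1;
}

static int rename_wide(const wchar_t *src, const wchar_t *dst)
{
    (void)src;
    (void)dst;
    return 2;
}

/* Hypothetical two_str: since it cannot know which function it was
   given, the caller supplies the narrow and the wide variant together.
   In this sketch both paths must be of the same kind. */
static int two_str(const char *n1, const char *n2,
                   const wchar_t *w1, const wchar_t *w2,
                   narrow2_fn func, wide2_fn wfunc)
{
    if (w1 != NULL && w2 != NULL && wfunc != NULL)
        return wfunc(w1, w2);          /* Unicode arguments: wide API */
    assert(n1 != NULL && n2 != NULL);
    return func(n1, n2);               /* plain strings: narrow API */
}
```

Mixed arguments (one Unicode string and one plain string) would have to be converted to a common type first. The sketch simply requires both paths to be of the same kind.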

Regards,
Martin