[Python-Dev] PEP 277 (unicode filenames): please review
Guido van Rossum
guido@python.org
Mon, 12 Aug 2002 16:07:46 -0400
> > http://www.python.org/peps/pep-0277.html
> >
> > The PEP describes a Windows-only change to Unicode in file names: On
> > Windows NT/2k/XP, Python would allow arbitrary Unicode strings as file
> > names and pass them to the OS, instead of converting them to CP_ACP
> > first. This applies to open() and all os functions that accept
> > filenames.
> >
> > In addition, os.listdir() would return Unicode filenames if the argument
> > is Unicode.
>
> This is the bit I still don't like (at least, if I'm not
> mistaken I commented on it a while ago too). A routine could be
> doing an os.listdir() expecting strings, but suddenly someone
> passes it a Unicode directory name and the return value would
> change.

Hm, that would be the responsibility of whoever passes it Unicode.
Most code works just fine when presented with Unicode where 8-bit
strings are expected. It's only code that assumes the 8-bit strings
are Latin-1 (or something else besides ASCII) that gets in trouble.

But shouldn't it return Unicode whenever there are filenames in the
directory that can't be represented as ASCII?

That's what Tkinter does: Tk gives back UTF-8, which degenerates to
ASCII if there are only ASCII chars; if any high bits are detected,
Tkinter decodes the UTF-8, turning the return string into Unicode.
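
The rule described above can be sketched roughly as follows. This is
only an illustration written in present-day Python 3 terms, where
bytes stands in for the 8-bit string and str for Unicode; the function
name tk_style_result is made up for this example and is not Tkinter's
actual code.

```python
def tk_style_result(raw: bytes):
    """Return raw unchanged if it is pure ASCII, else decode it as UTF-8.

    Hypothetical helper: it mimics the behaviour described in the text,
    not any real Tkinter internals.
    """
    # "High bits detected" means at least one byte is >= 0x80.
    if all(byte < 0x80 for byte in raw):
        return raw  # stays an 8-bit string
    return raw.decode("utf-8")  # becomes a Unicode string


print(type(tk_style_result(b"readme.txt")))
print(type(tk_style_result("caf\u00e9.txt".encode("utf-8"))))
```

The caller gets an 8-bit string in the common ASCII-only case and
Unicode only when the data requires it, so the return type depends on
the content rather than on the type of the argument.
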
> I would much prefer an optional encoding argument whereby you
> give the encoding in which you want the return value. Default
> would be the local filesystem encoding. If you pass unicode you
> will get direct unicode on XP/2K, and a converted string on
> other platforms (but always unicode).

Hm, I don't know if I'd like os.listdir() to have an encoding
argument. Sounds like the wrong solution somehow.

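
For concreteness, the quoted proposal might look something like the
wrapper below. It is a sketch in present-day Python 3 terms (bytes for
8-bit strings, str for Unicode). The name listdir_encoded and the
"unicode" sentinel value are assumptions made for this example;
nothing like this exists in the os module.

```python
import os


def listdir_encoded(path, encoding=None):
    """List a directory, returning names in the requested encoding.

    Hypothetical wrapper illustrating the proposal:
      encoding=None      -> 8-bit names in the filesystem encoding
      encoding="unicode" -> Unicode names, no conversion
      anything else      -> 8-bit names encoded with that codec
    """
    # Ask the OS for Unicode names regardless of how path was passed.
    names = os.listdir(os.fsdecode(path))
    if encoding == "unicode":
        return names
    if encoding is None:
        return [os.fsencode(name) for name in names]
    # May raise UnicodeEncodeError if a name does not fit the codec.
    return [name.encode(encoding) for name in names]
```

Note that every caller now has to decide on an encoding, and a name
that cannot be represented in the chosen codec raises an error.
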
> Oh yes, the same reasoning would hold for readlink(), getcwd()
> and any other call that returns filenames.

Ditto.

--Guido van Rossum (home page: http://www.python.org/~guido/)