[Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
was PEP-time ? ...
Martin v. Loewis
martin@v.loewis.de
Wed, 16 Jan 2002 20:09:24 +0100
> > I'm still not certain what the meaning of this function is, if it
> > means "Unicode file names are only restricted by the file system
> > conventions", then on Unix, it may change at run-time, if a user or
> > the application sets an UTF-8 locale, switching from the original "C"
> > locale.
>
> Doesn't it mean: "posix functions and file() can accept Unicode file
> names" ?
Neil has given his own interpretation (return true if it is *better*
to pass Unicode strings than to pass byte strings).
Your property (accepts Unicode) is true on all Python installations
since 2.2: if you pass a Unicode string, it will try the file system
encoding; if that is NULL, it will try the system encoding. So
open(u"foo.txt","w")
currently succeeds on every Python system (unless Unicode was
completely disabled in the port).
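To make that lookup order concrete, here is a rough sketch of the fallback.
It is not the actual implementation: encode_file_name is a hypothetical
helper, None stands in for a NULL file system encoding, and I am assuming
that "system encoding" corresponds to what sys.getdefaultencoding() reports.

```python
import sys


def encode_file_name(name, fs_encoding, default_encoding):
    """Hypothetical helper: turn a file name into the byte string passed to the OS.

    fs_encoding may be None, which models a NULL file system encoding.
    """
    if isinstance(name, bytes):
        # Byte strings are passed through untouched.
        return name
    # Try the file system encoding first; fall back to the system encoding.
    return name.encode(fs_encoding or default_encoding)


if __name__ == "__main__":
    print(encode_file_name(u"foo.txt",
                           sys.getfilesystemencoding(),
                           sys.getdefaultencoding()))
```

A pure-ASCII name such as u"foo.txt" encodes under essentially any
encoding, which is why that particular open() call is not a useful test of
whether a platform really supports Unicode file names.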
> That's what I thought, at least; whether they succeed or not
> is another question and could well be handled by run-time errors
> (e.g. on Unix it is not at all clear whether NFS, Samba or some
> other more exotic file system can handle the encoding chosen by
> Python or the program).
For NFS, it is clear - file names are null-terminated byte strings
(AFAIK). For Samba, I believe it depends on the installation,
specifically whether the encoding of Samba matches the one of the
user. For more exotic file systems, it is not all that clear.
> Perhaps we ought to drop that function altogether and let the
> various file IO functions raise run-time errors instead ?!
That was my suggestion as well. However, Neil points out that, on
Windows, passing Unicode is sometimes better: For some files, there is
no byte string file name to identify the file (if the file name is not
representable in MBCS). OTOH, on Unix, some files cannot be accessed
with a Unicode string, if the file name is invalid in the user's
encoding.
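Both failure modes can be shown without touching a real file system. This
is only an illustration under stated assumptions: the two helpers are
hypothetical, cp1252 stands in for whatever MBCS code page a Windows
installation uses, and UTF-8 stands in for the Unix user's encoding.

```python
def bytes_name_or_none(unicode_name, encoding):
    """Hypothetical helper: the byte string name, or None if there is none."""
    try:
        return unicode_name.encode(encoding)
    except UnicodeEncodeError:
        return None


def unicode_name_or_none(raw_name, encoding):
    """Hypothetical helper: the Unicode name, or None if the bytes are invalid."""
    try:
        return raw_name.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # Windows case: a name with no representation in the (assumed) code page.
    print(bytes_name_or_none(u"\u65e5\u672c.txt", "cp1252"))

    # Unix case: a Latin-1 encoded name seen by a user with a UTF-8 locale.
    latin1_name = u"caf\u00e9.txt".encode("latin-1")
    print(unicode_name_or_none(latin1_name, "utf-8"))
```

In each case the helper returns None: the file exists, but one of the two
string types cannot name it.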
It turns out that only OS X really got it right: For each file, there
is both a byte string name, and a Unicode name.
Regards,
Martin