unicode filenames
Martin v. Löwis
martin at v.loewis.de
Sun Mar 2 06:58:36 EST 2003
Andrew Dalke <adalke at mindspring.com> writes:
> Okay, so it seems like no one knows how to handle unicode filenames
> under Unix. Perhaps the following is the proper behaviour?
"Unix" is too wide a term here. Different *installations* of the very
same software product may use different means to represent non-ASCII
characters in file names (even different directories in the same
installation); how to interpret them is purely a matter of convention.
Python is somewhat at a loss in guessing the "right" thing.
The emerging convention is that the locale's codeset determines the
encoding of file names. This convention is used in a number of Linux
distributions, and other Unices.
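To make the "it is all convention" point concrete, here is a small sketch (not part of the original message). The file name bytes are a made-up example. It shows that the same byte string is a sensible name under one convention and undecodable or garbled under another.

```python
# Hypothetical file name as stored on disk by a Latin-1 installation.
raw_latin1 = b"L\xf6wis.txt"

# Read under the Latin-1 convention, the bytes give the intended name.
as_latin1 = raw_latin1.decode("latin-1")

# Read under the UTF-8 convention, the same bytes are not even valid.
try:
    as_utf8 = raw_latin1.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

# The reverse mistake does not fail, it silently yields a garbled name.
raw_utf8 = "L\u00f6wis.txt".encode("utf-8")
garbled = raw_utf8.decode("latin-1")

print(repr(as_latin1), repr(as_utf8), repr(garbled))
```

Nothing in the bytes themselves says which reading is correct, which is why a convention such as the locale's codeset is needed.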
> 1) there is a default filesystem encoding, which is initialized
> to None if os.path.supports_unicode_file is True, otherwise
> it's set to sys.getdefaultencoding()
Since Python 2.2 (I believe), invoking locale.setlocale will set the
file system default encoding to what the system's nl_langinfo(CODESET)
returns - provided the system has both nl_langinfo and CODESET.
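As a rough illustration (a sketch, not taken from the original message), the codeset can be queried from Python roughly as follows. The helper name locale_codeset is made up for this example. Note one assumption: the snippet is written for a current Python, where sys.getfilesystemencoding() is available and the file system encoding is normally fixed at interpreter startup, so the two values are only printed side by side for comparison.

```python
import locale
import sys

def locale_codeset():
    """Return the codeset of the user's locale, or None if unavailable."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Unsupported locale settings: stay in the current locale.
        pass
    # Not every system provides both nl_langinfo and CODESET.
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return None

codeset = locale_codeset()
fs_encoding = sys.getfilesystemencoding()
print("locale codeset:     ", codeset)
print("filesystem encoding:", fs_encoding)
```

On a system without nl_langinfo or CODESET the helper returns None, which mirrors the "provided the system has both" caveat above.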
> 2) there is a registration system which is used to define encodings
> used for different mount locations. If a filename/dirname is
> not covered, sue the default filesystem encoding
Ok, I'll sue :-)
Such a scenario should not be supported. The encoding should be
uniform in all components of a path, and it is the system
administrator's task to make sure this is the case.
> If this makes sense, should it be added to Python's core?
Not in the way you have described it. Because Unix is tricky (and NT+
is much more advanced) in this respect, the existing PEP deliberately
targets NT+ only, leaving Unix for further study.
Regards,
Martin