paul at boddie.net
Mon Feb 3 12:35:54 CET 2003
Andrew Dalke <adalke at mindspring.com> wrote in message news:<b1kc9o$vf1$1 at slb9.atl.mindspring.net>...
> I normally use unix. What's the right way to treat filenames
> under that OS? As Latin-1? Or UTF-8? As far as I can tell,
> filenames are simply bytes, so I can make whatever interpretation
> I want on the characters, and the standard viewpoint is to
> interpret those characters as Latin-1.
It may be locale-based on Linux, at least, and possibly on other UNIX
> [dalke at zebulon src]$ ls sp* | od -c
> 0000000 s p å r v ä g e n \n
I hadn't come across 'od' before, so this is useful to know. On Red
Hat Linux 7.3 on Intel, with the locale set to en_US.iso885915, I can
apparently create filenames containing ISO-8859-15 characters; in the
terminal program I'm using, these characters appear as question marks
once I switch the locale to en_US.utf8. In the former locale, 'od -c'
prints the characters themselves as part of the "dump", whereas in the
latter, 'od -c' prints the octal codes for
What is interesting is that if I try to remove the file in UTF-8 mode,
it succeeds, even though the byte encoding of the filename should
really be different from what it was before. Moreover, if I create a
file with ISO-8859-15-encodable characters in UTF-8 mode, it seems to
use the ISO-8859-15 byte values.
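To illustrate why the byte encoding ought to differ between the two
locales, here is a small sketch in present-day Python 3 (not the
Python 2.3 discussed in this thread). The filename is borrowed from the
'od -c' output quoted above, and octal_dump is a hypothetical helper
that only roughly imitates the way 'od -c' falls back to octal codes;
it is not how od itself is implemented.

```python
# Illustrative sketch only; the name comes from the quoted 'od -c' output.
name = "spårvägen"

# The same characters give different byte sequences in each encoding.
latin9 = name.encode("iso-8859-15")
utf8 = name.encode("utf-8")


def octal_dump(data):
    """Hypothetical helper: printable ASCII bytes are shown as characters,
    all other bytes as three-digit octal codes."""
    return " ".join(chr(b) if 32 <= b < 127 else "%03o" % b for b in data)


print(octal_dump(latin9))  # s p 345 r v 344 g e n
print(octal_dump(utf8))    # s p 303 245 r v 303 244 g e n

# The ISO-8859-15 bytes are not a valid UTF-8 sequence, which is one
# plausible reason a UTF-8 terminal shows question marks for them.
try:
    latin9.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False
```

So a name genuinely stored as UTF-8 would be two bytes longer here than
the ISO-8859-15 form, which is why the behaviour described above looks
surprising.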
Perhaps the "UTF-8 and Unicode FAQ" and the manual might be of help:
Still, I see your point about it being harder to use non-ASCII
characters in filenames on UNIX with the upcoming Python 2.3. In many
environments, this is a highly unsatisfactory situation.