unicode filenames

Piet van Oostrum piet at cs.uu.nl
Sun Feb 16 13:24:54 EST 2003


>>>>> David Eppstein <eppstein at ics.uci.edu> (DE) wrote:

DE> Under Mac OS X, the shell displays text (e.g. from cat, or from ls 
DE> without the -q option) as utf-8 by default, and the Finder (gui file 
DE> browser) uses utf-8 for accented characters in file names.  So I infer 
DE> that the correct interpretation of filenames under my OS is utf-8.
DE> But other unixes may differ...

On Mac OS X, it is a bit more complicated. First, cat will indeed show the
Unicode (UTF-8) contents of a file, but ls won't display filenames with
non-ASCII characters correctly, at least not in 10.1.5. Maybe 10.2 does it
better. For example, if my filename is "€200", ls will display "???200".
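
The three question marks fit the idea that ls prints one "?" for each
non-ASCII byte, since the euro sign takes three bytes in UTF-8. Here is a
rough Python sketch of that. The helper name ls_like is made up for
illustration, and the byte-by-byte substitution is my assumption about what
ls does, not something taken from its source:

```python
def ls_like(name):
    """Mimic the observed ls output: one '?' per byte outside printable ASCII.

    This is an assumed model of ls, used only for illustration.
    """
    raw = name.encode("utf-8")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


euro_name = "\u20ac200"            # the filename "€200"
print(euro_name.encode("utf-8"))   # b'\xe2\x82\xac200'
print(ls_like(euro_name))          # ???200
```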

Secondly, the filesystem requires the Unicode characters to be normalized
(decomposed), which means that accented characters like "é" will be broken
up into "e" followed by a combining "´". So if the Finder has a file named
"é200", the bytes used in the filename will be 0x65 followed by 0xCC 0x81
(the UTF-8 encoding of Unicode character U+0301). ls will print this as
"e??200".
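
The decomposition can be reproduced in Python with the unicodedata module.
This is only a sketch: it uses the standard NFD form as a stand-in and
assumes the filesystem's own normalization agrees with NFD for this
character.

```python
import unicodedata

composed = "\u00e9200"   # "é200" with a single precomposed é (U+00E9)
decomposed = unicodedata.normalize("NFD", composed)

# The é becomes "e" followed by the combining acute accent U+0301.
print([hex(ord(c)) for c in decomposed[:2]])   # ['0x65', '0x301']

# In UTF-8 the combining accent is the two bytes 0xCC 0x81.
print(decomposed.encode("utf-8"))              # b'e\xcc\x81200'

# The two spellings are different strings until normalized the same way.
print(composed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
```

So a program comparing a name typed by the user with one read back from
the filesystem would probably want to normalize both sides first.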

And in the shell I can't even type a € sign or é. That, however, is a
problem of the Terminal application, as I can do it in emacs.

Although ... after I tried it out and wanted to send this article, my
emacs crashed (fortunately after saving it).
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl



