unicode filenames
Just
just at xs4all.nl
Sun Feb 16 16:09:41 EST 2003
In article <wzptps13t5.fsf at nono.cs.uu.nl>,
Piet van Oostrum <piet at cs.uu.nl> wrote:
> >>>>> David Eppstein <eppstein at ics.uci.edu> (DE) wrote:
>
> DE> Under Mac OS X, the shell displays text (e.g. from cat, or from ls
> DE> without the -q option) as utf-8 by default, and the Finder (gui file
> DE> browser) uses utf-8 for accented characters in file names. So I infer
> DE> that the correct interpretation of filenames under my OS is utf-8.
> DE> But other unixes may differ...
>
> On Mac OS X, it is a bit more complicated. First cat will indeed show the
> unicode (utf-8) contents of a file, but ls won't display filenames with
> non-ASCII characters right. At least not in 10.1.5. Maybe 10.2 does it better.
> Like if my filename is "€200", ls will display "???200".
Although Terminal.app supports utf-8 in 10.2, what you describe is
still true.
> Secondly, the filesystem requires the unicode characters to be normalized,
> which means that accented characters like "é" will be broken up into "e"
> followed by "´". So if the finder has a file with name "é200", the bytes
> used in the filename will be 0x65 followed by 0xCC 0x81 (unicode character
> 0x301). ls will print this as "e??200".
You don't have to worry about that: the file system will _give_ you
normalized unicode, but it does the right thing if you feed it
non-normalized unicode.
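To make the byte sequence quoted above concrete, here is a minimal sketch in present-day Python 3 (not the 2.3 discussed in this thread). It assumes the file system's decomposition matches Unicode's NFD form for this character; hex_bytes is just a helper name made up for the illustration.

```python
import unicodedata

# "é200" written with the precomposed character U+00E9.
composed = "\u00e9200"

# Canonical decomposition splits "é" into "e" + U+0301 (combining acute accent).
# Assumption: this is the form the file system hands back for this name.
decomposed = unicodedata.normalize("NFD", composed)


def hex_bytes(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join("%02X" % b for b in text.encode("utf-8"))


print(hex_bytes(composed))    # C3 A9 32 30 30
print(hex_bytes(decomposed))  # 65 CC 81 32 30 30
```

The two strings look the same on screen but compare unequal, so it is safest to normalize both sides before comparing a name you typed with a name the file system gave you.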
Btw, in Python 2.3 (current CVS, not 2.3a1), the file system calls fully support
unicode strings on OSX. I've also got a patch pending that makes
os.listdir() return unicode strings when appropriate:
http://python.org/sf/683592. I think this has a fair chance to make it
in.
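For illustration, a rough sketch of looking up a name in a listing that comes back as unicode strings. It is written for present-day Python 3, where os.listdir() returns str when given a str path, and does not show the behaviour of the pending patch. find_name is a hypothetical helper, and the sketch assumes the file system accepts non-ASCII names.

```python
import os
import tempfile
import unicodedata


def find_name(directory, wanted):
    """Return the directory entry that equals wanted after NFC
    normalization, or None if there is no such entry."""
    target = unicodedata.normalize("NFC", wanted)
    for name in os.listdir(directory):  # str path in, str names out
        if unicodedata.normalize("NFC", name) == target:
            return name
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Create the file using the precomposed spelling of "é200" ...
    open(os.path.join(tmp, "\u00e9200"), "w").close()
    # ... and look it up using the decomposed spelling ("e" + U+0301).
    found = find_name(tmp, "e\u0301200")
    print(repr(found))
```

Which spelling the listing returns depends on the file system, which is why the comparison normalizes both sides instead of testing for plain equality.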
Just