unicode filenames

David Eppstein eppstein at ics.uci.edu
Sun Feb 2 22:32:14 EST 2003


In article <3E3DC9AF.387E307A at alcyone.com>,
 Erik Max Francis <max at alcyone.com> wrote:

> > I normally use unix.  What's the right way to treat filenames
> > under that OS?  As Latin-1?  Or UTF-8?  As far as I can tell,
> > filenames are simply bytes, so I can make whatever interpretation
> > I want on the characters, and the standard viewpoint is to
> > interpret those characters as Latin-1.
> 
> I believe that's the most common interpretation, but as you say, it
> doesn't much matter since filenames in UNIX are just considered streams
> of bytes.  No reference to an encoding -- as far as I know -- is made in
> any UNIX-relevant standard.

Under Mac OS X, the shell displays text (e.g. from cat, or from ls 
without the -q option) as UTF-8 by default, and the Finder (the GUI file 
browser) uses UTF-8 for accented characters in file names.  So I infer 
that the correct interpretation of filenames under my OS is UTF-8.
But other Unixes may differ...
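
Since the filenames are just bytes and the encoding is only a convention, one
way to cope is to read the names as bytes and guess.  What follows is only a
rough sketch, assuming Python 3, where os.listdir returns bytes when given a
bytes path.  The helper names decode_filename and list_names are made up for
illustration, and trying UTF-8 first with Latin-1 as the fallback is my
assumption, not a rule from any standard.

```python
import os


def decode_filename(raw: bytes) -> str:
    """Guess a readable form of a raw filename (hypothetical helper)."""
    try:
        # Prefer UTF-8: bytes that were not written as UTF-8 rarely
        # happen to form a valid UTF-8 sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a character, so this
        # fallback cannot fail (though the guess may still be wrong).
        return raw.decode("latin-1")


def list_names(directory: bytes = b".") -> list:
    """Return the decoded names of the entries in a directory."""
    # Passing a bytes path makes os.listdir return raw bytes names.
    return [decode_filename(name) for name in os.listdir(directory)]


if __name__ == "__main__":
    for name in list_names():
        print(name)
```

The order matters: because Latin-1 accepts any byte sequence, trying it first
would never give UTF-8 a chance.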

-- 
David Eppstein       UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/



