[Python-Dev] Unicode strings as filenames

Jack Jansen jack@oratrix.nl
Sun, 06 Jan 2002 00:05:57 +0100


Recently, "M.-A. Lemburg" <mal@lemburg.com> said:
> Jack Jansen wrote:
> > 
> > Off on a slight tangent:
> > On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles
> > this fine and so does open(). The OS does all the hard work for
> > you [...]
> > But in Python (unix-Python we're talking here, not MacPython),
> > unicode(filename) fails, because site.encoding is "ascii".
> > 
> > Would it be safe to set site.encoding to utf8 on Mac OS X by default?
> 
> I'd rather suggest to use UTF-8 as default encoding in the
> subsystem layer I was talking about. 

Uhm... Do you mean Py_FileSystemDefaultEncoding? If not, what do you
mean? And if you do mean Py_FSDE, would that also work for
listdir()? No, I guess it can't, because listdir() returns plain 8-bit
strings, so by the time I pass them to unicode() all knowledge that
they came from listdir() is gone...
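
To make the failure concrete, here is a minimal sketch in modern
Python 3, where bytes stands in for the 8-bit strings that listdir()
returns. The filename and the to_unicode() helper are made up for
illustration, and the explicit "ascii" argument stands in for the
default encoding:

```python
# Hypothetical filename as listdir() would hand it back on Mac OS X:
# a plain 8-bit string that happens to hold UTF-8 bytes.
raw_name = "caf\u00e9.txt".encode("utf-8")


def to_unicode(raw, encoding="ascii"):
    """Decode raw bytes, roughly what unicode(raw) does with a default encoding."""
    return raw.decode(encoding)


# With an ASCII default the conversion fails on the first non-ASCII byte.
try:
    to_unicode(raw_name)
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the encoding explicitly works, but the caller has to know it;
# the byte string itself carries no hint of where it came from.
utf8_name = to_unicode(raw_name, "utf-8")
```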

Hmm, shouldn't StringObjects themselves carry an encoding field
(defaulting to site.encoding)? That would solve quite a few
issues. read() from a binary file would return the special encoding
"binary", for instance, and then the "u" and "u#" formats could make a
distinction between character strings (which would be converted to
unicode using the encoding they carry) and binary strings (which would
be interpreted as 16-bit chars). But interning may be a showstopper,
now that I think of it...
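
A rough sketch of the idea in modern Python 3, not a proposal for the
actual C implementation. The TaggedString class and its to_unicode()
method are hypothetical names, and the little-endian byte order used
for "binary" data is an assumption, since nothing above specifies one:

```python
class TaggedString:
    """Hypothetical 8-bit string that remembers which encoding it is in."""

    def __init__(self, data, encoding="ascii"):
        self.data = bytes(data)
        self.encoding = encoding  # "binary" marks data with no character encoding

    def to_unicode(self):
        if self.encoding == "binary":
            # Assumption: treat the buffer as raw little-endian 16-bit units.
            return self.data.decode("utf-16-le")
        # Character strings are converted using the encoding they carry.
        return self.data.decode(self.encoding)


# A filename tagged at the point where the encoding is still known.
name = TaggedString("caf\u00e9".encode("utf-8"), "utf-8")

# The same bytes tagged differently give a different (wrong) result.
mislabeled = TaggedString(name.data, "latin-1")

# Data read from a binary file.
blob = TaggedString(b"h\x00i\x00", "binary")
```

Note that name and mislabeled hold identical bytes yet are different
strings, which may be exactly where interning gets in the way.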

> Making UTF-8 the default Python system encoding would have many other 
> consequences -- and you'd probably lose a great deal of portability 
> since UTF-8 conversion (nearly) always will succeed while ASCII can 
> easily fail on other systems which use e.g. Latin-1 as native 
> encoding.

What are your reasons for asserting this? If I read this correctly,
it would restrict Python to the least common denominator of all
platforms, while I think I would prefer it to allow access to all
the niceties a platform gives. On Unix you really can't make a good
guess at the encoding, but on MacOS and Windows you can...

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm