New subject: Unicode strings as filenames

5 Jan 2002

Recently, "M.-A. Lemburg" said:
> Jack Jansen wrote:
> > Off on a slight tangent:
> > On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles
> > this fine and so does open(). The OS does all the hard work for
> > you [...]
> > But in Python (unix-Python we're talking here, not MacPython),
> > unicode(filename) fails, because site.encoding is "ascii".
> > Would it be safe to set site.encoding to utf8 on Mac OS X by default?
>
> I'd rather suggest to use UTF-8 as default encoding in the
> subsystem layer I was talking about.

Uhm... Do you mean Py_FileSystemDefaultEncoding? Otherwise: what do
you mean? And, if you do mean Py_FSDE, would that also work for
listdir()? No, I guess it can't because listdir() returns simple
strings, so by the time I pass them to unicode() all knowledge that
they came from listdir is gone...
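
A minimal sketch of the failure described above, written for present-day Python 3 rather than the 2002 interpreter, so bytes.decode() stands in for unicode(). The filename and the decode_filename() helper are made up for illustration, and sys.getfilesystemencoding() is shown only as the later way to ask Python which encoding it assumes for filenames.

```python
import sys

# Hypothetical filename as UTF-8 encoded bytes, the form an 8-bit API
# would hand back on a platform whose filenames are UTF-8.
raw_name = "caf\u00e9.txt".encode("utf-8")


def decode_filename(raw, encoding):
    """Return the decoded name, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# With an "ascii" default the conversion fails on the first non-ASCII byte.
ascii_result = decode_filename(raw_name, "ascii")

# With "utf-8" the same bytes decode cleanly.
utf8_result = decode_filename(raw_name, "utf-8")

# Encoding Python itself assumes for filenames (platform dependent).
fs_encoding = sys.getfilesystemencoding()
```

Once the bytes are in hand, nothing in them records where they came from, so the caller has to supply the right encoding explicitly.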

Hmm, shouldn't StringObjects themselves carry an encoding field
(defaulting to sys.encoding)? That would solve quite a few
issues. read() from a binary file would return the special encoding
"binary", for instance, and then the "u" and "u#" formats could make a
distinction between character strings (which would be converted to
unicode using the encoding they carry) and binary strings (which would
be interpreted as 16-bit chars). But interning may be a showstopper,
now that I think of it...
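
A rough sketch of the encoding-field idea, assuming Python 3 and a hypothetical TaggedBytes class (nothing like it exists in Python). The "ascii" default stands in for the sys.encoding default mentioned above, and a "binary" tag simply refuses text conversion, which is a simplification of the 16-bit interpretation suggested here.

```python
class TaggedBytes(bytes):
    """Hypothetical byte string that remembers which encoding produced it."""

    def __new__(cls, data, encoding="ascii"):
        obj = super().__new__(cls, data)
        obj.encoding = encoding  # carried alongside the raw bytes
        return obj

    def to_text(self):
        """Decode using the carried encoding; binary data is rejected."""
        if self.encoding == "binary":
            raise TypeError("binary data carries no character encoding")
        return self.decode(self.encoding)


# A name tagged as UTF-8 converts without the caller naming an encoding.
name = TaggedBytes(b"caf\xc3\xa9", "utf-8")
text = name.to_text()

# Equal bytes with different tags still compare equal.
same = TaggedBytes(b"abc", "ascii") == TaggedBytes(b"abc", "latin-1")
```

That last comparison may be one way to read the interning worry: two objects that compare equal could not safely be collapsed into one if they carry different encoding tags.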

> Making UTF-8 the default Python system encoding would have many other
> consequences -- and you'd probably lose a great deal of portability
> since UTF-8 conversion (nearly) always will succeed while ASCII can
> easily fail on other systems which use e.g. Latin-1 as native
> encoding.

What are your reasons for asserting this? If I read this correctly,
this would make Python compatible with the least common denominator of
all platforms, while I think I would prefer it to allow access to all
the niceties a platform gives. On Unix you really don't have a good
guess for the encoding, but on MacOS and Windows you do...
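
One possible reading of the portability concern quoted above, sketched in Python 3 with a made-up string: encoding to UTF-8 succeeds for this text where ASCII raises an error, and the resulting bytes are silently misread on a system that assumes Latin-1.

```python
text = "caf\u00e9"  # hypothetical name containing one non-ASCII character

# UTF-8 can encode this text, so the conversion raises no error.
utf8_bytes = text.encode("utf-8")

# ASCII fails loudly on the same text.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A Latin-1 system decodes the UTF-8 bytes without complaint,
# but the result is not the original text.
misread = utf8_bytes.decode("latin-1")
```

The ASCII default turns the mismatch into an exception at the point of conversion, while the UTF-8 default lets it through to wherever the bytes are next interpreted.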

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

Re: [Python-Dev] Unicode strings as filenames
