Re: [Python-Dev] Unicode strings as filenames
Off on a slight tangent: On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles this fine and so does open(). The OS does all the hard work for you: it knows that some mounted disks may be in other 8-bit encodings (such as MacRoman or MacJapanese for old mac disks, or probably latin-1 for NFS filesystems, or god-knows-what for SMB mounted disks) and handles the conversion.

But in Python (unix-Python we're talking here, not MacPython), unicode(filename) fails, because site.encoding is "ascii". Would it be safe to set site.encoding to utf8 on Mac OS X by default?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
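The failure described above can be reproduced in miniature. This is only a sketch in modern Python 3 syntax, not the interpreter of the time: unicode(filename) with an "ascii" default corresponds roughly to decoding the file name bytes as ASCII. The helper decode_filename and the sample name are made up for illustration.

```python
# Bytes as a UTF-8 file system might hand them back for "resume" with accents.
# The sample name is an assumption, chosen only because it is non-ASCII.
raw_name = "r\u00e9sum\u00e9.txt".encode("utf-8")


def decode_filename(raw, encoding):
    """Return the decoded name, or None if the bytes are invalid in `encoding`.

    Hypothetical helper, not part of any Python API.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# With an ASCII default the conversion fails on the first byte >= 0x80.
ascii_result = decode_filename(raw_name, "ascii")

# With UTF-8 the same bytes decode cleanly.
utf8_result = decode_filename(raw_name, "utf-8")

print(ascii_result is None, ascii(utf8_result))
```

The bytes never change between the two calls; only the assumed encoding does, which is why the question is about which default to pick rather than about what the OS returns.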
Jack Jansen wrote:
> Off on a slight tangent: On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles this fine and so does open(). The OS does all the hard work for you: it knows that some mounted disks may be in other 8-bit encodings (such as MacRoman or MacJapanese for old mac disks, or probably latin-1 for NFS filesystems, or god-knows-what for SMB mounted disks) and handles the conversion.
That's good news.
> But in Python (unix-Python we're talking here, not MacPython), unicode(filename) fails, because site.encoding is "ascii".
> Would it be safe to set site.encoding to utf8 on Mac OS X by default?
I'd rather suggest using UTF-8 as the default encoding in the subsystem layer I was talking about. Making UTF-8 the default Python system encoding would have many other consequences, and you'd probably lose a great deal of portability: UTF-8 conversion will (nearly) always succeed, while ASCII conversion can easily fail on other systems which use e.g. Latin-1 as native encoding.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
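One way to read the portability concern, sketched in Python 3: code written where the default is UTF-8 never sees a conversion error, so the same code can break when it runs with an ASCII default or meets Latin-1 data. The helpers can_encode and can_decode and the sample string are assumptions made up for this illustration.

```python
name = "caf\u00e9"  # sample text with one non-ASCII character (assumption)


def can_encode(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def can_decode(raw, encoding):
    """Hypothetical helper: True if `raw` is valid in `encoding`."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# With a UTF-8 default the implicit conversion succeeds silently...
utf8_ok = can_encode(name, "utf-8")

# ...while the same conversion with an ASCII default raises an error.
ascii_ok = can_encode(name, "ascii")

# Bytes written on a Latin-1 system are not valid UTF-8 either.
latin1_bytes = name.encode("latin-1")
latin1_as_utf8_ok = can_decode(latin1_bytes, "utf-8")

print(utf8_ok, ascii_ok, latin1_as_utf8_ok)
```

Under this reading, the ASCII default acts as an early warning: the error shows up on the developer's machine instead of on someone else's system.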
> Would it be safe to set site.encoding to utf8 on Mac OS X by default?
As MAL explains, no. Instead, you should extend the fragment

    #if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T)
    const char *Py_FileSystemDefaultEncoding = "mbcs";
    #else
    const char *Py_FileSystemDefaultEncoding = NULL; /* use default */
    #endif

to cover OSX as well, setting the string to "utf-8". Then, Unicode objects will be auto-converted to UTF-8 in open() and all posixmodule calls; not sure whether OSX uses posixmodule, though...

Once you've done this, you should use es# specifiers with Py_FileSystemDefaultEncoding wherever you retrieve a file or path name from the application.

Returning file names to the user is a different story, though: it may or may not be sensible to apply the file system encoding (if set) whenever file names are returned to the application (mostly in listdir).

HTH,
Martin
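At the Python level, the conversion described above amounts to something like the following. This is a rough sketch in Python 3, not the C implementation: FS_ENCODING stands in for Py_FileSystemDefaultEncoding set to "utf-8", and to_fs_bytes and from_fs_bytes are hypothetical names. sys.getfilesystemencoding() is the real call that reports the interpreter's file system encoding in current Python versions.

```python
import sys

# Stand-in for Py_FileSystemDefaultEncoding = "utf-8" (assumption for this sketch).
FS_ENCODING = "utf-8"


def to_fs_bytes(name, encoding=FS_ENCODING):
    """Convert a path argument to the bytes handed to the OS call.

    Hypothetical helper: Unicode names are encoded with the file system
    encoding, 8-bit names pass through unchanged.
    """
    if isinstance(name, bytes):
        return name
    return name.encode(encoding)


def from_fs_bytes(raw, encoding=FS_ENCODING):
    """Optional reverse step for names coming back from a listdir-like call."""
    return raw.decode(encoding)


encoded = to_fs_bytes("caf\u00e9.txt")     # Unicode in, UTF-8 bytes out
passthrough = to_fs_bytes(b"plain.txt")    # bytes are left alone
roundtrip = from_fs_bytes(encoded)         # the "different story" direction

# What the running interpreter actually uses for file names.
print(sys.getfilesystemencoding())
```

The reverse step is kept as a separate function because, as noted above, whether names returned to the application should be decoded at all is an open design question.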
participants (3): Jack Jansen, M.-A. Lemburg, Martin v. Loewis