[Python-Dev] Import and unicode: part two

Victor Stinner victor.stinner at haypocalc.com
Wed Jan 26 17:47:10 CET 2011


On Wednesday, January 26, 2011 at 08:24 -0500, James Y Knight wrote:
> On Jan 26, 2011, at 4:40 AM, Victor Stinner wrote:
> > During
> > Python 3.2 development, we tried to allow a filesystem encoding
> > different from the locale encoding (PYTHONFSENCODING environment
> > variable), but it doesn't work, simply because Python is not alone in
> > the OS. Apart from Python, all programs speak the same "language": the
> > locale encoding. Let me give you an example: if you create a module
> > with a name encoded in UTF-8 (with a non-UTF-8 locale), your file
> > browser will display mojibake.
> 
> Is that really true? I'm pretty sure GTK+ treats all filenames as
> UTF-8 no matter what the locale says. (overridable by
> G_FILENAME_ENCODING or G_BROKEN_FILENAMES)

Not exactly. Gtk+ uses the glib library, and to encode/decode filenames,
the glib library uses:

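 - UTF-8 on Windows
 - G_FILENAME_ENCODING environment variable if set (comma-separated list
of encodings, where "@locale" stands for the locale encoding)
 - the locale encoding if the G_BROKEN_FILENAMES env var is set
 - or UTF-8 (the locale encoding is then only tried for display)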
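The selection rules can be sketched in Python as below. This is a hypothetical model for illustration only: `filename_charsets()` is not a GLib or Python API, and it follows the g_get_filename_charsets() documentation, according to which G_BROKEN_FILENAMES selects the locale encoding and UTF-8 is the default when neither variable is set. The real C code may differ in details.

```python
import codecs
from typing import List, Mapping


def filename_charsets(environ: Mapping[str, str], locale_encoding: str,
                      is_windows: bool = False) -> List[str]:
    """Hypothetical Python model of GLib's filename charset selection.

    The first entry is the filename encoding; further entries are only
    tried when building a displayable name.
    """
    if is_windows:
        # On Windows the GLib filename API always uses UTF-8.
        return ["UTF-8"]

    value = environ.get("G_FILENAME_ENCODING", "")
    if value:
        # Comma-separated list; the token "@locale" means the locale encoding.
        return [locale_encoding if name == "@locale" else name
                for name in value.split(",") if name]

    if "G_BROKEN_FILENAMES" in environ:
        # "Broken" filenames: assume they use the locale encoding.
        return [locale_encoding]

    # Default: UTF-8, with the locale encoding appended as a display
    # fallback (skipped here when the locale already uses UTF-8).
    charsets = ["UTF-8"]
    if codecs.lookup(locale_encoding).name != "utf-8":
        charsets.append(locale_encoding)
    return charsets
```

For example, `filename_charsets(os.environ, locale.getpreferredencoding(False))` would approximate the list for the current process.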

glib has no type to store a filename: a filename is a raw byte string
(char*). It has a nice function to work around mojibake issues:
g_filename_display_name(). This function tries to decode the filename
with each encoding of the filename encoding list; if all decodings
fail, it falls back to UTF-8 and replaces undecodable bytes with the
replacement character U+FFFD.
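A rough Python analogue of g_filename_display_name() may make the fallback behaviour concrete. `display_name()` is a hypothetical helper, not GLib's code; the U+FFFD fallback follows the GLib documentation. The result is only meant for display, while the raw bytes are kept for actually opening the file.

```python
from typing import Sequence


def display_name(raw: bytes, charsets: Sequence[str]) -> str:
    """Hypothetical analogue of g_filename_display_name().

    Try each charset of the filename encoding list in order; if none can
    decode the bytes, decode as UTF-8 and replace each undecodable byte
    sequence with U+FFFD (the Unicode replacement character).
    """
    for charset in charsets:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next one.
            continue
    return raw.decode("utf-8", errors="replace")
```

The order of the list matters: a single-byte encoding such as ISO-8859-15 decodes any byte string, so listing it before UTF-8 turns UTF-8 names into mojibake (`b"caf\xc3\xa9.py"` is shown as "cafÃ©.py").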

So yes, if you set G_FILENAME_ENCODING, you can fix mojibake issues in
what is displayed. But you still have to pass the raw byte filenames to
other libraries and programs.

The problem with PYTHONFSENCODING is that sys.getfilesystemencoding() is
used not only for filenames, but also for command line arguments and
environment variables, which other programs produce and read using the
locale encoding.
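The following sketch shows why a filesystem encoding that differs from the locale encoding causes trouble. It assumes a hypothetical system whose locale encoding is ISO-8859-15 while Python's filesystem encoding has been forced to UTF-8. `decode_os_bytes()` and `encode_os_str()` are illustrative helpers, not Python APIs; they mimic how Python 3 on POSIX converts OS bytes (filesystem encoding plus the 'surrogateescape' error handler).

```python
def decode_os_bytes(raw: bytes, encoding: str) -> str:
    # Mimics how Python 3 on POSIX decodes filenames and environment
    # variables: the filesystem encoding with 'surrogateescape'.
    return raw.decode(encoding, "surrogateescape")


def encode_os_str(text: str, encoding: str) -> bytes:
    # Inverse operation, used when a str is handed back to the OS.
    return text.encode(encoding, "surrogateescape")


# Hypothetical setup: every other program uses the ISO-8859-15 locale,
# but Python's filesystem encoding was forced to UTF-8.
LOCALE_ENCODING = "iso-8859-15"
FS_ENCODING = "utf-8"

# A module file created by another program, named in the locale encoding.
on_disk = "caf\u00e9.py".encode(LOCALE_ENCODING)

# Python decodes the name with the filesystem encoding: the invalid UTF-8
# byte becomes a lone surrogate instead of the letter e-acute.
listed = decode_os_bytes(on_disk, FS_ENCODING)

# When Python looks up the module by its proper name, it encodes that
# name with UTF-8, producing bytes that do not match the file on disk.
requested = encode_os_str("caf\u00e9.py", FS_ENCODING)
```

In real code, os.fsdecode() and os.fsencode() perform these conversions with the actual sys.getfilesystemencoding().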

For more information about glib, see g_filename_to_utf8(),
g_filename_display_name() and g_get_filename_charsets() documentation:

http://library.gnome.org/devel/glib/2.26/glib-Character-Set-Conversion.html

Victor


