[Python-Dev] Import and unicode: part two
James Y Knight
foom at fuhm.net
Wed Jan 26 19:25:49 CET 2011
On Jan 26, 2011, at 11:47 AM, Victor Stinner wrote:
> Not exactly. Gtk+ uses the glib library, and to encode/decode filenames,
> the glib library uses:
> - UTF-8 on Windows
> - G_FILENAME_ENCODING environment variable if set (comma-separated list
> of encodings)
> - UTF-8 if G_BROKEN_FILENAMES env var is set
> - or the locale encoding
But the documentation says:
> On Unix, the character sets are determined by consulting the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. On Windows, the character set used in the GLib API is always UTF-8 and said environment variables have no effect.
> G_FILENAME_ENCODING may be set to a comma-separated list of character set names. The special token "@locale" is taken to mean the character set for the current locale. If G_FILENAME_ENCODING is not set, but G_BROKEN_FILENAMES is, the character set of the current locale is taken as the filename encoding. If neither environment variable is set, UTF-8 is taken as the filename encoding, but the character set of the current locale is also put in the list of encodings.
That indicates to me that (unless you override the behavior with env vars) glib encodes filenames in UTF-8 regardless of the locale, and attempts decoding in UTF-8 first. Only when the filename doesn't make sense in UTF-8 will it also try decoding it in the locale encoding.
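To make that reading concrete, here is a rough Python sketch of the rules as the quoted documentation states them for Unix. It is not glib's actual code: the function names filename_charsets and decode_filename are made up for illustration, and treating an empty G_FILENAME_ENCODING as unset is my assumption.

```python
import locale
import os


def filename_charsets(environ=None, locale_charset=None):
    """Return candidate filename charsets in priority order (Unix rules only).

    Hypothetical helper modeled on the quoted documentation, not on glib's source.
    """
    if environ is None:
        environ = os.environ
    if locale_charset is None:
        locale_charset = locale.getpreferredencoding(False)

    spec = environ.get("G_FILENAME_ENCODING")
    if spec:  # assumption: an empty value is treated like an unset variable
        names = [name.strip() for name in spec.split(",") if name.strip()]
        # "@locale" stands for the character set of the current locale.
        return [locale_charset if name == "@locale" else name for name in names]
    if "G_BROKEN_FILENAMES" in environ:
        return [locale_charset]
    # Neither variable set: UTF-8 first, with the locale charset as a fallback.
    return ["UTF-8", locale_charset]


def decode_filename(raw, charsets):
    """Decode bytes with the first charset in the list that accepts them."""
    for charset in charsets:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("cannot decode %r with any of %r" % (raw, charsets))


if __name__ == "__main__":
    charsets = filename_charsets({}, "ISO-8859-1")
    print(charsets)
    print(decode_filename(b"caf\xc3\xa9", charsets))  # valid UTF-8
    print(decode_filename(b"caf\xe9", charsets))      # only valid as ISO-8859-1
```

With neither variable set and an ISO-8859-1 locale, both byte strings decode to the same name: the first as UTF-8, the second only after UTF-8 is rejected and the locale charset is tried.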