[Python-3000] Console encoding detection broken

Guido van Rossum guido at python.org
Fri Aug 10 19:26:13 CEST 2007


On 8/9/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Georg Brandl wrote:
> > Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to
> > UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings.
>
> And not surprisingly so: io.py says
>
>         if encoding is None:
>             # XXX This is questionable
>             encoding = sys.getfilesystemencoding() or "latin-1"

Guilty as charged.

Alas, I don't know much about the machinery of console and filesystem
encodings, so I need help!

> First, at the point where this call is made, sys.getfilesystemencoding
> is still None,

What can we do about this? Set it earlier? It should really be set by
the time site.py is imported (which sets sys.stdin/out/err), as this
is the first time a lot of Python code is run that touches the
filesystem (e.g. sys.path mangling).

> plus the code is broken because getfilesystemencoding is not
> the correct value for sys.stdout.encoding. Instead, it should
> be computed as follows:
>
> 1. On Unix, use the same value that sys.getfilesystemencoding will get,
>    namely the result of nl_langinfo(CODESET); if that is not available,
>    fall back to something; the most logical choices are UTF-8
>    (if you want output to always succeed) and ASCII (if you don't want
>    to risk mojibake).
> 2. On Windows, if output is to a terminal, use GetConsoleOutputCP.
>    Otherwise, fall back, probably to CP_ACP (i.e. "mbcs").
> 3. On OSX, I don't know. If output is to a terminal, UTF-8 may be
>    a good bet (although some people operate their Terminal.apps
>    not in UTF-8; there is no way to find out). Otherwise, use the
>    locale's encoding, though I'm not sure how to find out what that is.
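Step 1 can be sketched with the standard `locale` and `codecs` modules. This is a minimal sketch, not code from io.py: the function name `unix_console_encoding` and its `fallback` parameter are hypothetical, and the choice between UTF-8 and ASCII is left to the caller, mirroring the trade-off described in step 1.

```python
import codecs
import locale


def unix_console_encoding(fallback="utf-8"):
    """Hypothetical helper (not part of io.py) sketching step 1.

    Ask the C library for the locale's CODESET via nl_langinfo; if that
    is unavailable or yields nothing Python can use, return `fallback`.
    Pass fallback="ascii" to prefer encoding errors over mojibake.
    """
    try:
        codeset = locale.nl_langinfo(locale.CODESET)
    except (AttributeError, ValueError):
        # nl_langinfo and CODESET do not exist on every platform (e.g. Windows).
        codeset = ""
    if codeset:
        try:
            # Map C-library spellings such as "ANSI_X3.4-1968" to Python's
            # canonical codec name ("ascii").
            return codecs.lookup(codeset).name
        except LookupError:
            pass  # the C library reported a codeset Python has no codec for
    return codecs.lookup(fallback).name
```

Running every result through `codecs.lookup(...).name` means callers always get a name Python can actually encode with, whatever spelling the C library used.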

Feel free to add code that implements this. I suppose it would be a
good idea to have a separate function io.guess_console_encoding(...)
which takes some argument (perhaps a raw file?) and returns an
encoding name, never None. This could then be implemented by dispatching
on the platform to platform-specific functions, with a default for the rest.
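A minimal sketch of the proposed `io.guess_console_encoding(...)`, dispatching on `sys.platform` to one function per platform plus a default, loosely following Martin's three steps. Everything beyond the function's name is an assumption: `raw` is taken to be any object with `fileno()`, the helper names, the `fallback` parameter and the `_GUESSERS` table are hypothetical, and the Windows branch calls the Win32 `GetConsoleOutputCP` through `ctypes`. On Unix, `locale.getpreferredencoding(False)` essentially reports `nl_langinfo(CODESET)` (or UTF-8 when Python's UTF-8 mode is on).

```python
import codecs
import locale
import os
import sys


def _windows_encoding(fd):
    # Step 2: the console's output code page for a terminal,
    # otherwise the ANSI code page (CP_ACP), which Python calls "mbcs".
    if os.isatty(fd):
        import ctypes  # only meaningful on Windows
        return "cp%d" % ctypes.windll.kernel32.GetConsoleOutputCP()
    return "mbcs"


def _darwin_encoding(fd):
    # Step 3: Terminal.app is usually (not always) UTF-8;
    # otherwise ask the locale.
    if os.isatty(fd):
        return "utf-8"
    return locale.getpreferredencoding(False)


def _default_encoding(fd):
    # Step 1 (Unix and everything else): the locale's codeset.
    return locale.getpreferredencoding(False)


# Hypothetical dispatch table: sys.platform value -> platform-specific guesser.
_GUESSERS = {
    "win32": _windows_encoding,
    "darwin": _darwin_encoding,
}


def guess_console_encoding(raw, fallback="utf-8"):
    """Return an encoding name for the raw file `raw`; never None."""
    guesser = _GUESSERS.get(sys.platform, _default_encoding)
    try:
        encoding = guesser(raw.fileno())
    except (OSError, ValueError, AttributeError):
        # No usable file descriptor, or the platform query failed.
        encoding = None
    if encoding:
        try:
            return codecs.lookup(encoding).name
        except LookupError:
            pass  # e.g. an unknown console code page such as "cp0"
    return fallback
```

Because every branch funnels through `codecs.lookup`, an unusable answer degrades to `fallback` rather than to None, which satisfies the "never None" requirement.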

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

