[Python-3000] Console encoding detection broken
Guido van Rossum
guido at python.org
Fri Aug 10 19:26:13 CEST 2007
On 8/9/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Georg Brandl wrote:
> > Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to
> > UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings.
>
> And not surprisingly so: io.py says
>
> if encoding is None:
>     # XXX This is questionable
>     encoding = sys.getfilesystemencoding() or "latin-1"
Guilty as charged.
Alas, I don't know much about the machinery of console and filesystem
encodings, so I need help!
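To see why the "latin-1" fallback breaks Unicode output, here is a minimal sketch that mirrors the quoted default and tries to encode text containing a character outside Latin-1. The function names `default_stream_encoding` and `can_encode` are hypothetical illustrations, not part of io.py.

```python
def default_stream_encoding(fs_encoding):
    # Mirrors the quoted io.py default: if the filesystem encoding is
    # not known yet (None), silently fall back to "latin-1".
    return fs_encoding or "latin-1"


def can_encode(text, encoding):
    # True if `text` can be written in `encoding` without an error.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


enc = default_stream_encoding(None)
print(enc)                               # latin-1
print(can_encode("caf\u00e9", enc))      # True: U+00E9 is in Latin-1
print(can_encode("\u20ac 5", enc))       # False: the euro sign is not
print(can_encode("\u20ac 5", "utf-8"))   # True
```

On a UTF-8 terminal, any character outside Latin-1 therefore fails to print even though the terminal itself could display it.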
> First, at the point where this call is made, sys.getfilesystemencoding
> is still None,
What can we do about this? Set it earlier? It should really be set by
the time site.py is imported (which sets sys.stdin/out/err), as this
is the first time a lot of Python code is run that touches the
filesystem (e.g. sys.path mangling).
> plus the code is broken as getfilesystemencoding is not
> the correct value for sys.stdout.encoding. Instead, the way it should
> be computed is:
>
> 1. On Unix, use the same value that sys.getfilesystemencoding will get,
>    namely the result of nl_langinfo(CODESET); if that is not available,
>    fall back to any encoding, but the most logical choices are UTF-8
>    (if you want output to always succeed) and ASCII (if you don't want
>    to risk mojibake).
> 2. On Windows, if output is to a terminal, use GetConsoleOutputCP;
>    otherwise fall back, probably to CP_ACP (i.e. "mbcs").
> 3. On OSX, I don't know. If output is to a terminal, UTF-8 may be
>    a good bet (although some people run Terminal.app in an encoding
>    other than UTF-8, and there is no way to find out). Otherwise, use
>    the locale's encoding, though I'm not sure how to find out what that is.
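Martin's three rules can be written as a pure decision function. This is a minimal sketch under stated assumptions: the name `choose_stream_encoding` and its parameters are hypothetical, and the caller is assumed to have already queried the OS (nl_langinfo(CODESET), GetConsoleOutputCP, the locale) and passed the results in, so the rules can be checked on any platform. Falling back to UTF-8 on OS X when no locale encoding is supplied is an assumption, not part of the rules above.

```python
def choose_stream_encoding(platform, isatty, codeset=None,
                           console_cp=None, locale_encoding=None,
                           prefer_success=True):
    """Apply the per-platform rules; always return an encoding name."""
    if platform.startswith("win"):
        # Rule 2: a real console reports its output code page.
        if isatty and console_cp:
            return "cp%d" % console_cp
        # Otherwise fall back to the ANSI code page, i.e. "mbcs".
        return "mbcs"
    if platform == "darwin":
        # Rule 3: guess UTF-8 for a terminal, else the locale's encoding.
        if isatty:
            return "utf-8"
        # Assumption: UTF-8 if no locale encoding was supplied.
        return locale_encoding or "utf-8"
    # Rule 1 (Unix): use nl_langinfo(CODESET) when it is available.
    if codeset:
        return codeset
    # No codeset: UTF-8 so output always succeeds, or ASCII to avoid mojibake.
    return "utf-8" if prefer_success else "ascii"
```

For example, `choose_stream_encoding("win32", True, console_cp=437)` returns `"cp437"`, while the same call with `isatty=False` returns `"mbcs"`.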
Feel free to add code that implements this. I suppose it would be a
good idea to have a separate function io.guess_console_encoding(...)
which takes some argument (perhaps a raw file?) and returns an
encoding name, never None. This could then be implemented by switching
on the platform into platform-specific functions and a default.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)