print and unicode strings

Martin v. Loewis martin at v.loewis.de
Wed Feb 20 02:09:44 EST 2002


"Jason Orendorff" <jason at jorendorff.com> writes:

> I think IDLE should do this by default.  It's fine for
> Python to "refuse to guess" encodings, but in the case of Tk
> it's UTF-8 on all platforms... right?

Well, not exactly. Tkinter supports Unicode strings as the primary data
type for text, with byte strings (e.g. UTF-8) only a second choice.
So IDLE should set sys.stdout to a stream that accepts Unicode
objects as-is.
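A minimal sketch of such a stream, written for modern Python 3 (where str is always Unicode; in 2002 this would have been the `unicode` type). The class name `TkUnicodeStream` and the `insert` callable are hypothetical: in a real shell `insert` would wrap something like a Tk Text widget's `insert("end", s)`. Text is passed through untouched, and bytes are decoded as UTF-8 only as a fallback.

```python
class TkUnicodeStream:
    """Hypothetical file-like object for an IDLE-style shell window."""

    def __init__(self, insert):
        # insert: assumed callable that receives text, e.g. a wrapper
        # around a Tk Text widget's insert("end", s).
        self._insert = insert

    def write(self, s):
        if isinstance(s, bytes):
            # Byte strings are the "second choice": decode them explicitly.
            s = s.decode("utf-8", errors="replace")
        # Unicode text goes to the widget as-is, with no encoding step.
        self._insert(s)
        return len(s)

    def flush(self):
        # Nothing is buffered, so there is nothing to flush.
        pass
```

Usage: `sys.stdout = TkUnicodeStream(lambda s: text_widget.insert("end", s))`, after which `print("\u2604")` reaches the widget without ever being encoded.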

If you wonder how Tcl interprets byte strings: all bytes below 128 are
treated as ASCII. Byte sequences containing bytes of 128 or above are
decoded as UTF-8 if possible. If there is a decoding error (because of
illegal sequences), those bytes are interpreted in the platform's native
encoding. This is a pretty crude heuristic, since it means that you may
end up with a byte string that uses different encodings at different offsets.
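The heuristic above can be modelled roughly in Python. This is a sketch of the behaviour as described, not Tcl's actual code; `tcl_like_decode` is a hypothetical name, and latin-1 stands in for "the platform's native encoding" only because it can decode every single byte.

```python
def tcl_like_decode(data, fallback="latin-1"):
    """Rough model of the byte-string heuristic described above.

    `fallback` is an assumed stand-in for the platform's native encoding.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Bytes below 128 are plain ASCII.
            out.append(chr(b))
            i += 1
            continue
        # Length of the UTF-8 sequence implied by the lead byte (0 = invalid lead).
        if 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0
        if n:
            try:
                # Strict decoding: succeeds only for a complete, legal sequence.
                out.append(data[i:i + n].decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Illegal sequence: interpret this single byte in the fallback encoding.
        out.append(data[i:i + 1].decode(fallback))
        i += 1
    return "".join(out)
```

Calling `tcl_like_decode(b"\xc3\xa9 \xe9")` yields two identical characters, the first decoded as UTF-8 and the second via the fallback, which is exactly the "different encodings at different offsets" problem.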

Also, setting the stream for the console to cp437 is not ideal; it
would be much better if Python used the Unicode console API
(i.e. WriteConsoleW) when it detects that sys.stdout is a console. If
you then use the Lucida Console font (instead of the raster font), you
might even be able to display COMET in a console.
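A sketch of that idea using ctypes, assuming COMET refers to the Unicode character U+2604, which cp437 cannot encode at all. The helper name `write_console_unicode` is hypothetical. It calls WriteConsoleW only when running on Windows and sys.stdout is a real console (GetConsoleMode succeeds), and otherwise falls back to an ordinary `write`. For what it's worth, Python 3.6 and later already write to the Windows console through the wide-character API (PEP 528), so this is illustrative rather than something current code needs.

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Win32 constant for the standard output handle


def write_console_unicode(text, stream=None):
    """Write text with WriteConsoleW if stdout is a Windows console.

    Returns "console" when the Unicode console API was used, or "stream"
    when the text was handed to stream.write() instead.
    """
    if stream is None:
        stream = sys.stdout
    if sys.platform == "win32" and stream is sys.stdout and stream.isatty():
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [
            wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.WriteConsoleW.argtypes = [
            wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
            ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        mode = wintypes.DWORD()
        # GetConsoleMode only succeeds for a real console, not a pipe or file.
        if handle and kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            stream.flush()  # keep ordering with anything already buffered
            # WriteConsoleW counts UTF-16 code units, not Python code points.
            units = len(text.encode("utf-16-le")) // 2
            written = wintypes.DWORD()
            if kernel32.WriteConsoleW(handle, text, units,
                                      ctypes.byref(written), None):
                return "console"
    # Not a Windows console (or the call failed): use the normal stream.
    stream.write(text)
    return "stream"
```

By contrast, `"\u2604".encode("cp437")` raises UnicodeEncodeError, which is why a cp437-encoded console stream cannot show the character no matter which font is selected.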

Regards,
Martin


