print and unicode strings
Martin v. Loewis
martin at v.loewis.de
Wed Feb 20 02:09:44 EST 2002
"Jason Orendorff" <jason at jorendorff.com> writes:
> I think IDLE should do this by default. It's fine for
> Python to "refuse to guess" encodings, but in the case of Tk
> it's UTF-8 on all platforms... right?
Well, not exactly. Tkinter supports Unicode strings as the primary data
type for text, with byte strings (e.g. UTF-8) only a second choice. So
IDLE should set sys.stdout to a stream that accepts Unicode objects
as-is.
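A minimal sketch of such a stream, written for today's Python 3 (where every str is Unicode). The class name `UnicodeOutput` and its `insert_text` callback are hypothetical; in a real Tk shell the callback might wrap a Tkinter Text widget's `insert` method. Decoding byte strings as UTF-8 is an assumption standing in for the "second choice" path.

```python
class UnicodeOutput:
    """Hypothetical file-like object for a GUI shell's sys.stdout."""

    def __init__(self, insert_text, fallback_encoding="utf-8"):
        # insert_text: hypothetical callback that displays a str,
        # e.g. a function wrapping a Tkinter Text widget's insert method.
        self.insert_text = insert_text
        # Assumption: this encoding is used only when bytes are written.
        self.fallback_encoding = fallback_encoding

    def write(self, s):
        if isinstance(s, bytes):
            # Byte strings are the second choice: decode them first.
            s = s.decode(self.fallback_encoding, errors="replace")
        # Unicode text is passed through unchanged, with no encoding step.
        self.insert_text(s)
        return len(s)

    def flush(self):
        pass  # nothing is buffered
```

For example, with `chunks = []` and `out = UnicodeOutput(chunks.append)`, the call `print("\N{COMET}", file=out)` hands the character to the callback untouched, so no console codec is ever consulted.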
If you wonder how Tcl interprets byte strings: all bytes below 128 are
treated as ASCII. Sequences of bytes of 128 and above are interpreted
as UTF-8 if possible. Where that fails (because a sequence is not valid
UTF-8), the bytes are decoded with the platform's native encoding. This
is a pretty crude heuristic, since it means that you may end up with
a byte string that uses different encodings at different offsets.
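The heuristic described above can be sketched in Python to show how one byte string ends up decoded with two encodings. This illustrates the behaviour as described here, not Tcl's actual implementation; the function name `tcl_like_decode` is hypothetical, and cp1252 stands in for "the platform's native encoding" as an assumption.

```python
def tcl_like_decode(data: bytes, native: str = "cp1252") -> str:
    """Hypothetical sketch: ASCII, else UTF-8 if valid, else native encoding."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Bytes below 128 are plain ASCII.
            out.append(chr(b))
            i += 1
            continue
        # Expected UTF-8 sequence length, judged from the lead byte.
        if 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0  # cannot start a valid UTF-8 sequence
        if n:
            try:
                # Strict decoding rejects truncated or malformed sequences.
                out.append(data[i:i + n].decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Not valid UTF-8 at this offset: decode one byte natively.
        out.append(data[i:i + 1].decode(native, errors="replace"))
        i += 1
    return "".join(out)
```

With `b"caf\xc3\xa9 / caf\xe9"`, the first "é" is recognised as UTF-8 while the lone `\xe9` falls back to cp1252, so both come out as "é" even though the input mixed two encodings.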
Also, setting the encoding of the console stream to cp437 is not ideal;
it would be much better if Python used the Unicode console API
(i.e. WriteConsoleW) whenever it detects that sys.stdout is a console.
If you then use the Lucida Console font (instead of the raster font),
you might even be able to display COMET (U+2604) in a console.
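A minimal sketch of the suggested approach, using ctypes from today's Python 3: if standard output is a Windows console, write through WriteConsoleW; otherwise fall back to an ordinary text stream. The function name `write_unicode` and its return values are hypothetical. The console check via GetConsoleMode is one common way to detect a console handle, used here as an assumption rather than the only method.

```python
import ctypes
import io
import sys

STD_OUTPUT_HANDLE = -11  # Win32 constant for the standard output handle


def write_unicode(text, stream=None):
    """Hypothetical helper: prefer WriteConsoleW when stdout is a console.

    Returns "console" or "stream" to show which path was taken.
    """
    if stream is None and sys.platform == "win32":
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [
            wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.WriteConsoleW.argtypes = [
            wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
            ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        mode = wintypes.DWORD()
        # GetConsoleMode fails when the handle is a file or pipe.
        if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            sys.stdout.flush()  # keep ordering with already-buffered output
            # The length argument counts UTF-16 code units, not code points.
            units = len(text.encode("utf-16-le")) // 2
            written = wintypes.DWORD()
            kernel32.WriteConsoleW(handle, text, units,
                                   ctypes.byref(written), None)
            return "console"
    # Not a console (or not Windows): use an ordinary text stream.
    (sys.stdout if stream is None else stream).write(text)
    return "stream"


if __name__ == "__main__":
    buf = io.StringIO()
    print(write_unicode("\N{COMET}\n", buf))  # prints "stream"
```

The point of the console path is that the text never passes through a byte codec: `"\N{COMET}".encode("cp437")` raises UnicodeEncodeError, whereas WriteConsoleW receives UTF-16 and leaves rendering to the console font.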
Regards,
Martin