Bug #112265: Tkinter seems to treat everything as Latin 1
summary: Tkinter passes 8-bit strings to Tk without any preprocessing. Tk itself expects UTF-8, but passes bogus UTF-8 data right through... or in other words, Tkinter treats any 8-bit string that doesn't contain valid UTF-8 as an ISO Latin 1 string...

maybe Tkinter should raise a UnicodeError instead (just like string comparisons etc). example:

    w = Label(text="<cp1250 string>")
    UnicodeError: ASCII decoding error: ordinal not in range(128)

this will break existing code, but I think that's better than confusing the hell out of anyone working on a non-Latin-1 platform...

+0 from myself -- there's no way we can get a +1 solution (source encoding) into 2.0 without delaying the release...

for some more background, see the bug report below, and my followup.

</F>

---

Summary: Impossible to get Win32 default font encoding in widgets

Details: I did not manage to obtain the correct font encoding in widgets on Win32 (NT Workstation, Polish version, default encoding cp1250). All cp1250 Polish characters were displayed incorrectly. I think all characters that do not belong to Latin-1 will be displayed incorrectly. Regarding Python 1.6b1, I checked the Tcl/Tk installation (8.3.2). Pure Tcl/Tk programs DO display characters in cp1250 correctly. As far as I know, the Tcl interpreter works with UTF-8 encoded strings. Does Python 1.6b1 really know about it?

---

Follow-Ups:

Date: 2000-Aug-26 08:04
By: effbot

Comment:
this is really a "how do I", rather than a bug report ;-)

In 1.6 and beyond, Python's default 8-bit encoding is plain ASCII. this encoding is only used when you're using 8-bit strings in "unicode contexts" -- for example, if you compare an 8-bit string to a unicode string, or pass it to a subsystem designed to use unicode strings.

If you pass an 8-bit string containing characters outside the ASCII range to a function expecting a unicode string, the result is undefined (it usually results in an exception, but some subsystems may have other ideas).
Finally, Tkinter now supports Unicode. In fact, it assumes that all strings passed to it are Unicode. When using 8-bit strings, it's only safe to use plain ASCII. Tkinter currently doesn't raise exceptions for 8-bit strings with non-ASCII characters, but it probably should. Otherwise, Tk will attempt to parse the string as a UTF-8 string, and if that fails, it assumes ISO-8859-1.

Anyway, to write portable code using characters outside the ASCII character set, you should use unicode strings. in your case, you can use:

    s = unicode("<a cp1250 string>", "cp1250")

to get the platform's default encoding, you can do:

    import locale
    language, encoding = locale.getdefaultlocale()

where encoding should be "cp1250" on your box.

The reason this works under Tcl/Tk is that Tcl assumes that your source code uses the platform's default encoding, and converts things to Unicode (not necessarily UTF-8) for you under the hood. Python 2.1 will hopefully support *explicit* source encodings, but 1.6/2.0 doesn't.

-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=112265&group_id=5470
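In today's Python 3 terms, where `bytes` plays the role of the old 8-bit string and `str` is Unicode, the behaviour described above (try UTF-8, fall back to ISO-8859-1) and the recommended fix (decode explicitly with the real source encoding) can be sketched as follows. `tk_style_decode` is a hypothetical helper that only imitates the fallback as described in this thread; it is not Tk's actual implementation.

```python
def tk_style_decode(data: bytes) -> str:
    """Hypothetical imitation of the described fallback: UTF-8, else Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0-255 to U+0000-U+00FF, so this never fails.
        return data.decode("latin-1")


# Polish text as it would arrive from a cp1250 source file or console.
polish = "zażółć gęślą jaźń"
raw = polish.encode("cp1250")

# cp1250 bytes are not valid UTF-8, so the fallback silently
# reinterprets them as Latin-1 and the Polish letters come out wrong.
garbled = tk_style_decode(raw)

# Decoding explicitly with the encoding the text was written in
# (the Python 3 analogue of unicode(s, "cp1250")) recovers it.
correct = raw.decode("cp1250")
```

The point of the sketch is that the fallback never raises, so a wrong guess about the encoding produces garbled text instead of an error, which is exactly why raising a UnicodeError was proposed.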
UnicodeError: ASCII decoding error: ordinal not in range(128)
btw, what the heck is an "ordinal"?

(let's see: it's probably not "a book of rites for the ordination of deacons, priests, and bishops". how about an "ordinal number"? that is, "a number designating the place (as first, second, or third) occupied by an item in an ordered sequence". hmm. does this mean that I cannot use strings longer than 128 characters? but this string was only 12 characters long. wait, there's another definition here: "a number assigned to an ordered set that designates both the order of its elements and its cardinal number". hmm. what's a "cardinal"? "a high ecclesiastical official of the Roman Catholic Church who ranks next below the pope and is appointed by him to assist him as a member of the college of cardinals"? ... oh, here it is: "a number (as 1, 5, 15) that is used in simple counting and that indicates how many elements there are in an assemblage". "assemblage"?)

wouldn't "character" be easier to grok for mere mortals?

...and isn't "range(128)" overly cute?

how about:

    UnicodeError: ASCII decoding error: character not in range 0-127

</F>
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> btw, what the heck is an "ordinal"?
It's a technical term <wink>. But it's used consistently in Python, e.g., that's where the name of the builtin ord function comes from!
    >>> print ord.__doc__
    ord(c) -> integer

    Return the integer ordinal of a one character string.
> ... how about an "ordinal number"? that is, "a number designating the place (as first, second, or third) occupied by an item in an ordered sequence".
Exactly. Each character has an arbitrary but fixed position in an arbitrary but ordered sequence of all characters. This isn't hard.
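The "position in an ordered sequence" reading can be checked directly in Python 3, where `ord()` still returns a character's code point and `chr()` maps back. The ASCII codec covers only positions 0 to 127, which is what "ordinal not in range(128)" refers to. `is_ascii_char` is a hypothetical helper written for illustration.

```python
def is_ascii_char(c: str) -> bool:
    """True if the character's ordinal lies in the ASCII range 0-127."""
    return ord(c) in range(128)


# Each character has a fixed position (its ordinal) in the Unicode sequence.
print(ord("A"), chr(65))                       # 65 A
print(ord("ż"))                                # 380 (U+017C)
print(is_ascii_char("A"), is_ascii_char("ż"))  # True False
```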
> wouldn't "character" be easier to grok for mere mortals?
Doubt it -- they're already confused about the need to distinguish between a character and its encoding, and the *character* is most certainly not "in" or "out" of any range of integers.
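The distinction between a character and its encoding is easy to see in Python 3: one character has one ordinal, but different codecs turn it into different byte sequences. When ASCII decoding fails, it is a byte value, not the character, that lies outside 0-127.

```python
ch = "ż"  # one character, code point U+017C

cp1250_bytes = ch.encode("cp1250")  # a single byte in cp1250
utf8_bytes = ch.encode("utf-8")     # two bytes in UTF-8

print(hex(ord(ch)))        # 0x17c, the character's ordinal
print(list(cp1250_bytes))  # [191]
print(list(utf8_bytes))    # [197, 188]
```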
> ...and isn't "range(128)" overly cute?
Yes.
> UnicodeError: ASCII decoding error: character not in range 0-127
As above, it makes no sense. How about compromising on

    UnicodeError: ASCII decoding error: ord(character) > 127

?
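For comparison, Python 3 kept the "ordinal" wording but now names the codec, the offending byte, and its position. The sketch below triggers that error and adds a hypothetical helper, `non_ascii_positions`, that applies the proposed `ord(character) > 127` test directly to a Unicode string.

```python
def non_ascii_positions(text: str) -> list[int]:
    """Indices of characters whose ordinal is above 127 (hypothetical helper)."""
    return [i for i, c in enumerate(text) if ord(c) > 127]


try:
    b"\xbf".decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xbf in position 0: ordinal not in range(128)

print(non_ascii_positions("Łódź"))  # [0, 1, 3]
```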
participants (2):
- Fredrik Lundh
- Tim Peters