[Python-bugs-list] [Bug #112265] Impossible to get Win32 default font encoding in Tk widgets

Wed, 30 Aug 2000 12:07:10 -0700

Bug #112265, was updated on 2000-Aug-18 13:37
Here is a current snapshot of the bug.

Project: Python
Category: Tkinter
Status: Open
Resolution: Later
Bug Group: Platform-specific
Priority: 6
Summary: Impossible to get Win32 default font encoding in Tk widgets

Details: I did not manage to obtain the correct font encoding in widgets on Win32 (NT Workstation, Polish version, default encoding cp1250). All cp1250 Polish characters were displayed incorrectly. I think all characters that do not belong to Latin-1 will be displayed incorrectly. Regarding Python 1.6b1: I checked the Tcl/Tk installation (8.3.2), and pure Tcl/Tk programs DO display cp1250 characters correctly.
As far as I know, the Tcl interpreter works with UTF-8 encoded strings. Does Python 1.6b1 really know about it?

Follow-Ups:

Date: 2000-Aug-26 08:04
By: effbot

Comment:
this is really a "how do I", rather than a bug
report ;-)

:::

In 1.6 and beyond, Python's default 8-bit
encoding is plain ASCII.  This encoding is only
used when you're using 8-bit strings in "unicode
contexts" -- for example, if you compare an
8-bit string to a unicode string, or pass it to
a subsystem designed to use unicode strings.

If you pass an 8-bit string containing
characters outside the ASCII range to a function
expecting a unicode string, the result is
undefined (it usually results in an exception,
but some subsystems may have other ideas).

Finally, Tkinter now supports Unicode.  In fact,
it assumes that all strings passed to it are
Unicode.  When using 8-bit strings, it's only
safe to use plain ASCII.

Tkinter currently doesn't raise exceptions for
8-bit strings with non-ASCII characters, but it
probably should.  Otherwise, Tk will attempt to
parse the string as a UTF-8 string, and if that
fails, it assumes ISO-8859-1.
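
The fallback described above can be sketched in a few
lines.  This is only an illustration in present-day
Python 3 syntax, not Tk's actual code; the helper name
tk_style_decode is made up for the example.  It shows
why a cp1250 byte string ends up displayed as Latin-1.

```python
def tk_style_decode(raw: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails,
        # but the result is wrong for any other 8-bit encoding.
        return raw.decode("iso-8859-1")


# A Polish letter encoded as cp1250 is a single non-ASCII byte.
cp1250_bytes = "ą".encode("cp1250")

# That lone byte is not valid UTF-8, so the Latin-1 fallback kicks in
# and the character comes out as something else.
garbled = tk_style_decode(cp1250_bytes)

# Plain ASCII survives either way, which is why it is the only safe
# content for 8-bit strings.
plain = tk_style_decode(b"abc")
```

Here garbled is not the original letter, while plain is
unchanged.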

:::

Anyway, to write portable code using characters
outside the ASCII character set, you should use
unicode strings.

in your case, you can use:

   s = unicode("<a cp1250 string>", "cp1250")

to get the platform's default encoding, you can do:

   import locale
   language, encoding = locale.getdefaultlocale()

where encoding should be cp1250 on your box.
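
For reference, a minimal sketch of the same two steps in
present-day Python 3, where unicode(s, "cp1250") is
spelled bytes.decode("cp1250").  It assumes
locale.getpreferredencoding() as the current way to ask
for the platform encoding; the sample word is arbitrary.

```python
import locale

# Bytes as they would arrive from a cp1250 source (built here by
# encoding a literal, purely so the example is self-contained).
raw = "zażółć".encode("cp1250")

# Explicit decode: the modern spelling of unicode(raw, "cp1250").
text = raw.decode("cp1250")

# Ask the platform for its preferred encoding instead of hard-coding it.
# On a Polish Windows box this is expected to name cp1250; elsewhere it
# will differ, so do not rely on a particular value.
encoding = locale.getpreferredencoding(False)
```

Once text is a Unicode string it can be passed to a
Tkinter widget without depending on the platform's
8-bit encoding.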

:::

The reason this works under Tcl/Tk is that Tcl
assumes that your source code uses the
platform's default encoding, and converts things
to Unicode (not necessarily UTF-8) for you under
the hood.  Python 2.1 will hopefully support
*explicit* source encodings, but 1.6/2.0
doesn't.

-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=112265&group_id=5470