This is spoiling Python image!

Roman Suzi rnd at onego.ru
Sun Jul 15 03:38:04 EDT 2001


The problem: it is impossible to use IDLE with non-latin1
encodings under Windows.

IDLE is the standard IDE for Python, and it is what beginning Python
users see under Start->Programs. Unfortunately, IDLE can no longer work
with non-latin1 characters. This could lead beginners to
reconsider their choice of language because of unfriendly i18n issues.

The problem is explained in detail below.

Let's consider the errors one at a time.

1. Tcl can't find its encodings (they are in \Python21\tcl\tcl8.3\encoding\).
Without them it is impossible to enter Cyrillic and other non-latin1 letters
in Text and Entry widgets under Windows.

Tkinter tries to help Tcl by means of FixTk.py:

import sys, os, _tkinter
ver = str(_tkinter.TCL_VERSION)
for t in "tcl", "tk":
    v = os.path.join(sys.prefix, "tcl", t+ver)
    if os.path.exists(os.path.join(v, "tclIndex")):
        os.environ[t.upper() + "_LIBRARY"] = v

This sets the environment variables TCL_LIBRARY and TK_LIBRARY to
"C:\Python21\tcl\tcl8.3" and "C:\Python21\tcl\tk8.3" respectively.

The problem is that it imports _tkinter, which initialises Tcl
and calls Tcl_FindExecutable before TCL_LIBRARY is set.

It is easy to fix this error in FixTk.py:

import sys, os
if not os.environ.has_key('TCL_LIBRARY'):
    tcl_library = os.path.join(sys.prefix, "tcl", "tclX.Y")
    os.environ['TCL_LIBRARY'] = tcl_library

Tcl is smart enough to look into "C:\Python21\tcl\tclX.Y\..\tcl8.3"
as well.
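
The ordering requirement (set TCL_LIBRARY first, import _tkinter only
afterwards) can be sketched as a small helper. This is only an
illustration in Python 3 syntax, not the FixTk.py shipped with Python;
the function name ensure_tcl_library and its parameters are hypothetical,
and "tclX.Y" is the same placeholder directory name used above.

```python
import os


def ensure_tcl_library(environ, prefix, dirname="tclX.Y"):
    """Set TCL_LIBRARY in environ unless the user already set it.

    Hypothetical helper: it must run before _tkinter is imported,
    because Tcl looks for its library directory while initialising.
    """
    if "TCL_LIBRARY" not in environ:
        environ["TCL_LIBRARY"] = os.path.join(prefix, "tcl", dirname)
    return environ["TCL_LIBRARY"]


# Example with a plain dict standing in for os.environ:
env = {}
ensure_tcl_library(env, "C:\\Python21")
```

Passing the environment in as an argument keeps the helper free of side
effects on os.environ, so it can be tried out without touching the real
process environment.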

2. Now we are able to type in IDLE:
>>> print "Привет"

and we will see the Russian letters... until we press Enter,
after which:

UnicodeError: ASCII decoding error: ordinal not in range(128)

appears.

Tcl has recoded "Привет" into Unicode. Python then tries to recode it back
into an ordinary string, assuming that ordinary strings use the encoding
reported by sys.getdefaultencoding(), which is ASCII by default.
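
The failure can be reproduced outside IDLE. The sketch below is written
for Python 3, not the Python 2.1 discussed here, so the conversion has to
be requested explicitly; the codec names stand in for the implicit
default encoding, and the helper name encodes_with is hypothetical.

```python
text = "Привет"


def encodes_with(value, encoding):
    """Return True if value can be converted to bytes using encoding."""
    try:
        value.encode(encoding)
    except UnicodeError:
        return False
    return True


print(encodes_with(text, "ascii"))   # False: Cyrillic is outside range(128)
print(encodes_with(text, "cp1251"))  # True: a Cyrillic Windows code page
```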

Now we need to set the default encoding.
Let's look into site.py:

# Set the string encoding used by the Unicode implementation.  The
# default is 'ascii', but if you're willing to experiment, you can
# change this.

encoding = "ascii" # Default value set by _PyUnicode_Init()

if 0:
    # Enable to support locale aware default string encodings.
    import locale
    loc = locale.getdefaultlocale()
    if loc[1]:
        encoding = loc[1]

if 0:
    # Enable to switch off string to Unicode coercion and implicit
    # Unicode to string conversion.
    encoding = "undefined"

if encoding != "ascii":
    sys.setdefaultencoding(encoding)

The code for setting the default encoding is disabled with "if 0:" (maybe to
allow faster startup?).

Then goes:

#
# Run custom site specific code, if available.
#
try:
    import sitecustomize
except ImportError:
    pass

#
# Remove sys.setdefaultencoding() so that users cannot change the
# encoding after initialization.  The test for presence is needed when
# this module is run as a script, because this code is executed twice.
#
if hasattr(sys, "setdefaultencoding"):
    del sys.setdefaultencoding

So sys.setdefaultencoding is deleted right after sitecustomize.py has had
its chance to use it.

That's too bad, because a program can't set the default encoding itself,
and implicit string<->unicode conversions are very common in Python
and in IDLE.

The solution could be as follows. Let's put a sitecustomize.py in
C:\Python21\ with the following contents:

import locale, sys
encoding = locale.getdefaultlocale()[1]
if encoding:
    sys.setdefaultencoding(encoding)


* It would be wonderful if IDLE itself could set up the encoding based
on the locale, or issue a warning and point to the solution somehow.
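
The sitecustomize.py logic above could be made slightly more defensive by
checking that Python actually has a codec for whatever the locale reports
before using it. This is a sketch under that assumption, in Python 3
syntax; the function name choose_default_encoding is hypothetical, and it
only picks a name (it does not call sys.setdefaultencoding, which no
longer exists in Python 3).

```python
import codecs


def choose_default_encoding(locale_encoding, fallback="ascii"):
    """Return locale_encoding if a codec exists for it, else fallback."""
    if not locale_encoding:
        return fallback
    try:
        codecs.lookup(locale_encoding)
    except LookupError:
        # The locale names an encoding Python has no codec for.
        return fallback
    return locale_encoding
```

In the Python 2.1 setting of this post, the result would be what gets
passed to sys.setdefaultencoding() inside sitecustomize.py.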

3. Now we can try it again in IDLE:

>>> print "Привет"

after hitting Enter we get... latin1.

It's time to look at how _tkinter.c communicates with Tcl.

The cheap-and-dirty solution for IDLE is as follows:

--- Percolator.py.orig	Sat Jul 14 19:38:16 2001
+++ Percolator.py	Sat Jul 14 19:38:16 2001
@@ -22,6 +22,8 @@

     def insert(self, index, chars, tags=None):
         # Could go away if inheriting from Delegator
+        if index != 'insert':
+        	chars = unicode(chars)
         self.top.insert(index, chars, tags)

     def delete(self, index1, index2=None):


--- PyShell.py.orig	Sat Jul 14 19:38:37 2001
+++ PyShell.py	Sat Jul 14 19:38:37 2001
@@ -469,6 +469,8 @@
         finally:
             self.reading = save
         line = self.text.get("iomark", "end-1c")
+        if type(line) == type(u""):
+        	line = line.encode()
         self.resetoutput()
         if self.canceled:
             self.canceled = 0

But alas, these patches only mask the problem.

What is really needed?

Starting from version 8.1, Tcl is fully Unicode-based. It is very simple:
it wants utf-8 strings from us and it returns utf-8 strings as well.
(As an exception, Tcl may assume latin1 if it is unable to decode
a string.)
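
This explains the latin1 output seen in step 3. A Cyrillic string stored
in a Windows code page is not valid utf-8, so the latin1 fallback takes
over and every byte becomes a Western European letter. The sketch below
imitates that in Python 3; cp1251 is assumed here as the locale encoding,
and is_valid_utf8 is a hypothetical helper.

```python
text = "Привет"
raw = text.encode("cp1251")  # bytes as a Russian Windows locale stores them


def is_valid_utf8(data):
    """Return True if data decodes cleanly as utf-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(raw))           # False: cp1251 bytes are not utf-8
mojibake = raw.decode("latin-1")    # the fallback interpretation
print(mojibake == text)             # False: same bytes, wrong letters
```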

_tkinter.c just sends Python strings to Tcl as is,
and it handles Unicode strings correctly. The receiving side is
slightly more complicated:

The Tkapp_Call function (aka root.tk.call) handles most of the Tkinter
Tcl/Tk commands. If the result is 7-bit clean, Tkapp_Call returns an
ordinary string; if not, it converts the result from utf-8 and returns
a Unicode string.

Only Tkapp_Call does this. All the others (Tkapp_Eval, GetVar, PythonCmd)
return a utf-8 string!
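
The inconsistency can be modelled in a few lines. This is a Python 3
sketch of the behaviour as described in this post, not a transcription of
_tkinter.c; bytes stand in for Python 2 ordinary strings, str stands in
for Unicode strings, and both function names are hypothetical.

```python
def call_result(raw):
    """Model of the described Tkapp_Call rule for a utf-8 result from Tcl."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        # Not 7-bit clean: decode utf-8 and hand back a Unicode string.
        return raw.decode("utf-8")
    # 7-bit clean: hand back the ordinary string unchanged.
    return raw


def eval_result(raw):
    """Model of the other entry points: the utf-8 bytes pass through as is."""
    return raw
```

For the same non-ASCII Tcl result, the two paths therefore return values
of different types, which is what the callers in IDLE trip over.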

IDLE uses Tkinter's capabilities extensively, and all kinds of strings
go back and forth between Python and Tcl.

So _tkinter.c works incorrectly. What it should do:

i) before sending a string to Tcl, it must recode it
FROM the default encoding TO utf-8

ii) upon receiving a string from Tcl, it must recode it from
utf-8 to the default encoding, if possible.
[R.S.: Or return it as Unicode, if impossible]
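
Rules i) and ii) can be sketched as a pair of functions. This is a Python 3
illustration of the proposal, not a patch for _tkinter.c; bytes stand in
for Python 2 ordinary strings, the default encoding is passed in
explicitly, and the names to_tcl and from_tcl are hypothetical.

```python
def to_tcl(data, default_encoding):
    """Rule i): recode an ordinary string from the default encoding to utf-8."""
    return data.decode(default_encoding).encode("utf-8")


def from_tcl(data, default_encoding):
    """Rule ii): recode utf-8 from Tcl to the default encoding if possible."""
    text = data.decode("utf-8")
    try:
        return text.encode(default_encoding)
    except UnicodeEncodeError:
        # R.S.'s fallback: return the Unicode string when it does not fit.
        return text


# Round trip with cp1251 assumed as the default encoding:
sent = to_tcl("Привет".encode("cp1251"), "cp1251")
back = from_tcl(sent, "cp1251")
```

With these two rules a string in the default encoding survives the round
trip unchanged, and text that the default encoding cannot represent still
comes back intact as Unicode.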

It is possible to optimize the conversions. Of course, this will have
an impact on the speed of Tkinter. But in our opinion correctness is
more important than speed.

The solution was checked under Win98.

From R.S.: yes, IDLE is not ideal, there are better IDEs (Emacs,
for example), and "serious" programmers rarely use it. Tkinter is also
much criticized, etc. But the problem described above is very bad for
Python's image as a user-friendly language. That is why it is very
important to FIX the problem as soon as possible.

We can prepare patches for _tkinter.c as well.

Before we proceed to submitting bug reports and patches, we will be glad
to hear if somebody has a better solution to the problem described.

(The biggest part of the problem is the need to patch _tkinter.c and recompile
it. Everything else even a beginner could fix if supplied with hints and
files with fixes. But of course, Python's IDLE must run correctly out of the
box.)


Author: Kirill Simonov <kirill(at)xyz.donetsk.ua>
Translator: Roman Suzi <rnd at onego.ru>




