Thanks to a bug report I got, I noticed for the first time that you cannot enter non-ASCII characters in IDLE anymore. E.g. at the shell prompt, you may get
    >>> s='äö'
    UnicodeError: ASCII encoding error: ordinal not in range(128)
Likewise, when trying to save a file that has non-ASCII characters, you get a traceback.
Now, I think I understand all the causes of the problem (Tkinter returning Unicode objects, and so on). However, I'm curious whether anybody has proposals on how to deal with it.
For saving text files, if Python had an encoding directive, things might be easier :-) For the shell prompt, I've no idea how to solve this best.
So any suggestions are welcome.
Regards,
Martin
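The failure reported above can be reproduced in miniature. This is only a sketch, assuming a modern Python 3 interpreter rather than the 2.1-era IDLE under discussion, and it makes no claim about the code path IDLE actually takes. It shows that text containing 'äö' cannot be converted to ASCII, while an explicitly chosen encoding succeeds. The helper name `try_encode` is hypothetical.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text does not fit the encoding."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Raised when a character has no representation in the target
        # encoding, e.g. 'ä' (ordinal 228) in ASCII (range 0..127).
        return None


line = "s='äö'"

ascii_result = try_encode(line, "ascii")      # does not fit
latin1_result = try_encode(line, "latin-1")   # one byte per character
utf8_result = try_encode(line, "utf-8")       # two bytes per umlaut

print(ascii_result)
print(latin1_result)
print(utf8_result)
```

Saving a buffer therefore needs some decision about the target encoding; which one IDLE should pick is exactly the open question in this thread.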
Martin von Loewis wrote:
> Thanks to a bug report I got, I noticed for the first time that you cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell prompt, you may get
>     >>> s='äö'
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
> Likewise, when trying to save a file that has non-ASCII characters, you get a traceback.
> Now, I think I understand all the causes of the problem (Tkinter returning Unicode objects, and so on). However, I'm curious whether anybody has proposals on how to deal with it.
> For saving text files, if Python had an encoding directive, things might be easier :-) For the shell prompt, I've no idea how to solve this best.
> So any suggestions are welcome.
I have a bug report assigned to myself which indicates similar problems with _tkinter and Tk/Tcl. There were other problem reports on the German Python mailing list going in the same direction too.
The basic problem seems to be that Tk/Tcl applies too much magic to the text widget contents in order to find out the used encoding and this can easily cause the whole encoding mechanism to fail.
A Tk/Tcl expert should really look into this and fix _tkinter.c to aid Tk/Tcl in not mixing up the encodings (e.g. it would probably be a good idea to recode Python 8-bit strings into whatever encoding Tk/Tcl assumes as default).
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
> I have a bug report assigned to myself which indicates similar problems with _tkinter and Tk/Tcl. There were other problem reports on the German Python mailing list going in the same direction too.
> The basic problem seems to be that Tk/Tcl applies too much magic to the text widget contents in order to find out the used encoding and this can easily cause the whole encoding mechanism to fail.
This is actually a different problem. In the scenario here, the user types non-ASCII characters into a text widget, then _tkinter returns a Unicode object (IMO rightfully so). In the other problem, the Python program puts a byte string into a text widget, the user enters some more characters, and _tkinter returns a byte string which does not follow any encoding.
> A Tk/Tcl expert should really look into this and fix _tkinter.c to aid Tk/Tcl in not mixing up the encodings (e.g. it would probably be a good idea to recode Python 8bit-strings into whatever encoding Tk/Tcl assumes as default).
Again, this is not the issue here: both _tkinter and Tk behave absolutely correctly IMO. The question is how IDLE should deal with it.
Regards,
Martin
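The second scenario, a byte string "which does not follow any encoding", can be illustrated with a small sketch. The encodings used here are assumptions chosen for illustration: Latin-1 for the bytes the program inserts and UTF-8 for the characters added afterwards. The thread does not say which encodings were actually involved, and the helper `decodes_as` is hypothetical.

```python
def decodes_as(data, encoding):
    """Return True if the byte string is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the program inserted Latin-1 bytes into the widget ...
program_bytes = "ä".encode("latin-1")
# ... and the characters typed afterwards came back as UTF-8 bytes.
typed_bytes = "ö".encode("utf-8")

mixed = program_bytes + typed_bytes

# The combined string is not valid UTF-8 ...
utf8_ok = decodes_as(mixed, "utf-8")
# ... and decoding it as Latin-1 "works" but yields the wrong text,
# because every byte sequence is valid Latin-1.
latin1_text = mixed.decode("latin-1")

print(mixed, utf8_ok, repr(latin1_text))
```

Once bytes from two encodings are concatenated, no single decoding step recovers the original text, which is why this case is harder than the Unicode-object case described first.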
> Thanks to a bug report I got, I noticed for the first time that you cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell prompt, you may get
>     >>> s='äö'
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
This doesn't bother me, because I don't know how to enter such characters with my US keyboard anyway. :-) :-)
> Likewise, when trying to save a file that has non-ASCII characters, you get a traceback.
Yes, this has bitten me once. It was very painful (I lost a few hours' worth of writing). In other words, I agree it's a problem!
> Now, I think I understand all the causes of the problem (Tkinter returning Unicode objects, and so on). However, I'm curious whether anybody has proposals on how to deal with it.
Not me -- unfortunately, there are too many alternatives to IDLE to be able to justify working on it much.
> For saving text files, if Python had an encoding directive, things might be easier :-) For the shell prompt, I've no idea how to solve this best.
> So any suggestions are welcome.
Ditto.
Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed?
--Guido van Rossum (home page: http://www.python.org/~guido/)
> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed?
Perhaps the Tcl version? That sounds like the issue that Marc talked about: Tk behaves differently when text is entered programmatically (and perhaps through cut-n-paste), as compared to text entered through the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on Solaris 8 still gives me the UnicodeError.
Regards,
Martin
[Guido]
> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed?
[Martin]
> Perhaps the Tcl version? That sounds like the issue that Marc talked about: Tk behaves differently when text is entered programmatically (and perhaps through cut-n-paste), as compared to text entered through the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on Solaris 8 still gives me the UnicodeError.
I don't know which version of Python Guido used. I tried cut-&-paste of s='äö' from his email into the distributed 2.1 IDLE under Win98, and got
    UnicodeError: ASCII encoding error: ordinal not in range(128)
Tk appears to interfere with using the usual Windows ALT+0nnn method of entering funny characters, so I'm unsure what happens then -- but for me it either works fine or does something insane (moves the cursor to the left margin, brings up an IDLE dialog box, etc). If I open the system Character Map utility and copy-&-paste using *that*, I can enter all sorts of stuff without problem:
    >>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"
    >>> s
    '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
    \xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
So not all clipboard entries are created equal. Another clue: if I paste the s='äö' snippet from Guido's email into a file opened with Notepad, then immediately copy it again from the Notepad doc, then paste that into IDLE, again no problem:
    >>> s='äö'
    >>> s
    '\xe4\xf6'
Using a clipboard diagnostic tool I don't understand, when I copy from Notepad these data formats are in the system clipboard:
    TEXT
    LOCALE
    OEMTEXT
But when I copy from Guido's email under Outlook 2000, it's
    DataObject
    Rich Text Format
    Rich Text Format Without Objects
    RTF as Text
    TEXT
    UNICODETEXT
    Ole Private Data
    LOCALE
    OEMTEXT
Under Character Map, it's
    Rich Text Format
    TEXT
    LOCALE
    OEMTEXT
So perhaps it's not the version of Tk but the source of the data, and that Tk grabs an unfortunate data format (when present) from the clipboard in preference to a fortunate one.
the-clipboard-is-a-complex-beast-ly y'rs - tim
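The byte values reported in the successful pastes are consistent with one byte per character in Latin-1. The sketch below checks that, assuming a modern Python 3 interpreter. Treating the 8-bit strings as Latin-1 is an assumption: the thread does not name the encoding, and Windows code page 1252 gives the same bytes for these particular characters.

```python
# The string pasted from Character Map in the message above
# (note that 'ò' appears twice in it).
sample = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"

# Assumption: the 8-bit string shown by IDLE holds Latin-1 bytes.
latin1_bytes = sample.encode("latin-1")
cp1252_bytes = sample.encode("cp1252")

# Bytes as listed in the message: 0xe0..0xf2, then 0xf2 again up to 0xff.
reported = bytes(list(range(0xE0, 0xF3)) + list(range(0xF2, 0x100)))

print(latin1_bytes == reported)
print(cp1252_bytes == reported)
print("äö".encode("latin-1"))
```

If the clipboard instead delivers the text in a Unicode format, the same characters arrive as Unicode text rather than as 8-bit bytes, which would fit the observation that only some clipboard sources trigger the UnicodeError.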
participants (4)
- Guido van Rossum
- M.-A. Lemburg
- Martin von Loewis
- Tim Peters