tkinter + unicode + bug or feature??
Bob van der Poel
bvdpoel at kootenay.com
Sat Jan 25 04:54:16 CET 2003
Martin v. Löwis wrote:
> Bob van der Poel wrote:
>> BTW, I really think this is a bug. If you enter "ascii" text into the
>> entry box, get() returns a string; if you enter "extended ascii"
>> you get a unicode string. And since one can't tell beforehand what the
>> user is going to enter... Add to this the fact that the behaviour is
>> not documented in the tkinter reference manual (yes, it is the tcl/tk
> So what do you think the correct behaviour should be?
Well, since I work mostly in plain ascii, not unicode, I would think the
correct behaviour would be to return a regular string <type 'str'>. I
think this is what it was in tcl/tk pre-8.1. And if the community wanted
to have unicode, that would be fine as well. But the way it is now, one
never knows whether one is going to get a <type 'unicode'> or a <type 'str'>.
And that isn't right, is it?
>> Well, yes. Being on the US-side (although I do live in Canada and we're a
>> bit less centric in our thinking) I was just referring to a "normal"
>> encoding...whatever that is :)
> There is no such thing.
Yes, as I would have figured if I'd given it any thought :)
>> Yes, locale.getlocale() works fine. Now, if I do use encode on these
>> strings, will I run into problems if the user's locale is not
>> encodable into 8bits? Or can that not happen?
> Depends on what you mean by "8bits". You might have meant to ask
> Q. Could it happen that the user enters characters that cannot be
> represented in the 'normal encoding'?
> A. Yes, this can happen. If you merely want to compare this to another
> byte string, you should decode that byte string to Unicode, and perform
> the comparison then.
> or you meant to ask
> Q. Could it happen that the encoding produces more than one byte per
> character?
> A. Yes, this can happen, but it is no problem.
> or you meant to ask
> Q. Will Python support 'normal encodings' that produce more than one
> byte per character out of the box?
> A. No, Python does not ship with any such codecs (*). You should install
> the JapaneseCodecs, KoreanCodecs, or ChineseCodecs package for that.
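The first two answers quoted above can be illustrated with a short sketch. It is written for present-day Python 3, where `str` plays the role of this thread's 'unicode' type and `bytes` plays the role of the regular 'str'. The helper names `same_text` and `fits_encoding` are hypothetical, and latin-1 is only an assumed example of a "normal encoding".

```python
def same_text(entered, stored_bytes, encoding):
    """Compare widget text with a byte string by decoding the bytes first."""
    return entered == stored_bytes.decode(encoding)


def fits_encoding(text, encoding):
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeError:
        return False
    return True


entered = "caf\u00e9"   # text as an Entry widget might return it
stored = b"caf\xe9"     # the same word as latin-1 bytes (assumed encoding)

print(same_text(entered, stored, "latin-1"))
print(fits_encoding(entered, "latin-1"))
print(fits_encoding("\u65e5\u672c", "latin-1"))  # characters outside latin-1
print(len("\u00e9".encode("utf-8")))             # one character, several bytes
```

Decoding the stored bytes keeps the comparison in Unicode, so it does not matter whether the user typed characters that the locale encoding cannot represent.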
What I think I really meant to ask is:
If my program takes strings entered by a user in an Entry() widget and I
take that data and convert it from a possible unicode string to the user's
current locale, will the result always be a regular string? Really, what
I'm trying to do is to avoid having my program crash when I do something
like:
    if a == somestring:
Currently, 'somestring' IS a regular string. And if 'a' is a unicode string
the program aborts. So, I'm planning on replacing get() with myget() which
will just do:
Seems to be a bit of a waste to encode each and every get(), but it is
probably just as fast to encode as it is to test to see if it is a str.
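The body of myget() is not shown in this message. The following is only a minimal sketch of how such a wrapper might look, not the author's code. It assumes present-day Python 3 (where `str` is Unicode text and `bytes` is the byte string), assumes that `locale.getpreferredencoding()` stands in for "the user's current locale", and uses a hypothetical `FakeEntry` class in place of a real Entry widget so that it runs without a display. The `errors="replace"` choice is also an assumption: it substitutes `?` for characters the encoding cannot represent instead of raising.

```python
import locale


def to_locale_bytes(value, encoding=None, errors="replace"):
    """Return value as a byte string in the given (or preferred) encoding."""
    if encoding is None:
        # Assumption: the preferred encoding represents the user's locale.
        encoding = locale.getpreferredencoding(False) or "ascii"
    if isinstance(value, str):
        return value.encode(encoding, errors)
    return value  # already a byte string, pass it through unchanged


def myget(widget, encoding=None):
    """Hypothetical wrapper: always hand back a byte string from widget.get()."""
    return to_locale_bytes(widget.get(), encoding)


class FakeEntry:
    """Stand-in for an Entry widget; only get() is needed here."""

    def __init__(self, text):
        self._text = text

    def get(self):
        return self._text


print(myget(FakeEntry("caf\u00e9"), "latin-1"))
print(myget(FakeEntry("\u65e5\u672c"), "ascii"))
```

With this shape the result has one type regardless of what the user typed, at the cost of silently losing characters that do not fit the target encoding.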
And we're sure there isn't a tcl/tk setting to take care of this?
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: bvdpoel at kootenay.com