[Tutor] i18n on Entry widgets

Wed Aug 17 21:41:50 CEST 2005

Jorge Louis de Castro wrote:
> Hi, thanks for the reply.
> 
> 
> However, I get strange behavior when I try to feed text that must be 
> unicode to altavista for translation.
> Just before sending, I've got the following on the log using
> 
> print "RECV DATA: ", repr(data)
> 
> and after entering "então" ("so" in Portuguese)
> 
> RECV DATA:  'right: ent\xc3\xa3o?'

OK, the data from Tkinter seems to be in utf-8 already. It is not a unicode string (there is no u prefix in the repr), and \xc3\xa3 is the utf-8 encoding of a-tilde (ã).
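
A quick way to check this for yourself is sketched below. The names data, text and encoded are only illustrative, and the b'' and u'' prefixes assume a Python version that accepts both (2.6+ or 3.3+); on an older Python 2 a plain '...' literal is already a byte string.

```python
# The byte string from the log above (b'' marks it as bytes).
data = b'right: ent\xc3\xa3o?'

# Decoding as utf-8 yields a unicode string; u'\xe3' is a-tilde.
text = data.decode('utf-8')
print(repr(text))

# Round trip: a-tilde encodes to exactly the two bytes seen in the log.
encoded = u'\xe3'.encode('utf-8')
print(repr(encoded))
```

The decoded string is one character shorter than the byte string, because the two bytes \xc3\xa3 collapse into a single character.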

> Now right before sending the data to be translated by altavista I print 
> out from the CONTENT[1] which yields:
> 
> Translating:   ent&#9500;úo?

Have you done an HTML entity escape on the data somewhere? I don't know where this is coming from; it's pretty mangled. There must be another text transformation in there somewhere.
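
One possible explanation, offered purely as a guess that nothing in this thread confirms: the utf-8 bytes were displayed through a DOS code page, and a later step (a mail client or a web archive, say) escaped whatever did not fit in Latin-1. The sketch below assumes cp437 for the console and uses the standard xmlcharrefreplace error handler to mimic that escaping; both choices are assumptions.

```python
raw = b'ent\xc3\xa3o?'   # utf-8 bytes for "então?"

# Assumption: the console interprets each byte using code page 437.
# 0xc3 becomes a box-drawing character (U+251C), 0xa3 becomes u-acute.
shown = raw.decode('cp437')

# Assumption: something later escapes characters outside Latin-1
# as numeric character references.
escaped = shown.encode('latin-1', 'xmlcharrefreplace')
print(repr(escaped))
```

U+251C is 9500 in decimal, which would account for the &#9500; in the output, with the u-acute left untouched because Latin-1 can represent it.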
> 
> Which I find odd. Obviously, feeding this into babelfish results in a 
> failed translation. So before sending I try to encode it like you suggest.
> 
> try:
>  print "Translating: ", content[1]
>  decoded = content[1].encode('utf8')
>  print "Decoding Prior to Translating: ", decoded
> except Exception, e:
>  print "EXCEPTION ENCODING ", e
> 
> The Exception thrown is:
> 
> EXCEPTION ENCODING  'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
> 
> 
> I was dealing with an ASCII string and asking for it to be encoded as UTF-8, 
> yet Python is telling me it can't encode it as UTF-8?? That makes little 
> sense to me.

This is a confusing error. If you call encode() on a non-unicode (byte) string, Python first has to convert it to a unicode string, and it does that with the default codec, which is ascii. That implicit conversion is what fails, because the string contains non-ascii bytes. This is why the message says "can't decode" even though you asked for an encode.

Since your data is already utf-8, the encode step is not needed.
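
To make the hidden step visible, here is a small sketch that spells out the implicit ascii decode by hand. It assumes the value in question is the byte string 'ent\xc3\xa3o?'; the names raw, error and text are illustrative, and the syntax is chosen to run on Python 2.6+ as well as Python 3.

```python
raw = b'ent\xc3\xa3o?'   # utf-8 bytes, as received from the Entry widget

# What Python 2 effectively does for raw.encode('utf8'):
# decode with the default ascii codec first, then encode.
error = None
try:
    raw.decode('ascii').encode('utf8')
except UnicodeDecodeError as e:
    error = str(e)
print("implicit step failed: %s" % error)

# If a unicode string is needed, decode explicitly with the right codec.
text = raw.decode('utf-8')

# Encoding that back to utf-8 returns the original bytes, which shows
# the data was ready to send as it was.
assert text.encode('utf-8') == raw
```

The failure comes from the decode half of the operation, so naming the codec explicitly (or skipping the conversion altogether) avoids it.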

Kent