[Tutor] i18n on Entry widgets
kent37 at tds.net
Wed Aug 17 21:41:50 CEST 2005
Jorge Louis de Castro wrote:
> Hi, thanks for the reply.
> However, I get strange behavior when I try to feed text that must be
> unicode to altavista for translation.
> Just before sending, I've got the following on the log using
> print "RECV DATA: ", repr(data)
> and after entering "então" ("so" in Portuguese)
> RECV DATA: 'right: ent\xc3\xa3o?'
OK, the data from Tkinter seems to be in utf-8 already; it is not a unicode string (no u' prefix in the repr), and \xc3\xa3 is the utf-8 representation of a-tilde (ã).
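A quick way to check the claim about \xc3\xa3 is to decode the logged bytes by hand. This is only a sketch, written for Python 3 (where bytes and text are separate types) so it can be run today; the names data and text are made up for the example.

```python
# The byte string copied from the RECV DATA log line.
data = b'right: ent\xc3\xa3o?'

# Decoding as utf-8 turns the two bytes \xc3\xa3 into one character.
text = data.decode('utf-8')

# U+00E3 is LATIN SMALL LETTER A WITH TILDE.
assert text == 'right: ent\u00e3o?'

# Going the other way, a-tilde encodes to exactly those two bytes.
assert '\u00e3'.encode('utf-8') == b'\xc3\xa3'
print(text)
```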
> Now right before sending the data to be translated by altavista I print
> out from the CONTENT which yields:
> Translating: ent├úo?
Have you perhaps done an HTML entity escape on the data somewhere? I don't know where this is coming from; it's pretty mangled. There must be another text transformation in there somewhere.
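One transformation that happens to produce exactly these two characters is reading the utf-8 bytes with a DOS code page. That is a guess about the console or log viewer, not something the log confirms. A small Python 3 sketch, assuming cp437 as the code page (the variable names are illustrative):

```python
# utf-8 bytes for the word, as seen in the RECV DATA repr.
raw = b'ent\xc3\xa3o?'

# Misreading those bytes as cp437 maps 0xc3 to a box-drawing
# character (U+251C) and 0xa3 to u-acute (U+00FA).
mangled = raw.decode('cp437')
assert mangled == 'ent\u251c\u00fao?'

# Reading the same bytes as utf-8 gives the intended word.
intended = raw.decode('utf-8')
assert intended == 'ent\u00e3o?'
print(mangled, intended)
```

If the output only looks wrong when printed, the data itself may be fine and only the display encoding is off.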
> Which I find odd. Obviously, feeding this into babelfish results in a
> failed translation. So before sending I try to encode it like you suggest.
> try:
>     print "Translating: ", content
>     decoded = content.encode('utf8')
>     print "Decoding Prior to Translating: ", decoded
> except Exception, e:
>     print "EXCEPTION ENCODING ", e
> The Exception thrown is:
> EXCEPTION ENCODING 'ascii' codec can't decode byte 0xc3 in position 4:
> not in range(128)
> I was dealing w/ an ASCII string and was asking it to be encoded in UTF,
> whereas Python is telling me it can't encode it in UTF?? Makes little
> sense to me.
This is a confusing error. When you call encode() on a non-unicode string, Python first converts it to a unicode string using the default codec, which is ascii. That conversion fails because the string contains non-ascii characters, which is why an encode call reports a decode error.
Since you already have utf-8, the encode step is not needed.
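The hidden step can be written out explicitly. The sketch below is Python 3, where the implicit conversion no longer exists, so the ascii decode has to be spelled out to reproduce the failure; data, message and text are illustrative names.

```python
# utf-8 bytes, like the ones coming back from the Entry widget.
data = b'ent\xc3\xa3o?'

# This is the conversion Python 2 performs behind the scenes when
# encode() is called on a byte string: decode with the ascii codec.
try:
    data.decode('ascii')
    message = None
except UnicodeDecodeError as e:
    message = str(e)

# The failure is a *decode* error, even though encode() was requested.
print(message)

# Decoding with the codec the bytes are really in works, and
# encoding that text again returns the original bytes unchanged.
text = data.decode('utf-8')
assert text.encode('utf-8') == data
```

So if the bytes are already utf-8 they can be sent as they are; decode('utf-8') is only needed when a unicode string is wanted.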