[Tutor] i18n on Entry widgets
Jorge Louis de Castro
jobauk at hotmail.com
Wed Aug 17 21:15:33 CEST 2005
Hi, thanks for the reply.
However, I get strange behavior when I try to feed text that must be unicode
to altavista for translation.
Just before sending, I've got the following on the log using
print "RECV DATA: ", repr(data)
and after entering "então" ("so" in Portuguese)
RECV DATA: 'right: ent\xc3\xa3o?'
Sent Message to Client Nr. 1
CONTENT: ['right', ' ent\xc3\xa3o?']
Just before the CONTENT printout above, there is a data.split(":").
Now, right before sending the data to altavista for translation, I print
CONTENT[1], which yields:
Translating: então?
Which I find odd. Obviously, feeding this into babelfish results in a failed
translation. So before sending I try to encode it like you suggest.
try:
    print "Translating: ", content[1]
    decoded = content[1].encode('utf8')
    print "Decoding Prior to Translating: ", decoded
except Exception, e:
    print "EXCEPTION ENCODING ", e
try:
    translated = translate(decoded, src_l, dest_l)
except Exception, e:
    print "EXCEPTION TRANSLATING ", e
    translated = "translation failed"
The Exception thrown is:
EXCEPTION ENCODING 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
I was dealing with an ASCII string and asking for it to be encoded in UTF-8,
whereas Python is telling me it can't encode it in UTF-8?? Makes little sense
to me.
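
A minimal sketch of what seems to be happening, assuming the data read from
the socket is already a UTF-8 byte string (the \xc3\xa3 in the repr output
suggests so). In Python 2, calling .encode() on a byte string first decodes
it implicitly with the 'ascii' codec, and that hidden decode step is what
fails. The variable names below are illustrative only, and byte literals are
used so the sketch should behave the same on Python 2.6+ and Python 3.

```python
# Bytes as they appear in the log above (text already encoded as UTF-8).
raw = b'right: ent\xc3\xa3o?'

# The ASCII codec cannot decode byte 0xc3. This mirrors the implicit
# decode that Python 2 performs when .encode() is called on a byte string.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as e:
    ascii_error = e

# Decoding explicitly as UTF-8 recovers the intended text.
text = raw.decode('utf-8')
content = text.split(':')

# Encode again only at the point where bytes are actually required.
payload = content[1].encode('utf-8')
```

Under that assumption the fix is to decode('utf-8') the received bytes
rather than encode them a second time.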
Chrs
j.
>From: Kent Johnson <kent37 at tds.net>
>To: jorge at bcs.org.uk
>CC: tutor at python.org
>Subject: Re: [Tutor] i18n on Entry widgets
>Date: Wed, 17 Aug 2005 13:27:24 -0400
>
>Jorge Louis de Castro wrote:
>>Hi,
>>
>>How do I set the encoding of a string? I'm reading a string on a Entry
>>widget and it may use accents and other special characters from languages
>>other than English.
>>When I send the string read through a socket the socket is automatically
>>closed. Is there a way to encode any special characters on a string?
>
>First you have to know what the encoding is of the string you get from the
>Entry. IIRC a Tkinter widget will give you an ASCII string if possible,
>otherwise a Unicode string. You could check this by
> print repr(data)
>where data is the string you get from the Entry.
>
>Next you have to encode the unicode string to the encoding you want on the
>socket. If you want utf-8, you would use
> socket_data = data.encode('utf-8')
>This will work if data is ASCII or Unicode. There are many other supported
>encodings; see http://docs.python.org/lib/standard-encodings.html for a
>list.
>
>Kent
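
The advice above could be sketched roughly as follows. The helper name
to_socket_bytes is hypothetical (it is not part of Tkinter or the socket
module), and the sketch assumes that any byte string it receives is already
in the encoding wanted on the wire.

```python
def to_socket_bytes(data, encoding='utf-8'):
    """Return data as a byte string suitable for socket.send()."""
    if isinstance(data, bytes):
        # Already bytes (a plain str in Python 2): pass through unchanged.
        return data
    # Text (unicode in Python 2, str in Python 3): encode explicitly.
    return data.encode(encoding)


ascii_bytes = to_socket_bytes(u'right')
accented_bytes = to_socket_bytes(u'ent\xe3o')
```

Passing byte strings through untouched avoids the implicit ASCII decode that
a second .encode() call would trigger in Python 2.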