[Tutor] i18n on Entry widgets

Kent Johnson kent37 at tds.net
Thu Aug 18 13:33:14 CEST 2005

I got it working with a utf-8 query by adding an Accept-Charset header to the request. I used the 'Tamper Data' add-on for Firefox to view all the request headers the browser was sending, added the same headers to the Python request, and it worked. Then I removed headers one at a time until I found the one that was needed. Here is a stripped-down version of your code that posts a word encoded in utf-8 and gets the correct response. I also changed the post parameters a little to match what I see my browser sending:

import re, urllib, urllib2

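([^<]*)'),">
# Patterns that pull the translated text out of the Babelfish result page
__where = [ re.compile(r'name="q">([^<]*)'),
            re.compile(r'td bgcolor=white>([^<]*)'),
            re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
          ]

# The Portuguese word 'então' as a utf-8 encoded byte string
phrase = 'ent\xc3\xa3o'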
params = urllib.urlencode( { 'doit' : 'done',
                            'tt' : 'urltext',
                            'trtext' : phrase,
                            'intl' : 1,
                            'lp' : 'pt_en' } )
print "URL encoding ", params

req = urllib2.Request('http://world.altavista.com/babelfish/tr')

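# Without this header Babelfish did not handle the utf-8 query correctly
req.add_header('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7')

# Passing params as the data argument makes urlopen send a POST
response = urllib2.urlopen(req, params)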

html = response.read()
for regex in __where:
    match = regex.search(html)
    if match:
        print match.group(1)
    else:
        print "ERROR MATCHING"
        print html
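
For readers on Python 3, where urllib2 was folded into urllib.request, the same request might be built roughly as follows. This is only a sketch under stated assumptions: the helper name build_babelfish_request is hypothetical, the parameters and header value are copied from the code above, and the 2005 Babelfish URL is assumed to be defunct, so the request is constructed and inspected but not sent. In Python 3 the phrase is a text string, and urlencode does the utf-8 percent-encoding itself.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_babelfish_request(phrase):
    """Hypothetical helper: build (but do not send) the Babelfish POST request."""
    # Same parameters as the Python 2 version; str values are percent-encoded as utf-8.
    params = urlencode({'doit': 'done',
                        'tt': 'urltext',
                        'trtext': phrase,
                        'intl': 1,
                        'lp': 'pt_en'}, encoding='utf-8')
    # URL taken from the 2005 post; assumed to no longer work, so we never open it.
    req = Request('http://world.altavista.com/babelfish/tr',
                  data=params.encode('ascii'))  # data must be bytes in Python 3
    # The header that made the utf-8 query work in the original experiment.
    req.add_header('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7')
    return req


req = build_babelfish_request('ent\u00e3o')  # the word 'então'
print(req.get_method())                   # POST, because data is set
print(req.data)
print(req.get_header('Accept-charset'))
```

To actually send it you would pass req to urllib.request.urlopen(req). Note that Request stores header names with only the first letter capitalized, which is why the lookup uses 'Accept-charset'.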


Kent Johnson wrote:
> OK this is actually starting to make sense :-) Here is what I think is happening:
> You get different results in the IDE and the console because they are using different encodings. The IDE is using utf-8 so the params are encoded in utf-8. The console is using latin-1 and you get encoded latin-1 params.
> When you use Babelfish from the browser, it gets a page in utf-8 and sends the parameters back the same way, probably with a header saying they are utf-8. When you use urllib you don't tell Babelfish the encoding, so it assumes latin-1; that's why the interpreter (console) version works.
> So in your GUI version, if you get utf-8 from the GUI, you can convert it to latin-1 with phrase.decode('utf-8').encode('latin-1'), as long as your text can be expressed in latin-1. If you need utf-8, then you have to figure out how to tell Babelfish that you are sending utf-8.
> Kent
> PS: Please reply to the list, not to me personally.
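
The explanation in the quoted message comes down to the fact that the same word turns into different bytes, and therefore different POST data, depending on the encoding. The sketch below illustrates that in Python 3 (not the Python 2 used in the thread) with the word 'então', and then applies the suggested decode/encode conversion. The euro sign at the end is my own example of a character that latin-1 cannot represent.

```python
from urllib.parse import quote_plus

word = 'ent\u00e3o'                    # the Portuguese word 'então'

utf8_bytes = word.encode('utf-8')      # b'ent\xc3\xa3o', what a utf-8 GUI gives you
latin1_bytes = word.encode('latin-1')  # b'ent\xe3o', what the latin-1 console gives you

# The same word percent-encodes differently in the POST body.
print(quote_plus(utf8_bytes))    # ent%C3%A3o
print(quote_plus(latin1_bytes))  # ent%E3o

# Convert utf-8 bytes to latin-1 bytes, as suggested above.
# This only works when every character exists in latin-1.
converted = utf8_bytes.decode('utf-8').encode('latin-1')
print(converted == latin1_bytes)  # True

# A character outside latin-1 (the euro sign) makes encode() raise an error.
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```

If the conversion raises UnicodeEncodeError for real input, that is the case where you have to send utf-8 and tell Babelfish so.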
