Unicode and UrlEncode!

John johng2001 at rediffmail.com
Wed Mar 17 18:12:55 EST 2004


I am trying to translate some French via Google.

Here is my code.

# -*- coding: latin-1 -*-

import httplib, urllib, re

def translate_french_to_english(french):
    params = urllib.urlencode({'text': french, 'langpair': 'fr|en',
                               'hl': 'en', 'ie': 'UTF8', 'oe': 'UTF8'})
    print params
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    Cn = httplib.HTTPConnection("translate.google.com")
    Cn.request("POST", "/translate_t", params, headers)
    response = Cn.getresponse()
    data = response.read()
    Cn.close()
    match = re.compile('<textarea name=q .*?>(.*?)</textarea>').search(data)
    translated = match.groups()[0]
    return translated

print translate_french_to_english('sÈdimentation')

The problem is that my params are being encoded as
langpair=fr%7Cen&text=s%C8dimentation&oe=UTF8&ie=UTF8&hl=en
-- s%C8dimentation --
when they should be encoded as
text=s%C3%88dimentation&langpair=fr%7Cen&hl=en&ie=UTF8&oe=UTF8
-- s%C3%88dimentation --

What should I do?
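
For illustration, the two encodings quoted above can be reproduced with a
minimal sketch. It is written for Python 3 (an assumption; the code in this
post targets Python 2), and the variable names are only illustrative. The
difference comes from which bytes are percent-encoded: the Latin-1 byte 0xC8
versus the UTF-8 bytes 0xC3 0x88. In the Python 2 code above, the analogous
step would presumably be to pass french.decode('latin-1').encode('utf-8')
to urllib.urlencode instead of the raw Latin-1 string.

```python
# Python 3 sketch; the code in the post itself is Python 2.
from urllib.parse import quote, urlencode

# U+00C8 (capital E with grave) is written as an escape so the
# encoding of this source file does not matter.
word = 's\u00c8dimentation'

# Percent-encode the Latin-1 bytes: U+00C8 is the single byte 0xC8.
latin1_quoted = quote(word.encode('latin-1'))

# Percent-encode the UTF-8 bytes: U+00C8 becomes the two bytes 0xC3 0x88.
utf8_quoted = quote(word.encode('utf-8'))

# urlencode accepts bytes values, so encoding to UTF-8 first
# yields a query string with the UTF-8 percent-escapes.
params = urlencode({'text': word.encode('utf-8'), 'langpair': 'fr|en'})

print(latin1_quoted)
print(utf8_quoted)
print(params)
```

The first printed line matches what the posted code produces, and the second
matches what the 'ie': 'UTF8' parameter in the request claims is being sent.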


