Babelfish translation ...

Stef Mientki stef.mientki at gmail.com
Thu Jul 17 23:55:06 CEST 2008


thanks Stefan,

both lxml and threading work perfectly.
One small problem: "with_tail" was not recognized as a valid keyword.

cheers,
Stef
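
For an lxml release that rejects the keyword, one possible workaround is to retry the call without it. This is only a sketch: serialize_text is a hypothetical helper, not part of lxml, and it assumes the unsupported keyword surfaces as a TypeError. Without with_tail=False, text that follows the element may end up in the result, and a TypeError raised for any other reason would also trigger the retry.

```python
def serialize_text(tostring, element):
    """Serialize element as text, preferring with_tail=False when supported.

    `tostring` is the serializer to call, e.g. lxml.html.tostring.
    """
    try:
        return tostring(element, method="text", with_tail=False)
    except TypeError:
        # Assumed failure mode: this version does not know the keyword,
        # so fall back to the plain text serialization.
        return tostring(element, method="text")
```

With lxml installed, this would be called as serialize_text(lxml.html.tostring, div) in place of the direct tostring call.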

Stefan Behnel wrote:
> Stef Mientki <stef.mientki <at> gmail.com> writes:
>   
>> Although it works functionally,
>> it can take lots of time waiting for the translation.
>>
>> What I basically do is, after selecting a new string to be translated:
>>
>>     kwds = { 'trtext' : line_to_be_translated, 'lp' :'en_nl'}
>>     soup = BeautifulSoup (urlopen(url, urlencode ( kwds ) ) )
>>     translation= soup.find ( 'div', style='padding:0.6em;' ).string
>>     self.Editor_Babel.SetLabel ( translation )
>>     
>
> You should give lxml.html a try.
>
> http://codespeak.net/lxml/
>
> It can parse directly from HTTP URLs (no need to go through urlopen), and it 
> frees the GIL while parsing, so it will become efficient to create a little 
> Thread that doesn't do more than parsing the web site, as in (untested):
>
>   def read_bablefish(text, lang, result):
>       url = BABLEFISH_URL + '?' + urlencode({'trtext':text, 'lp':lang})
>       page = lxml.html.parse(url)
>       for div in page.iter('div'):
>            style = div.get('style')
>            if style is not None and 'padding:0.6em;' in style:
>                result.append(
>                   lxml.html.tostring(div, method="text", with_tail=False))
>
>   result = []
>   thread = threading.Thread(target=read_bablefish,
>                             args=("...", "en_nl", result))
>   thread.start()
>   while thread.isAlive():
>       pass  # ... do other stuff
>   if result:
>       print result[0]
>
> Stefan
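
The worker-thread pattern suggested above can be sketched in current Python 3 roughly as follows. This is a minimal sketch under stated assumptions, not the code either poster ran: BABELFISH_URL is a placeholder (the real address is not given in the thread), and build_url, translate_in_background and fake_fetch are hypothetical names. The download-and-parse step is passed in as a function so the pattern can be tried without network access.

```python
import threading
import time
from urllib.parse import urlencode

# Placeholder: the real service address is not given in the thread.
BABELFISH_URL = "http://example.invalid/translate"


def build_url(text, lang):
    """Build the request URL from the two form fields used in the thread."""
    return BABELFISH_URL + "?" + urlencode({"trtext": text, "lp": lang})


def translate_in_background(fetch, text, lang):
    """Start a worker thread that appends fetch(url) to a shared list.

    Returns (thread, result). The list stays empty until the worker
    finishes, and also if fetch raises an exception.
    """
    result = []

    def worker():
        result.append(fetch(build_url(text, lang)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, result


if __name__ == "__main__":
    def fake_fetch(url):
        # Stand-in for the slow download-and-parse step.
        time.sleep(0.1)
        return "fetched " + url

    thread, result = translate_in_background(fake_fetch, "hello", "en_nl")
    while thread.is_alive():
        time.sleep(0.01)  # ... do other stuff, e.g. keep the GUI responsive
    if result:
        print(result[0])
```

In a GUI program the polling loop would normally be replaced by a timer or an event posted back to the main thread, since widgets such as the label in the original code are generally only safe to update from the GUI thread.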