HTML character entity conversion

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sun Jul 30 16:53:51 EDT 2006


In <1154266972.154519.175040 at m73g2000cwd.googlegroups.com>,
pak.andrei at gmail.com wrote:

> Here is my script:
> 
> from mechanize import *
> from BeautifulSoup import *
> import StringIO
> b = Browser()
> f = b.open("http://www.translate.ru/text.asp?lang=ru")
> b.select_form(nr=0)
> b["source"] = "hello python"
> html = b.submit().get_data()
> soup = BeautifulSoup(html)
> print soup.find("span", id="r_text").string
> 
> OUTPUT:
> привет
> питон
> ----------
> In Russian it looks like:
> "привет питон"
> 
> How can I translate this using standard Python libraries??

Have you tried a more recent version of BeautifulSoup?  IIRC current
versions always decode text to unicode objects before returning them.
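
If upgrading is not an option, the standard library can do the conversion
by itself.  A minimal sketch, assuming the span text consists of numeric
character references such as `&#1087;` (the sample string below is my own
reconstruction, not your actual output) and assuming a current Python 3,
where `html.unescape` is available.  The helper name `decode_entities` is
only illustrative:

```python
import html


def decode_entities(text):
    """Replace HTML character references in `text` with the characters they name.

    html.unescape handles decimal (&#1087;), hexadecimal (&#x43F;) and
    named (&amp;) references alike.
    """
    return html.unescape(text)


# Assumed sample input: the decimal references for two Russian words.
raw = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
       "&#1087;&#1080;&#1090;&#1086;&#1085;")

decoded = decode_entities(raw)
print(decoded)
```

The result is an ordinary `str`, so it can be compared, sliced or written to
a file with whatever encoding you need.  Note that `html.unescape` only
exists from Python 3.4 onwards.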

Ciao,
	Marc 


