Html character entity conversion
yichunwe at usc.edu
Sun Sep 10 02:58:47 CEST 2006
pak.andrei at gmail.com wrote:
> danielx wrote:
>> pak.andrei at gmail.com wrote:
>>> Here is my script:
>>> from mechanize import *
>>> from BeautifulSoup import *
>>> import StringIO
>>> b = Browser()
>>> f = b.open("http://www.translate.ru/text.asp?lang=ru")
>>> b["source"] = "hello python"
>>> html = b.submit().get_data()
>>> soup = BeautifulSoup(html)
>>> print soup.find("span", id = "r_text").string
>>> In Russian it looks like:
>>> "привет питон"
>>> How can I translate this using standard Python libraries??
> Thank you for the response.
> It doesn't matter what 'BeautifulSoup' is...
However, the best solution is to ask BeautifulSoup to do that for you.
If you do
soup = BeautifulSoup(your_html_page, convertEntities="html")
you should not have to worry about the problem you had. This converts all
the HTML entities (the five you see as soup.entitydefs) and all the
"&#xxx;" numeric references to their Python unicode strings.
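To make the conversion itself less mysterious, here is a rough sketch of
what such a decoding step does, written by hand. It is not BeautifulSoup's
own code. It assumes a modern Python 3 (where the entity table lives in
html.entities), and the names decode_entities and _ENTITY_RE are made up
for this illustration.

```python
import re
from html.entities import name2codepoint

# Matches &#xHEX;, &#DECIMAL; and &name; references.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace numeric and named character references with their characters."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2].lower() == "#x":
                return chr(int(body[2:], 16))    # hexadecimal reference
            if body.startswith("#"):
                return chr(int(body[1:]))        # decimal reference
            return chr(name2codepoint[body])     # named entity such as &amp;
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                # unknown reference: leave as-is

    return _ENTITY_RE.sub(replace, text)
```

Anything the sketch does not recognise is passed through unchanged, so
running it over text that contains a stray "&" should be harmless.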
> General question is:
> How can I convert encoded string
> sEncodedHtmlText = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
> into human readable:
> sDecodedHtmlText == 'привет питон'
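For the general question, using only the standard library: a minimal
sketch, assuming Python 3.4 or later, where html.unescape is available.
The encoded string below is written out as decimal numeric references for
the decoded text quoted above, which is an assumption about what the
original string looked like.

```python
import html

# Assumed encoded form: decimal character references for the Russian text.
sEncodedHtmlText = (
    "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
    "&#1087;&#1080;&#1090;&#1086;&#1085;"
)

# html.unescape handles named entities as well as decimal and hex references.
sDecodedHtmlText = html.unescape(sEncodedHtmlText)
print(sDecodedHtmlText)
```

The result is an ordinary Python string, so whether it prints readably
still depends on the terminal's encoding.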