Html character entity conversion
Claudio Grondi
claudio.grondi at freenet.de
Sun Jul 30 11:30:46 EDT 2006
pak.andrei at gmail.com wrote:
> Here is my script:
>
> from mechanize import *
> from BeautifulSoup import *
> import StringIO
> b = Browser()
> f = b.open("http://www.translate.ru/text.asp?lang=ru")
> b.select_form(nr=0)
> b["source"] = "hello python"
> html = b.submit().get_data()
> soup = BeautifulSoup(html)
> print soup.find("span", id = "r_text").string
>
> OUTPUT:
> &#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;
> ----------
> In Russian it looks like:
> "привет питон"
>
> How can I translate this using standard Python libraries??
>
> --
> Pak Andrei, http://paxoblog.blogspot.com, icq://97449800
>
Translate to what and with what purpose?
Assuming your intention is to get a Python Unicode string, what about:
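import re
strHTML = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
# The references are decimal, so convert each one to a 4-digit hex \u escape first.
strUnicodeHexCode = re.sub(r'&#(\d+);', lambda m: '\\u%04x' % int(m.group(1)), strHTML)
strUnicode = eval("u'%s'" % strUnicodeHexCode)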
?
I am sure there is a more elegant and direct solution, but I just wanted
to provide a quick response here.
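
For readers on Python 3, one more direct route than building escapes by hand
is the standard library's html.unescape. What follows is only a minimal
sketch, assuming the input holds numeric character references such as
&#1087; and that a plain Unicode string is the goal. The helper name
decode_entities is hypothetical and not part of any library.

```python
import html


def decode_entities(text):
    """Return text with HTML character references replaced by their characters."""
    # html.unescape handles decimal (&#1087;), hex (&#x43f;) and named (&amp;) forms.
    return html.unescape(text)


# Assumed sample input: decimal references, one per Cyrillic letter.
encoded = "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;"
decoded = decode_entities(encoded)
print(decoded)
```

This avoids eval entirely, which matters when the text comes from a remote
page rather than from a literal in the script.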
Claudio Grondi