HTML character entity conversion

pak.andrei at gmail.com
Sun Jul 30 14:52:46 EDT 2006


danielx wrote:
> pak.andrei at gmail.com wrote:
> > Here is my script:
> >
> > from mechanize import *
> > from BeautifulSoup import *
> > import StringIO
> > b = Browser()
> > f = b.open("http://www.translate.ru/text.asp?lang=ru")
> > b.select_form(nr=0)
> > b["source"] = "hello python"
> > html = b.submit().get_data()
> > soup = BeautifulSoup(html)
> > print soup.find("span", id="r_text").string
> >
> > OUTPUT:
> > &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
> > &#1087;&#1080;&#1090;&#1086;&#1085;
> > ----------
> > In Russian it looks like:
> > "привет питон"
> >
> > How can I convert this using the standard Python libraries?
> >
> > --
> > Pak Andrei, http://paxoblog.blogspot.com, icq://97449800
>
> I'm having trouble understanding how your script works (what would a
> "BeautifulSoup" function do?), but assuming your intent is to find
> character references in an HTML document, you might try using
> the HTMLParser class in the HTMLParser module. This class dispatches to
> several handler methods. One of them is handle_charref. It is called with
> one argument, the name of the reference, which includes only the number
> part. HTMLParser is a lot more powerful than that, though. There may be
> something more lightweight out there that will accomplish what you
> want. Then again, you might be able to find a use for all that power :P.
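
The handle_charref approach suggested in the quoted reply might look roughly
like the sketch below. It assumes Python 3, where the class lives in
html.parser rather than in the HTMLParser module, and it passes
convert_charrefs=False so that handle_charref is actually called. The names
CharRefDecoder and decode_charrefs are hypothetical, chosen only for this
illustration.

```python
from html.parser import HTMLParser


class CharRefDecoder(HTMLParser):
    """Collect text, replacing numeric character references with characters."""

    def __init__(self):
        # convert_charrefs=False makes the parser call handle_charref
        # for each &#NNNN; instead of decoding it silently.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Ordinary text between references (here, the space).
        self.parts.append(data)

    def handle_charref(self, name):
        # name holds only the number part, e.g. "1087" or "x43f".
        if name.lower().startswith("x"):
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))

    def text(self):
        return "".join(self.parts)


def decode_charrefs(s):
    parser = CharRefDecoder()
    parser.feed(s)
    parser.close()
    return parser.text()


encoded = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
           "&#1087;&#1080;&#1090;&#1086;&#1085;")
decoded = decode_charrefs(encoded)
print(decoded)
```

This sketch ignores tags and named entities such as &amp;amp;, which would need
handle_starttag and handle_entityref if the input were a full HTML page.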

Thank you for the response.
It doesn't matter what 'BeautifulSoup' is...
The general question is:

How can I convert the encoded string

sEncodedHtmlText = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'

into the human-readable form:

sDecodedHtmlText  == 'привет питон'
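
One possible way to do this conversion with only the standard library is
sketched below. It assumes Python 3.4 or later, where html.unescape is
available; the helper decode_numeric_refs is a hypothetical name showing the
same idea with a regular expression, limited to decimal references such as
&#1087;.

```python
import html
import re

sEncodedHtmlText = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
                    "&#1087;&#1080;&#1090;&#1086;&#1085;")

# html.unescape decodes numeric and named character references.
sDecodedHtmlText = html.unescape(sEncodedHtmlText)


def decode_numeric_refs(text):
    """Replace decimal references like &#1087; with the character itself."""
    def repl(match):
        return chr(int(match.group(1)))
    return re.sub(r"&#(\d+);", repl, text)


print(sDecodedHtmlText)
print(decode_numeric_refs(sEncodedHtmlText))
```

On Python 2, which has no html.unescape, the regular-expression variant would
need unichr in place of chr.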