HTML character entity conversion
pak.andrei at gmail.com
Sun Jul 30 14:52:46 EDT 2006
danielx wrote:
> pak.andrei at gmail.com wrote:
> > Here is my script:
> >
> > from mechanize import Browser
> > from BeautifulSoup import BeautifulSoup
> >
> > # Submit "hello python" to the translation form and fetch the result page.
> > b = Browser()
> > b.open("http://www.translate.ru/text.asp?lang=ru")
> > b.select_form(nr=0)
> > b["source"] = "hello python"
> > html = b.submit().get_data()
> >
> > # Print the translated text from the result <span>.
> > soup = BeautifulSoup(html)
> > print soup.find("span", id="r_text").string
> >
> > OUTPUT:
> > &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
> > &#1087;&#1080;&#1090;&#1086;&#1085;
> > ----------
> > In Russian it looks like:
> > "привет питон"
> >
> > How can I convert this using the standard Python libraries?
> >
> > --
> > Pak Andrei, http://paxoblog.blogspot.com, icq://97449800
>
> I'm having trouble understanding how your script works (what would a
> "BeautifulSoup" function do?), but assuming your intent is to find
> character references in an HTML document, you might try using
> the HTMLParser class in the HTMLParser module. This class delegates to
> several handler methods. One of them is handle_charref. It will be called
> with one argument, the name of the reference, which includes only the
> number part. HTMLParser is a lot more powerful than that, though. There may
> be something more lightweight out there that will accomplish what you
> want. Then again, you might be able to find a use for all that power :P.
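
For reference, the handle_charref approach suggested above could look
roughly like this. It is only a sketch, written for Python 3, where the
module is called html.parser rather than HTMLParser. The names
CharRefDecoder and decode_charrefs are made up for illustration, and it
assumes the input contains numeric references such as &#1087; or &#x43f;.

```python
from html.parser import HTMLParser


class CharRefDecoder(HTMLParser):
    """Collect text, turning numeric character references into characters."""

    def __init__(self):
        # convert_charrefs=False makes the parser call handle_charref()
        # instead of decoding the references itself.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Plain text between references is kept unchanged.
        self.parts.append(data)

    def handle_charref(self, name):
        # name is only the number part, e.g. "1087" or "x43f".
        if name[0] in "xX":
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))


def decode_charrefs(text):
    # Hypothetical helper: feed the text through the parser and join the result.
    parser = CharRefDecoder()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)
```

Calling decode_charrefs on the encoded text returns the readable string.
Named references such as &amp; are not handled here; they would need a
handle_entityref method as well.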
Thank you for the response.
It doesn't matter what 'BeautifulSoup' is...
The general question is:
How can I convert the encoded string
sEncodedHtmlText = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
into the human-readable form:
sDecodedHtmlText == 'привет питон'
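
One possible way to do this with only the standard library is sketched
below. It assumes Python 3.4 or later, where html.unescape exists (it was
not available when this was posted), and it assumes the encoded string
consists of decimal numeric character references such as &#1087;. The
variable names are the ones from the question.

```python
import html

# Assumed input: decimal character references for the Russian text.
sEncodedHtmlText = (
    "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
    "&#1087;&#1080;&#1090;&#1086;&#1085;"
)

# html.unescape replaces numeric and named references with their characters.
sDecodedHtmlText = html.unescape(sEncodedHtmlText)
```

After this, sDecodedHtmlText == 'привет питон' holds. The same call also
handles named entities, for example html.unescape('&amp;') gives '&'.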