how to get text from a html file?

rake joshuaamayer at gmail.com
Tue Apr 13 20:45:09 EDT 2010


On Apr 13, 2:12 pm, Chris Colbert <sccolb... at gmail.com> wrote:
> On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <varnika... at gmail.com> wrote:
>
> > Hi,
> > Can anyone tell me how to get text from an HTML file? I am trying to
> > display the text of an HTML file in a TextView (from Glade). If I display
> > the file directly, it shows the HTML tags, attributes, etc. in the
> > TextView. I don't want that. I just want the text.
> > Can someone help me with this?
>
> > Regards
> > Varnika Tewari
>
> You should look into Beautiful Soup:
>
> http://www.crummy.com/software/BeautifulSoup/

For more complex parsing, Beautiful Soup is definitely the way to go.

However, if all you want to do is strip the HTML and keep the
remaining text, I'd recommend the pyparsing package with this short script:

http://pyparsing.wikispaces.com/file/view/htmlStripper.py
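
For comparison, here is a minimal sketch of the same "strip the tags, keep
the text" idea. It is not the pyparsing script linked above; it swaps in the
standard library's html.parser module (Python 3) so that it runs without any
third-party package. The names TextExtractor and html_to_text are made up for
this illustration, and skipping <script> and <style> contents is an
assumption about what "just the text" should mean.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        # Collapse runs of whitespace (including newlines) into single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()


if __name__ == "__main__":
    sample = "<html><body><p>Hello, <b>world</b>!</p></body></html>"
    print(html_to_text(sample))
```

The returned string could then be placed in the TextView's buffer. Note that
this sketch does not insert any separator between adjacent block elements, so
"<p>one</p><p>two</p>" comes out as "onetwo"; for badly formed or more
complicated pages, Beautiful Soup is still the safer choice.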


