how to get rid of html tags
R.Marquez
ny_r_marquez at yahoo.com
Thu Oct 3 13:52:39 EDT 2002
"koko" <kokohh at hotmail.com> wrote in message news:<AIMm9.1440$XX3.895043 at newssrv26.news.prodigy.com>...
> I am trying to retrieve a web page.
> But I only want to keep the content of the webpage without the html tags.
> How can I parse the webpage to get rid of the tags?
The WeaselWeb program includes a Python module, htm2txt.py, that converts
HTML pages to plain text. It may be useful to you.
To try it, type this at the command line:
python htm2txt.py "Some Page.htm"
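For comparison, here is a minimal sketch of the same idea using only the standard library's HTML parser. It assumes Python 3, where the parser lives in html.parser. It is not the htm2txt.py code, and the names TextExtractor, html_to_text, SKIP_TAGS and BLOCK_TAGS are made up for this example. It also assumes the input file is UTF-8. It drops the contents of <script> and <style>, decodes entities such as &amp;, and collapses whitespace, so it does not try to preserve the page layout.

```python
import sys
from html.parser import HTMLParser

# Hypothetical names; this is a sketch, not the htm2txt.py implementation.
SKIP_TAGS = {"script", "style"}  # their contents are not page text
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "th", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document and discard the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")  # keep words from separate blocks apart

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs left behind where tags were removed
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes a UTF-8 file; undecodable bytes are replaced
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(html_to_text(f.read()))
```

Saved as, say, strip_tags.py (a hypothetical name), it runs the same way: python strip_tags.py "Some Page.htm"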
The module WeaselWeb.py has a couple of very simple methods for
downloading the page (one using Internet Explorer via COM, the other
using urllib and urlparse).
Download the source versions of WeaselWeb to get at them.
http://sourceforge.net/project/showfiles.php?group_id=9595&release_id=105094
(But if you have a Palm Pilot, you may enjoy the binary version ;)
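The urllib route can also be sketched with today's standard library. This assumes Python 3, where urllib and urlparse became urllib.request and urllib.parse. fetch_page is a hypothetical name, falling back to UTF-8 when a page declares no charset is an assumption, and this is not the code in WeaselWeb.py. The Internet Explorer/COM approach is not shown.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Download a page and return its body as text (hypothetical helper)."""
    # Only open http(s) and local file URLs
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https", "file"):
        raise ValueError("unsupported URL scheme: %r" % scheme)
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 otherwise
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string still contains the markup, so it would then go through whatever tag stripper you settle on.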
-Ruben