Help extracting info from HTML source ..

Nikita the Spider NikitaTheSpider at
Fri Jan 26 19:59:46 CET 2007

In article <1169819118.201093.267320 at>,
 "Miki" <miki.tebeka at> wrote:

> Hello Shelton,
> >   I am learning Python, and have never worked with HTML.  However, I would
> > like to write a simple script to audit my 100+ Netware servers via their web
> > portal.
> Always use the right tool. BeautifulSoup
> is best for web
> scraping (IMO).
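> from urllib import urlopen                # Python 2 urllib
> from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3.x import path
> html = urlopen("").read()                 # fetch the page source
> soup = BeautifulSoup(html)                # parse it into a tree
> for link in soup("a"):                    # soup("a") is shorthand for soup.findAll("a")
>     # .get() returns None instead of raising KeyError for <a> tags without an href
>     print link.get("href"), "-->", link.contents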

Agreed. HTML scraping is really complicated once you get into it. It 
might be interesting to write such a library just for your own 
satisfaction, but if you want to get something done, use a module 
that's already written, like BeautifulSoup. Another module that will do 
the same job but works differently (and more simply, IMO) is HTMLData by 
Connelly Barnes:
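For comparison, here is a minimal sketch of the same link-listing job as Miki's example, written for Python 3 with only the standard library's html.parser module (not BeautifulSoup and not HTMLData). LinkCollector and extract_links are hypothetical names made up for illustration, and the sample HTML is invented. It parses a string rather than fetching a page, so you would feed it whatever text you downloaded from each server's portal.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; a valueless href gives None
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical wrapper: parse an HTML string and return its links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="/status">Server status</a> <a name="top">no href</a></p>'
for href, text in extract_links(sample):
    print(href, "-->", text)
```

Even this toy version has to carry state between callbacks and skip anchors with no href, which is exactly the kind of bookkeeping a library like BeautifulSoup handles for you.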

Whole-site HTML validation, link checking and more

More information about the Python-list mailing list