[Tutor] extract hosts from html write to file

Kent Johnson kent37 at tds.net
Wed Sep 12 00:29:43 CEST 2007


sacha rook wrote:
> Hi, I wonder if anyone can help with the following.
>
> I am trying to read an HTML page, extract only fully qualified hostnames
> from the page, and output these hostnames to a file on disk to be used
> later as input to another program.

I would use BeautifulSoup to parse out the hrefs and urlparse.urlparse() 
to split the hostname out of the href.

http://www.crummy.com/software/BeautifulSoup/documentation.html
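Here is a minimal sketch of that approach, assuming Python 3. To keep it free of third-party dependencies it swaps BeautifulSoup for the standard library's html.parser.HTMLParser. It also uses urllib.parse.urlparse(), which is where urlparse.urlparse() lives in Python 3. The names HrefCollector, extract_hosts and write_hosts are hypothetical. Two more assumptions: only <a href> links are examined, and a hostname counts as "fully qualified" if it contains at least one dot. That is a rough heuristic, not a real FQDN check.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hosts(html):
    """Return unique hostnames from <a href> links, in first-seen order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    hosts = []
    for href in parser.hrefs:
        # .hostname is None for relative links ("/about") and for
        # mailto: links; it is returned lowercased otherwise
        host = urlparse(href).hostname
        # Assumption: "fully qualified" means the name contains a dot,
        # so bare names such as "localhost" are skipped
        if host and "." in host and host not in hosts:
            hosts.append(host)
    return hosts


def write_hosts(hosts, path):
    """Write one hostname per line, for use as input to another program."""
    with open(path, "w") as f:
        for host in hosts:
            f.write(host + "\n")
```

If BeautifulSoup (the bs4 package) is installed, the HrefCollector class could be replaced by `[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]`. The urlparse step stays the same.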

Kent

