[Tutor] extract hosts from html write to file

sacha rook sacharook at hotmail.co.uk
Tue Sep 11 18:18:18 CEST 2007


Hi, I wonder if anyone can help with the following.
 
I am trying to read an HTML page, extract only the fully qualified hostnames from it, and write these hostnames to a file on disk, to be used later as input to another program.
 
I have this so far:
 
import urllib2

f = open("c:/tmp/newfile.txt", "w")
for line in urllib2.urlopen("http://www.somedomain.uk"):
    if "href" in line and "http://" in line:
        print line
        f.write(line)
f.close()

fu = open("c:/tmp/newfile.txt", "r")
for line in fu.readlines():
    print line
 
So I have opened a file to write to, fetched a page of HTML, and printed and written to the file those lines that contain href and http:// references.
I then closed the file, reopened it, read all the lines from it, and printed them out.
 
Can someone please point me in the right direction on the flow of this program, and on the best way to extract just the hostnames and write them to a file on disk?
 
As you can see, I am newish to this.
 
Thanks in advance for any help given!
 
s
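
The hostname extraction asked about above could be sketched roughly as follows. This is only one possible approach, not a definitive answer. It assumes Python 3 and its standard library (the post itself uses Python 2, where the equivalent modules are HTMLParser and urlparse), and the names HostCollector, extract_hosts and write_hosts are hypothetical, made up for this illustration. It also treats "fully qualified" loosely as "the hostname contains a dot", which is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostCollector(HTMLParser):
    """Collect hostnames from the href attributes of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()  # a set drops duplicate hostnames

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).hostname
                # Relative links have no hostname. Requiring a dot is a
                # rough stand-in (an assumption) for "fully qualified".
                if host and "." in host:
                    self.hosts.add(host)


def extract_hosts(html):
    """Return a sorted list of the hostnames linked from an HTML string."""
    parser = HostCollector()
    parser.feed(html)
    parser.close()
    return sorted(parser.hosts)


def write_hosts(hosts, path):
    """Write one hostname per line to the file at path."""
    with open(path, "w") as out:
        for host in hosts:
            out.write(host + "\n")


# Small inline sample instead of a live page, so the sketch runs offline.
sample = (
    '<a href="http://www.example.com/page">one</a>'
    '<a href="/relative/link">two</a>'
    '<a href="http://mail.example.org:8080/">three</a>'
)
hosts = extract_hosts(sample)
```

To use this on a real page, the HTML string would come from the network rather than from sample, and write_hosts(hosts, "c:/tmp/newfile.txt") would then save the result to the path used in the post. Parsing the href values, rather than testing each line for "href" and "http://", keeps the rest of the line out of the output file.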