[Tutor] extract hosts from html write to file
sacha rook
sacharook at hotmail.co.uk
Tue Sep 11 18:18:18 CEST 2007
Hi, I wonder if anyone can help with the following.
I am trying to read an HTML page, extract only the fully qualified hostnames from it, and write these hostnames to a file on disk to be used later as input to another program.
I have this so far:
import urllib2
f=open("c:/tmp/newfile.txt", "w")
for line in urllib2.urlopen("http://www.somedomain.uk"):
    if "href" in line and "http://" in line:
        print line
        f.write(line)
f.close()
fu=open("c:/tmp/newfile.txt", "r")
for line in fu.readlines():
    print line
So I have opened a file for writing, fetched a page of HTML, and printed and written to the file those lines that contain href and http:// references.
I then closed the file, reopened it, read all the lines back, and printed them out.
Can someone please point me in the right direction on the flow of this program, and on the best way to extract just the hostnames and write them to a file on disk?
As you can see, I am newish to this.
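For illustration, here is a rough sketch of one possible direction, not a definitive answer. It parses the href attributes with the standard library instead of matching whole lines, and keeps only the host part of each absolute link. It is written for Python 3 (html.parser, urllib.parse); under Python 2, as used above, the corresponding modules would be HTMLParser and urlparse. The names HostCollector, extract_hosts and write_hosts are made up for this example, and treating "contains a dot" as "fully qualified" is only an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostCollector(HTMLParser):
    """Hypothetical helper: collects hostnames found in href attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()  # a set drops duplicate hostnames

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "href" or not value:
                continue
            parts = urlparse(value)
            host = parts.hostname  # lowercased, without any port
            # Assumption: "fully qualified" means an absolute http(s)
            # link whose host contains at least one dot.
            if parts.scheme in ("http", "https") and host and "." in host:
                self.hosts.add(host)


def extract_hosts(html):
    """Return a sorted list of hostnames linked from the given HTML text."""
    collector = HostCollector()
    collector.feed(html)
    return sorted(collector.hosts)


def write_hosts(hosts, path):
    """Write one hostname per line to the file at path."""
    with open(path, "w") as f:
        for host in hosts:
            f.write(host + "\n")


# Small in-memory sample so the sketch runs without network access.
sample = (
    '<a href="http://www.example.com/a">x</a> '
    '<a href="/local">y</a> '
    '<a href="http://mail.example.org:8080/">z</a> '
    '<a href="http://www.example.com/b">again</a>'
)
hosts = extract_hosts(sample)
```

To use it on a real page, the HTML text would first be fetched (for example with urllib.request.urlopen in Python 3, or urllib2.urlopen as above) and decoded, then passed to extract_hosts, and the result handed to write_hosts with a path such as "c:/tmp/newfile.txt".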
Thanks in advance for any help given!
s