Parsing an HTML a tag
George
buffer_88 at hotmail.com
Sat Sep 24 20:16:46 EDT 2005
I'm very new to Python and have tried reading the tutorials, but I
can't work out how to approach this problem.
Specifically, the showIPnums function takes a URL as input, calls the
read_page(url) function to obtain the entire page for that URL, and
then lists, in sorted order, the IP addresses implied in the "<A
HREF=...>" tags within that page.
"""
Module to print IP addresses of tags in web file containing HTML
>>> showIPnums('http://22c118.cs.uiowa.edu/uploads/easy.html')
['0.0.0.0', '128.255.44.134', '128.255.45.54']
>>> showIPnums('http://22c118.cs.uiowa.edu/uploads/pytorg.html')
['0.0.0.0', '128.255.135.49', '128.255.244.57', '128.255.30.11',
'128.255.34.132', '128.255.44.51', '128.255.45.53',
'128.255.45.54', '129.255.241.42', '64.202.167.129']
"""
def read_page(url):
    import formatter
    import htmllib
    import urllib
    htmlp = htmllib.HTMLParser(formatter.NullFormatter())
    htmlp.feed(urllib.urlopen(url).read())
    htmlp.close()

def showIPnums(URL):
    page = read_page(URL)

if __name__ == '__main__':
    import doctest, sys
    doctest.testmod(sys.modules[__name__])
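
One possible way to fill in the missing step, offered as a rough sketch rather than the intended solution: collect the href of every <a> tag, take the host name from each URL, resolve it, and return the distinct addresses sorted as strings. The sketch uses the Python 3 standard library (html.parser, urllib.parse) in place of the Python 2 htmllib and formatter modules used above. The names AnchorCollector, lookup and ip_nums_from_html are made up for illustration, and two behaviours are assumptions: that '0.0.0.0' in the expected output stands for a host that could not be resolved, and that links without a host (relative links, mailto:) are skipped.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <A HREF=...> is matched as well.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def lookup(host):
    """Resolve a host name to an IPv4 string.

    Returning '0.0.0.0' on failure is an assumption based on the doctests.
    """
    try:
        return socket.gethostbyname(host)
    except (OSError, UnicodeError):
        return "0.0.0.0"


def ip_nums_from_html(html, resolve=lookup):
    """Return the sorted, distinct IP strings for all <a href> hosts in html."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    hosts = {urlparse(href).hostname for href in collector.hrefs}
    hosts.discard(None)  # assumption: links without a host are skipped
    # sorted() compares the addresses as strings, which matches the
    # ordering shown in the doctests ('129...' before '64...').
    return sorted({resolve(host) for host in hosts})
```

Note that read_page as posted feeds the page to a parser but returns nothing, so page in showIPnums would be None. If read_page returned the page text instead, showIPnums could simply return ip_nums_from_html(page). The resolve parameter exists only so the function can be tried without network access, for example by passing a dictionary lookup.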