improved hyperlink parser in python?
eugene kim
eugene1977 at hotmail.com
Sun Oct 13 21:47:55 EDT 2002
Hi, this parser gives me relative URLs, such as
comment.php?newsid=339
instead of http://some.host.com/comment.php?newsid=339
Also, javascript: links show up in the output:
javascript:SendEntry('http://www.smalla.de/movable_type/mt-popupemail.cgi?entry_id=1020')
javascript://
javascript://
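import formatter
import htmllib
import urllib

# A NullFormatter discards rendered output; only the anchor list is needed.
f = formatter.NullFormatter()
file = urllib.urlopen("http://www.google.com")
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()
file.close()

# anchorlist holds each href exactly as written in the page.
for link in p.anchorlist:
    print link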
Is there a better parser module, or existing code that handles this?
I'd be very grateful for any help.
Thank you.
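
For illustration, here is a minimal sketch of one way the two problems above might be handled: resolving each href against the page URL and dropping anything that is not an http(s) link. It assumes Python 3, where htmllib no longer exists, so it uses html.parser and urllib.parse instead of the modules in the post. The names LinkCollector and extract_links and the sample HTML are hypothetical, made up for this example, and the base URL is the placeholder host from the message.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # urljoin turns a relative href into an absolute URL and
            # leaves hrefs that already carry a scheme unchanged.
            absolute = urljoin(self.base_url, value.strip())
            # Keep only web links; this drops javascript: pseudo-URLs.
            if urlsplit(absolute).scheme in ("http", "https"):
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute http(s) links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Sample markup (an assumption) built from the hrefs quoted in the message.
sample = """
<a href="comment.php?newsid=339">comments</a>
<a href="javascript:SendEntry('http://www.smalla.de/movable_type/mt-popupemail.cgi?entry_id=1020')">mail</a>
<a href="javascript://">noop</a>
<a href="http://www.google.com/about">about</a>
"""

links = extract_links(sample, "http://some.host.com/")
for link in links:
    print(link)
```

The base URL passed to extract_links should be the address the page was actually fetched from, since relative hrefs are resolved against it. Filtering on the scheme after joining, rather than matching the string "javascript", would also drop other non-web schemes such as mailto:, which may or may not be what is wanted.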