improved hyperlink parser in python?

eugene kim eugene1977 at hotmail.com
Sun Oct 13 21:47:55 EDT 2002


Hi, this parser gives me relative URLs, such as
comment.php?newsid=339
instead of http://some.host.com/comment.php?newsid=339
Javascript links also show up in the output:
javascript:SendEntry('http://www.smalla.de/movable_type/mt-popupemail.cgi?entry_id=1020')
javascript://
javascript://

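import formatter
import htmllib
import urllib

# NullFormatter discards rendered output; only the collected anchors matter.
f = formatter.NullFormatter()
page = urllib.urlopen("http://www.google.com")
p = htmllib.HTMLParser(f)
p.feed(page.read())
p.close()
page.close()

# anchorlist holds every href exactly as written in the page.
for link in p.anchorlist:
    print link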

Is there a better parser module, or existing code, that handles this?
Any help would be much appreciated.

thank you
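
For reference, one possible way to get absolute URLs and skip the javascript: entries is sketched below. It is only a sketch and makes several assumptions not in the post above: it uses Python 3's html.parser and urllib.parse instead of htmllib, the names LinkCollector and extract_links are made up for illustration, and the base URL is passed in by hand (here the http://some.host.com example from above). It does not honor a <base href> tag in the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute http(s) links from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page's own URL.
                absolute = urljoin(self.base_url, value.strip())
                # Keep only web links; this drops javascript:, mailto:, etc.
                if urlsplit(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html_text, base_url):
    """Return the absolute http(s) links found in html_text, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


# Sample markup built from the hrefs quoted in the post above.
sample = """
<a href="comment.php?newsid=339">comments</a>
<a href="javascript:SendEntry('http://www.smalla.de/movable_type/mt-popupemail.cgi?entry_id=1020')">mail</a>
<a href="javascript://">noop</a>
<a href="http://www.google.com/">google</a>
"""

for link in extract_links(sample, "http://some.host.com/"):
    print(link)
```

With the htmllib code above, the same idea would presumably be to pass each entry of p.anchorlist through urlparse.urljoin together with the URL that was opened, and to skip entries whose scheme is not http or https.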


