htmllib

Alex Martelli aleaxit at yahoo.com
Tue Oct 24 04:02:22 EDT 2000


"Hwanjo Yu" <hwanjoyu at uiuc.edu> wrote in message
news:dX7J5.4772$l12.75407 at vixen.cso.uiuc.edu...
> Please, can anyone show me an example code of parsing a html document using
> htmllib to extract all the out-links of html document ?

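import htmllib
import formatter

# NullFormatter discards all rendered output; only the parsing matters here.
parser = htmllib.HTMLParser(formatter.NullFormatter())

# Read the whole document, closing the file explicitly once it is read.
f = open('myfile.html')
parser.feed(f.read())
f.close()
parser.close()

# anchorlist now holds the href of each <A> tag the parser saw.
print parser.anchorlist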


parser.anchorlist will be a list of strings, each string
being the URL in the href attribute of an <A> in the
myfile.html file.  I guess this is what you mean by
"extract all the out-links", right?
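
For anyone reading this on Python 3, where htmllib no longer exists, here
is a minimal sketch of the same idea built on the standard html.parser
module. The names AnchorCollector, extract_links and sample are made up
for illustration and are not part of any library; the class just mimics
the anchorlist attribute described above.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: gathers href values of <a> tags in anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                # Skip anchors with a missing or empty href.
                if name == "href" and value:
                    self.anchorlist.append(value)


def extract_links(html_text):
    """Return the href of every <a> tag in html_text, in document order."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchorlist


# Illustrative input only; in practice read the text from your own file.
sample = ('<p><A HREF="http://www.python.org/">Python</A> and '
          '<a href="docs/index.html">docs</a></p>')
print(extract_links(sample))
```

The URLs come back exactly as written in the document, so relative links
stay relative; urllib.parse.urljoin can resolve them against a base URL if
absolute links are needed.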


Alex





