HTMLparser

Asle Pedersen apederse at siving.hia.no
Fri Jan 21 08:02:13 EST 2000


I'm a beginner Python user experimenting with the HTMLParser class in
htmllib. I want to convert all relative URLs to absolute URLs without
touching the rest of the file's contents.

This is what I have so far, but somehow it throws away all tags except the
anchor tags, which is not what I want.

import htmllib
import urlparse

class minparser(htmllib.HTMLParser):

    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self, formatter, verbose)

    def anchor_bgn(self, href, name, type):
        # Resolve href against the base URL (the literal "baseurl" is used here).
        self.anchor = urlparse.urljoin("baseurl", href)
        if self.anchor:
            self.save_bgn()

    def anchor_end(self):
        if self.anchor:
            text = self.save_end()
            # need to do something here
            self.handle_data("%s <%s>" % (text, self.anchor))
            self.anchor = None
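
For comparison, here is a minimal sketch of the stated goal (absolute hrefs,
everything else passed through). It swaps in Python 3's html.parser and
urllib.parse in place of the htmllib and urlparse modules used above, so it
is not a fix for the class in this post. The names AbsoluteLinkRewriter and
absolutize, and the example.com base URL, are hypothetical. The output is a
re-serialization of the parsed markup, so tag case and attribute quoting may
differ from the input.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class AbsoluteLinkRewriter(HTMLParser):
    """Re-emit the input HTML, making <a href> values absolute (hypothetical name)."""

    def __init__(self, base_url):
        # convert_charrefs=False so references such as &amp; reach the
        # entity handlers below and can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _emit_tag(self, tag, attrs, closing=">"):
        parts = [tag]
        for name, value in attrs:
            if tag == "a" and name == "href" and value is not None:
                value = urljoin(self.base_url, value)
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def absolutize(html_text, base_url):
    """Return html_text with anchor hrefs resolved against base_url."""
    parser = AbsoluteLinkRewriter(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Usage would look like absolutize(page_source, "http://example.com/dir/").
The idea is that every handler writes its piece back out, so nothing is
dropped, and only the anchor start tag is changed along the way.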


-Asle




