HTMLparser

Oleg Broytmann phd at phd.russ.ru
Fri Jan 21 10:15:19 EST 2000


On Fri, 21 Jan 2000, Asle Pedersen wrote:

> I'm a beginner Python user and am experimenting with the HTMLParser. I want to
> convert all relative URLs to absolute URLs without touching the rest of the
> file's content.
> 
> This is what I have so far, but somehow it is throwing away all tags except the
> anchor tags (which is not what I want).

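   You should add unknown_starttag and unknown_endtag methods that write the
tags back out; the parser calls them for every tag that has no handler of its
own...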

> class minparser(htmllib.HTMLParser):
> 
>     def __init__(self, formatter, verbose=0):
>         htmllib.HTMLParser.__init__(self, formatter, verbose)
> 
>     def anchor_bgn(self, href, name, type):
>         self.anchor = urlparse.urljoin("baseurl", href)
>         if self.anchor:
>             self.save_bgn()
> 
>     def anchor_end(self):
>         if self.anchor:
>             text = self.save_end()
>             # need to do something here
>             self.handle_data("%s <%s>" % (text, self.anchor))
>             self.anchor = None
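
   For anyone reading this on Python 3, where htmllib and sgmllib no longer
exist, here is a minimal sketch of the same idea with html.parser swapped in.
It is not the htmllib API from the code above: the handler names belong to
html.parser.HTMLParser, and the class name AbsoluteLinkRewriter, the helper
absolutize, and the example.com base URL are made up for illustration. Every
tag is echoed back, and only the href of anchor tags is joined against the
base URL.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class AbsoluteLinkRewriter(HTMLParser):
    """Echo the input, rewriting only the href of <a> tags (hypothetical name)."""

    def __init__(self, base_url):
        # convert_charrefs=False keeps entities such as &amp; out of
        # handle_data so they can be echoed back as written.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.out.append(self._rebuild_anchor(attrs))
        else:
            # Every other start tag is copied exactly as it appeared.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as written
        # (a self-closing <a/> is therefore not rewritten in this sketch).
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so the original case is lost here.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

    def _rebuild_anchor(self, attrs):
        parts = ["<a"]
        for name, value in attrs:
            if value is None:
                # Attribute without a value, e.g. <a download>.
                parts.append(" %s" % name)
                continue
            if name == "href":
                value = urljoin(self.base_url, value)
            # Attribute values arrive unescaped, so escape them again.
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        parts.append(">")
        return "".join(parts)


def absolutize(html_text, base_url):
    """Return html_text with relative anchor hrefs made absolute."""
    parser = AbsoluteLinkRewriter(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

   Calling absolutize(page_text, "http://example.com/site/index.html") should
return the page with its anchors rewritten and everything else passed through.
The sketch only handles <a href>; other URL-carrying attributes (img src, link
href, and so on) would need the same treatment.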

Oleg.
---- 
    Oleg Broytmann      Foundation for Effective Policies      phd at phd.russ.ru
           Programmers don't die, they just GOSUB without RETURN.




