HTMLparser
Oleg Broytmann
phd at phd.russ.ru
Fri Jan 21 10:15:19 EST 2000
On Fri, 21 Jan 2000, Asle Pedersen wrote:
> I'm a beginner Python user and am experimenting with the HTMLparser. I want to
> convert all relative urls to absolute urls without touching the rest of the
> file's contents.
>
> This is what I have so far, but somehow it is throwing away all tags but the
> anchor tags (which is not what I want).
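You should add unknown_starttag and unknown_endtag methods to your class. The
default versions inherited from sgmllib do nothing, so a tag that has no
handler of its own is silently dropped.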
> class minparser(htmllib.HTMLParser):
>
>     def __init__(self, formatter, verbose=0):
>         htmllib.HTMLParser.__init__(self, formatter, verbose)
>
>     def anchor_bgn(self, href, name, type):
>         self.anchor = urlparse.urljoin("baseurl",href)
>         if self.anchor:
>             self.save_bgn()
>
>     def anchor_end(self):
>         if self.anchor:
>             text = self.save_end()
>             #need to do something here
>             self.handle_data("%s <%s>"%(text,self.anchor))
>             self.anchor = None
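For anyone trying this on Python 3, where htmllib no longer exists, here is a
minimal sketch of the same idea using html.parser.HTMLParser instead. It is
not the htmllib approach from the quoted code: rather than going through a
formatter, it copies every parser event back to the output and rewrites only
the href attribute of anchor tags. The class name AbsoluteLinkRewriter, the
helper absolutize, and the example.com URLs are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class AbsoluteLinkRewriter(HTMLParser):
    """Hypothetical helper: echo the input, making <a href> values absolute."""

    def __init__(self, base_url):
        # convert_charrefs=False so entities like &amp; arrive as their own
        # events and can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # Every other tag is copied through exactly as it was written.
            self.out.append(self.get_starttag_text())
            return
        parts = []
        for name, value in attrs:
            if value is None:  # attribute without a value
                parts.append(name)
                continue
            if name == "href":
                value = urljoin(self.base_url, value)
            # The parser hands over unescaped values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<a" + "".join(" " + p for p in parts) + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def absolutize(html_text, base_url):
    """Return html_text with relative anchor hrefs resolved against base_url."""
    parser = AbsoluteLinkRewriter(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<p>See <a href="docs/index.html">the docs</a>.</p>'
    print(absolutize(page, "http://www.example.com/dir/page.html"))
```

Note that this sketch normalizes what it rewrites: anchor start tags get
double-quoted attributes and end tags come out in lower case. Everything else
is passed through as the parser saw it.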
Oleg.
----
Oleg Broytmann Foundation for Effective Policies phd at phd.russ.ru
Programmers don't die, they just GOSUB without RETURN.