start_h1(..) and end_h1(..) functions in htmllib

Fredrik Lundh fredrik at effbot.org
Sun Nov 19 11:35:40 EST 2000


hwan-jo yu wrote:
> I would like to extract only the text inside the heading tags of an HTML document.
> I tried the code below, but it doesn't work.

this works for me:

import htmllib, formatter

class myparser(htmllib.HTMLParser):
    def start_h1(self, attrs):
        # start buffering character data
        self.save_bgn()
    def end_h1(self):
        # save_end() returns the buffered text and stops buffering
        print "H1", repr(self.save_end())

f = formatter.NullFormatter()

p = myparser(f)
p.feed("<html><h1>title</h1></html>")
p.close()

## prints
## H1 'title'
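
htmllib only exists in Python 2. As a rough sketch of the same idea on Python 3, assuming the standard html.parser module and covering h1 through h6, something like the following should work. The names HeadingParser and extract_headings are made up for this example and are not part of any library:

```python
from html.parser import HTMLParser

# heading tags to collect; html.parser reports tag names in lowercase
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingParser(HTMLParser):
    # hypothetical helper class, not part of the standard library
    def __init__(self):
        HTMLParser.__init__(self)
        self.headings = []  # collected (tag, text) pairs
        self._buf = None    # list of text chunks while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._buf = []  # start buffering, like save_bgn()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._buf is not None:
            # stop buffering and keep the text, like save_end()
            self.headings.append((tag, "".join(self._buf)))
            self._buf = None

def extract_headings(html):
    # hypothetical convenience wrapper around the parser above
    p = HeadingParser()
    p.feed(html)
    p.close()
    return p.headings

print(extract_headings("<html><h1>title</h1></html>"))
```

unlike save_end(), this sketch does not normalize whitespace in the collected text, and it does not try to handle nested headings.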

</F>




