start_h1(..) and end_h1(..) functions in htmllib

Fredrik Lundh fredrik at
Sun Nov 19 17:35:40 CET 2000

hwan-jo yu wrote:
> I would like to extract only the text inside the heading tags of an HTML document.
> I tried the code below, but it doesn't work.

this works for me:

import htmllib, formatter

class myparser(htmllib.HTMLParser):
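    def start_h1(self, attrs):
        # start collecting character data instead of passing it to the formatter
        self.save_bgn()
    def end_h1(self):
        # save_end() stops collecting and returns the saved text
        print "H1", repr(self.save_end())

f = formatter.NullFormatter()

p = myparser(f)
# sample input, inferred from the output shown below
p.feed("<h1>title</h1>")
p.close()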

## prints
## H1 'title'
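
htmllib is not available in Python 3, so the code above only runs on Python 2. As a rough sketch of the same idea on Python 3, the standard html.parser module can be used; the class name HeadingParser, its attributes, and the sample input are made up for illustration and are not part of the original answer. It has no save_bgn()/save_end(), so the text is buffered by hand between the start and end tag:

```python
from html.parser import HTMLParser

class HeadingParser(HTMLParser):
    """Collect the text found inside each <h1> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False   # True while we are between <h1> and </h1>
        self.buffer = []     # text chunks of the current heading
        self.headings = []   # finished heading strings

    def handle_starttag(self, tag, attrs):
        # plays the role of start_h1() + save_bgn()
        if tag == "h1":
            self.in_h1 = True
            self.buffer = []

    def handle_data(self, data):
        # only keep character data while inside a heading
        if self.in_h1:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        # plays the role of end_h1() + save_end()
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.headings.append("".join(self.buffer))

p = HeadingParser()
p.feed("<p>intro</p><h1>title</h1><p>body</p>")
p.close()
print(p.headings)
```

For the sample input this collects the single heading 'title'. Other heading levels (h2, h3, ...) could be handled by comparing tag against a set of names instead of just "h1".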

