start_h1(..) and end_h1(..) functions in htmllib
Fredrik Lundh
fredrik at effbot.org
Sun Nov 19 11:35:40 EST 2000
hwan-jo yu wrote:
> I would like to extract the text only on headtags in a html document.
> I tried the below codes, but it doesn't work.
this works for me:
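    import htmllib, formatter

    class myparser(htmllib.HTMLParser):

        def start_h1(self, attrs):
            # start buffering character data instead of
            # passing it to the formatter
            self.save_bgn()

        def end_h1(self):
            # save_end() stops buffering and returns the saved text
            print "H1", repr(self.save_end())

    f = formatter.NullFormatter()

    p = myparser(f)
    p.feed("<html><h1>title</h1></html>")
    p.close()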
## prints
## H1 'title'
</F>
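
for readers on Python 3, where htmllib and formatter no longer exist, roughly the same idea can be sketched with html.parser from the standard library. this is only a sketch under assumptions: the names H1Collector and extract_h1 are made up for illustration, and unlike save_end() it returns the heading text as-is, without collapsing whitespace.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Hypothetical helper: collects the text found inside <h1> elements."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False   # True while we are between <h1> and </h1>
        self.parts = []      # text chunks of the heading being read
        self.headings = []   # finished heading strings

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            # plays the role of save_bgn(): start buffering text
            self.in_h1 = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            # plays the role of save_end(): stop buffering, keep the text
            self.in_h1 = False
            self.headings.append("".join(self.parts))

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)


def extract_h1(html):
    """Return the text of every <h1> element in the given HTML string."""
    p = H1Collector()
    p.feed(html)
    p.close()
    return p.headings


print(extract_h1("<html><h1>title</h1></html>"))
## prints
## ['title']
```

text inside nested tags (for example <b> inside <h1>) is included, since handle_data is called for it while in_h1 is still set.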