[Tutor] Understanding what I'm doing: Strip HTML
Robin B. Lake
rbl@hal.cwru.edu
Fri, 23 Jun 2000 18:21:31 -0400 (EDT)
I'm trying to strip the HTML from Web pages I'm downloading ...
thousands of them. Just right for Python! Someone earlier provided
the solution:
import htmllib, formatter
p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
f = open('yourfile.html')
p.feed(f.read())
Questions:
From reading the documentation, it seems that the htmllib.HTMLParser line
just (a) ties the formatter to DumbWriter and (b) assigns p to
be the HTMLParser, so that p.feed runs whatever data stream it
gets through the HTMLParser.
What type is p? Am I correct about how this works?
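For illustration, here is a minimal sketch of the same "construct a parser, then feed it text" pattern. It rests on one loud assumption: htmllib was removed in Python 3, so the sketch swaps in html.parser.HTMLParser from the Python 3 standard library, which is not the module used in the snippet above. TextCollector and strip_html are hypothetical names invented for this example, and the list of chunks only roughly plays the part that DumbWriter plays in the original chain.

```python
from html.parser import HTMLParser  # Python 3 stand-in for htmllib (assumption)


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps the text between tags and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # rough analogue of the writer at the end of the chain

    def handle_data(self, data):
        # feed() calls this for each run of text found outside a tag.
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


def strip_html(markup):
    p = TextCollector()  # p is an instance of the parser class
    p.feed(markup)       # feed() parses the string and fires the handlers
    p.close()            # flush any text still buffered in the parser
    return p.text()


if __name__ == "__main__":
    p = TextCollector()
    print(type(p).__name__)            # name of the class p was built from
    print(isinstance(p, HTMLParser))   # p is also an HTMLParser
    print(strip_html("<p>Hello, <b>world</b>!</p>"))
```

In this sketch p is simply an instance of the parser class, and feed() does the work of pushing the text through the handlers. One caveat: text inside script or style elements is also passed to handle_data, so real pages may need extra filtering.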
Thanks,
Rob Lake
rbl@hal.cwru.edu