[Tutor] Understanding what I'm doing: Strip HTML

Robin B. Lake rbl@hal.cwru.edu
Fri, 23 Jun 2000 18:21:31 -0400 (EDT)


I'm trying to strip the HTML from Web pages I'm downloading ... thousands
of them. Just right for Python! Someone earlier provided this solution:

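import htmllib, formatter
# DumbWriter writes plain text to sys.stdout; AbstractFormatter drives it
p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
f = open('yourfile.html')
p.feed(f.read())  # parse the whole file; the text goes to the writer
p.close()         # flush any data the parser is still buffering
f.close()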
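Note that htmllib and formatter are Python 2 modules: htmllib was removed in Python 3.0 and formatter in Python 3.10. Here is a minimal sketch of the same job on Python 3, swapping in the standard library's html.parser.HTMLParser instead. The names TextExtractor and strip_html are hypothetical, and skipping the contents of <script> and <style> is an added assumption about what "stripping HTML" should mean.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags, skipping <script>/<style> bodies."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # how deep we are inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_html(markup):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush text still held in the parser's buffer
    return "".join(parser.chunks)


print(strip_html("<p>Hello <b>world</b> &amp; co</p>"))
```

For thousands of downloaded pages, call strip_html() once per file; creating a fresh parser for each page keeps state from one document from leaking into the next.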

Questions:

From reading the documentation, it seems that the htmllib.HTMLParser line
just (a) ties the formatter to DumbWriter and (b) makes p the HTMLParser,
so that p.feed runs whatever data stream it gets through the HTMLParser.

What type is p?  Am I correct about how this works?
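The one-liner is built inside out: DumbWriter() is created first, handed to AbstractFormatter, and that formatter is handed to HTMLParser. So p is the HTMLParser instance, which holds a reference to the formatter, which in turn holds the writer; p.feed() pushes markup through the parser, the parser calls the formatter, and the formatter calls the writer. Since htmllib cannot be imported on Python 3, the sketch below only mirrors that chain using html.parser; PlainWriter, send_text and WriterParser are hypothetical names, not part of any library.

```python
import io
from html.parser import HTMLParser


class PlainWriter:
    """Hypothetical stand-in for DumbWriter: writes text to a file object."""

    def __init__(self, out):
        self.out = out

    def send_text(self, text):
        self.out.write(text)


class WriterParser(HTMLParser):
    """Hypothetical parser that keeps the writer it was given and feeds it text."""

    def __init__(self, writer):
        super().__init__()
        self.writer = writer  # the parser keeps a reference to the next stage

    def handle_data(self, data):
        self.writer.send_text(data)  # pass the text down the chain


out = io.StringIO()
p = WriterParser(PlainWriter(out))  # innermost object is constructed first
p.feed("<p>Hello, <i>Rob</i></p>")
p.close()

print(type(p))                    # the parser subclass itself
print(isinstance(p, HTMLParser))  # True
print(out.getvalue())             # Hello, Rob
```

The point of the layering is that the parser only knows how to call the object it was handed, so you can swap in a different writer (a file, a string buffer) without touching the parsing code.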

Thanks,
Rob Lake
rbl@hal.cwru.edu