Formatters, Writers, Parsers...
Kevin Carlson
nskhcarlso at bellsouth.net
Mon Jun 16 18:14:55 EDT 2003
I am trying to write a module that will convert HTML into formatted
text. I have seen numerous examples in the archives and in the Python
Cookbook about stripping out HTML tags which all seem to work fine.
However, what I need to do is format the text according to the tags that
the HTML contains. In particular, I need to increase the indentation
about 10 characters for each <td> tag that is read. I have been
approaching this as follows:
------- Begin code -----------
import formatter
import htmllib

class TextParser(htmllib.HTMLParser):
    def __init__(self, fmtr, verbose=0):
        htmllib.HTMLParser.__init__(self, fmtr, verbose)
        self.fmtr = fmtr
        self.insideTR = 0
        self.currentMargin = 0

    def start_tr(self, attrs):
        self.insideTR = 1

    def end_tr(self):
        self.insideTR = 0
        self.fmtr.end_paragraph(1)

    def start_td(self, attrs):
        # Intended: indent 10 more characters for each <td>
        self.currentMargin = self.currentMargin + 10
        self.fmtr.push_margin(self.currentMargin)

    def end_td(self):
        pass

def parseText(data):
    parser = TextParser(
        formatter.AbstractFormatter(formatter.DumbWriter()))
    parser.feed(data)
------- End code ---------
I guess I don't understand formatters, because I am getting exactly the
same results as if I didn't issue any of the calls to the formatter.
Can anyone shed some light on what I am doing wrong?
Thanks,
Kevin
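
For comparison, here is a minimal sketch of the intended indentation done
by hand rather than through a formatter/writer pair. It swaps in Python 3's
html.parser in place of htmllib and formatter, so it is not a fix for the
code above. The names TdIndentParser, parse_text and INDENT_STEP are
hypothetical, and the layout rules (margin reset at each <tr>, one output
line per run of text, blank line after each row) are assumptions about what
is wanted. As far as I can tell, formatter.DumbWriter does not override
new_margin, which may be why the push_margin calls have no visible effect.

```python
from html.parser import HTMLParser

INDENT_STEP = 10  # assumed from the post: 10 characters per <td>


class TdIndentParser(HTMLParser):
    """Collect text, indenting each successive <td> in a row by INDENT_STEP."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.margin = 0   # current left margin, in characters
        self.lines = []   # finished output lines

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.margin = 0              # assumption: each row starts at the left edge
        elif tag == "td":
            self.margin += INDENT_STEP   # each cell is indented further than the last

    def handle_endtag(self, tag):
        if tag == "tr":
            self.lines.append("")        # assumption: blank line after each row

    def handle_data(self, data):
        text = " ".join(data.split())    # collapse runs of whitespace
        if text:
            self.lines.append(" " * self.margin + text)


def parse_text(data):
    """Return the text of `data` with per-<td> indentation applied."""
    parser = TdIndentParser()
    parser.feed(data)
    parser.close()
    return "\n".join(parser.lines)
```

Calling parse_text on a small table returns a string in which the first
cell of each row starts at column 10, the second at column 20, and so on.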