HTML parsing: anyone use formatter?

[Crossposted to python-dev, web-sig, and xml-sig. Followups to web-sig@python.org, please.]

I'm working on bringing htmllib.py up to HTML 4.01 by adding handlers for all the missing elements. So far I've just been adding empty methods to the HTMLParser class, but the existing methods actually help render the HTML by calling methods on a Formatter object. For example, the definitions for the H1 element look like this:

    def start_h1(self, attrs):
        self.formatter.end_paragraph(1)
        self.formatter.push_font(('h1', 0, 1, 0))

    def end_h1(self):
        self.formatter.end_paragraph(1)
        self.formatter.pop_font()

Question: should I continue supporting this in new methods? This can only go so far; a tag such as <big> or <small> is easy for me to handle, but handling <form> or <frameset> or <table> would require greatly expanding the Formatter class's repertoire.

I suppose the more general question is: does anyone use Python's formatter module? Do we want to keep it around, or should htmllib be pushed toward doing just HTML parsing? formatter.py is a long way from being able to handle modern web pages, and it would be a lot of work to build a decent renderer.

--amk
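To make the trade-off concrete, here is a minimal sketch of the handler pattern the message describes. It deliberately does not import htmllib or formatter. RecordingFormatter, SketchParser and render_calls are hypothetical names, the dispatch to start_<tag>/end_<tag> methods is rebuilt on top of html.parser.HTMLParser, and the font tuple used for <big> is an assumption, not something htmllib defines.

```python
from html.parser import HTMLParser


class RecordingFormatter:
    """Hypothetical stand-in for a Formatter: it only records the calls made."""

    def __init__(self):
        self.calls = []

    def end_paragraph(self, blanklines):
        self.calls.append(("end_paragraph", blanklines))

    def push_font(self, font):
        self.calls.append(("push_font", font))

    def pop_font(self):
        self.calls.append(("pop_font",))

    def add_flowing_data(self, data):
        self.calls.append(("add_flowing_data", data))


class SketchParser(HTMLParser):
    """Dispatches each tag to a start_<tag>/end_<tag> method if one exists."""

    def __init__(self, formatter):
        super().__init__()
        self.formatter = formatter

    def handle_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

    def handle_data(self, data):
        self.formatter.add_flowing_data(data)

    # Handler pair quoted in the message.
    def start_h1(self, attrs):
        self.formatter.end_paragraph(1)
        self.formatter.push_font(("h1", 0, 1, 0))

    def end_h1(self):
        self.formatter.end_paragraph(1)
        self.formatter.pop_font()

    # Assumed handlers for <big>: the "big" size label is made up for this sketch.
    def start_big(self, attrs):
        self.formatter.push_font(("big", None, None, None))

    def end_big(self):
        self.formatter.pop_font()


def render_calls(html):
    """Parse html and return the list of formatter calls it triggered."""
    formatter = RecordingFormatter()
    parser = SketchParser(formatter)
    parser.feed(html)
    parser.close()
    return formatter.calls
```

In this sketch a font-style tag like <big> needs only a push/pop pair, while a tag with no handler, such as <form>, is silently skipped and only its text reaches the formatter. That is roughly the gap the message is asking about.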
participants (2)
- amk@amk.ca
- Jeremy Fincher