understanding htmllib

David Bear david.bear at asu.edu
Tue Oct 3 23:38:54 EDT 2006


I'm trying to understand how to use the HTMLParser in htmllib but I'm not
seeing enough examples.

I just want to grab the contents of everything enclosed in a '<body>' tag,
i.e. items from where <body> begins to where </body> ends. I start by doing

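from htmllib import HTMLParser

class HTMLBody(HTMLParser):
   def __init__(self, formatter):
      HTMLParser.__init__(self, formatter)  # the base class requires a formatter
      self.contents = []

   def handle_starttag(self, tag, method, attrs):
      pass  # stuck here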

Now I'm stuck. I can't see any way for handle_starttag to return
everything up to the matching end tag, and I haven't seen anything on how
to define my own handle_unknowntag.
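
For illustration, here is a minimal sketch of one way to collect what sits
between <body> and </body>. It makes an assumption that is not in the
original post: htmllib was removed in Python 3, so the sketch uses the
html.parser module from the Python 3 standard library instead. The names
BodyExtractor and extract_body are hypothetical. The idea is that no single
handler returns everything up to the end tag; the subclass keeps a flag and
accumulates pieces as each callback fires.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the markup found between <body> and </body> (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False so entities reach the handlers below unchanged
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # get_starttag_text() returns the start tag as written in the input
            self.contents.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <br/>
        if self.in_body:
            self.contents.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.contents.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            self.contents.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.contents.append("&%s;" % name)

    def handle_charref(self, name):
        if self.in_body:
            self.contents.append("&#%s;" % name)


def extract_body(html):
    """Return the body contents of an HTML string as one string."""
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.contents)


page = '<html><head><title>t</title></head><body><p class="x">Hi &amp; bye<br/></p></body></html>'
print(extract_body(page))
```

As written, the sketch drops comments and declarations inside the body, and
end tags are rebuilt from the lowercased tag name rather than copied from
the input.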

Any pointers would be greatly appreciated. The documentation on this module
at python.org seems to assume a great deal about what the reader already
knows about which methods a subclass should override.
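
One way to find out which methods to override is to log every callback
while parsing a small input. The sketch below again assumes Python 3's
html.parser rather than htmllib, and CallbackTracer is a hypothetical name.
It records the handler invoked for each piece of markup.

```python
from html.parser import HTMLParser


class CallbackTracer(HTMLParser):
    """Record which handler is invoked for each piece of the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


tracer = CallbackTracer()
tracer.feed('<body><p id="a">hi</p><!-- note --></body>')
tracer.close()
for event in tracer.events:
    print(event)
```

Each tuple names the callback that fired, which shows that start tags, end
tags, text, and comments each arrive through their own handler.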

-- 
David Bear
-- let me buy your intellectual property, I want to own your thoughts --


