Help with parsing web page

Miki Tebeka miki.tebeka at
Tue Jun 15 12:18:17 CEST 2004

Hello RiGGa,

> Anyone?, I have found out I can use sgmllib but find the documentation is
> not that clear, if anyone knows of a tutorial or howto it would be
> appreciated.
I'm not an expert but this is how I work:

You make a subclass of HTMLParser and override the callback functions.
Usually I use only start_<TAG>, end_<TAG> and handle_data.
Since you don't know *when* each callback function is called you need to
keep an internal state. It can be a simple variable or a stack if you
want to deal with nested tags.

A short example:
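#!/usr/bin/env python

from htmllib import HTMLParser
from formatter import NullFormatter

class TitleParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self, NullFormatter())
        self.state = ""  # name of the tag we are inside, "" when outside
        self.title = ""  # text collected inside <title>

    def start_title(self, attrs):
        self.state = "title"
        self.title = ""

    def end_title(self):
        print "Title:", self.title.strip()
        self.state = ""  # left <title>, stop collecting

    def handle_data(self, data):
        if self.state:
            self.title += data

if __name__ == "__main__":
    from sys import argv

    parser = TitleParser()
    # Assumes the HTML file name is given as the first command line argument
    parser.feed(open(argv[1]).read())
    parser.close()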
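
For the nested-tag case mentioned above, the stack idea could look roughly like the sketch below. It is only an illustration under a few assumptions: it uses Python 3's html.parser.HTMLParser instead of htmllib, so the callbacks are handle_starttag/handle_endtag rather than start_<TAG>/end_<TAG>, and the names NestedTextParser and parse_nested are made up for this example.

```python
from html.parser import HTMLParser


class NestedTextParser(HTMLParser):
    """Record each piece of text together with the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.chunks = []  # (path, text) pairs such as ("div/p", "Hello")

    def handle_starttag(self, tag, attrs):
        # Every opening tag is pushed, including void tags like <br>;
        # those are cleaned up when an enclosing tag closes.
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(("/".join(self.stack), text))


def parse_nested(html):
    """Hypothetical helper: parse a string, return the (path, text) pairs."""
    parser = NestedTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    sample = "<div><p>Hello <b>world</b>!</p></div>"
    for path, text in parse_nested(sample):
        print(path, "->", text)
```

The stack is the whole state here: handle_data does not need to know which callback ran last, it only looks at what is currently open.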

Miki Tebeka <miki.tebeka at>
The only difference between children and adults is the price of the toys.
