Would anyone show me how to use htmllib?

Paul Wright -$Paul$- at verence.demon.co.uk
Sat Oct 28 10:24:24 EDT 2000


In article <8te2ch$8ou$1 at nnrp1.deja.com>,  <jackxh at my-deja.com> wrote:
>Hi
>I have read the Python library reference. I am a Python newbie, and I think I
>have to overload some functions to get it working. Could anyone give me
>an example to show me how it works?

I've played with this a bit. Here's an example which prints out the
attributes of every anchor tag in the file you give it on the command
line. If you printed only the href attribute, you would get a list of
all the links in the document.

The way to use it is to subclass htmllib.HTMLParser and define start_foo
and end_foo methods for an HTML tag which works like
<FOO>Inside a foo</FOO>, and do_bar for an HTML tag which doesn't have
separate opening and closing tags (IMG, for example). The start_foo
method is called when the opening tag is seen, and end_foo when the
closing tag is seen. The attributes argument is a list of 2-tuples (or
pairs, if you like) giving the name and value of each attribute of the
tag, as you can see from the print_attributes function below.
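To make the naming convention concrete, here is a rough sketch of the dispatch idea. Note the assumptions: htmllib only exists in Python 2, so this sketch is written for Python 3 on top of html.parser.HTMLParser and imitates the start_foo / end_foo / do_bar lookup with getattr. The class names TagDispatcher and DemoParser are made up for illustration, and this is not a claim about how htmllib is implemented internally.

```python
from html.parser import HTMLParser


class TagDispatcher(HTMLParser):
    """Hypothetical helper: routes tags to start_foo / end_foo / do_bar methods."""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) 2-tuples; tag names arrive lowercased.
        handler = getattr(self, "start_" + tag, None) or getattr(self, "do_" + tag, None)
        if handler is not None:
            handler(attrs)

    def handle_endtag(self, tag):
        # Only tags with an end_foo method get a closing-tag callback.
        handler = getattr(self, "end_" + tag, None)
        if handler is not None:
            handler()


class DemoParser(TagDispatcher):
    def __init__(self):
        super().__init__()
        self.events = []  # records which handlers fired, in order

    def start_a(self, attributes):
        self.events.append(("start_a", attributes))

    def end_a(self):
        self.events.append(("end_a",))

    def do_img(self, attributes):
        # IMG has no closing tag, so it only gets a do_ method.
        self.events.append(("do_img", attributes))


parser = DemoParser()
parser.feed('<a href="x.html">link</a><img src="p.png">')
parser.close()
for event in parser.events:
    print(event)
```

Tags without a matching method (and plain text) are simply ignored, which is why a subclass only needs to define handlers for the tags it cares about.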

import htmllib
import formatter
import sys

def print_attributes (attributes):
        # attributes is a list of (name, value) pairs
        for name, value in attributes:
                print name, "=", value

class AnchorFinder (htmllib.HTMLParser):

        def __init__ (self):
                # HTMLParser needs a formatter; NullFormatter discards the output
                htmllib.HTMLParser.__init__ (self, formatter.NullFormatter ())
                # You could do other stuff here to set up your subclass

        def start_a (self, attributes):
                # Called each time an opening <A ...> tag is seen
                print_attributes (attributes)


if __name__ == "__main__":
        parser = AnchorFinder ()
        # Parse the file named on the command line
        parser.feed (open (sys.argv [1]).read ())
        parser.close ()
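
The "list of all the links" variation mentioned earlier might look something like the sketch below. It rests on one assumption worth stating loudly: htmllib is a Python 2 module and is not available in Python 3, so this version uses html.parser.HTMLParser instead, where every opening tag goes through handle_starttag rather than a start_a method. LinkFinder and find_links are names invented for this example.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collects the href value of every anchor tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs, as in the htmllib example.
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def find_links(html_text):
    """Hypothetical convenience wrapper: parse a string, return its links."""
    parser = LinkFinder()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<p><a href="http://www.python.org/">Python</a> and <a name="top">an anchor</a></p>'
print(find_links(sample))
```

To run it on a file as in the original script, you could pass the contents of open(sys.argv[1]).read() to find_links in place of the sample string. Anchors that only carry a name attribute are skipped, since they have no href.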


        

-- 
----- Paul Wright ------| "Their little anoraks bobbed and danced, their
-paul.wright at pobox.com--| cycling helmets swung with gay abandon - the NatSci
http://pobox.com/~pw201 | Elves were abroad!" -Simon Pick


