[Tutor] Help with Parsing HTML files
Charlie Clark
Charlie Clark <charlie@begeistert.org>
Fri, 03 Aug 2001 14:24:02 +0200
>I had a look at the archives, I don't see how to search it,
>only to browse a month of postings at a time.
I couldn't find a search button either, so I downloaded the last three months
and searched them in my editor. It was in the merry month of May that this
last came up for discussion, and Remco posted the following snippet:
import htmllib

class ImgFinder(htmllib.HTMLParser):
    def __init__(self):
        # Normal HTMLParser takes a 'formatter' argument but we don't need it
        htmllib.HTMLParser.__init__(self, None)

    def handle_image(self, source, alt, *args):
        # *args holds "ismap", "align", "width" and "height", if available,
        # but we ignore those here
        print "Found an image!"
        print "Source =", source, "Alt =", alt

finder = ImgFinder()
finder.feed(aa) # Feed in the string, it should find the images
Now I need an idiot-proof user guide for this. We define a class, ImgFinder,
which is based on htmllib.HTMLParser. When an instance of this class is
constructed, the formatter argument is set to None. We add a method,
"handle_image", which takes some nice arguments and prints "Found an image!"
together with the image's source and alt text. "finder" is an instance of our
class and gets fed a string - I assume this would be the contents of an HTML
file.
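To make that reading concrete, here is a minimal sketch of the same idea. It is not Remco's code: it assumes Python 3, where htmllib is no longer available, and swaps in the standard html.parser module instead. The names ImgCollector, find_images and the sample string are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Hypothetical stand-in for ImgFinder, built on Python 3's html.parser."""

    def __init__(self):
        super().__init__()
        self.images = []  # (src, alt) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # feed() calls this for every opening tag; we only keep <img>.
        if tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src"), attributes.get("alt")))


def find_images(html_text):
    """Feed a whole HTML document (as one string) and return its images."""
    collector = ImgCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images


# Made-up sample standing in for the contents of an HTML file.
sample = '<html><body><p>Hi</p><img src="logo.gif" alt="Logo"></body></html>'
for src, alt in find_images(sample):
    print("Source =", src, "Alt =", alt)
```

If the assumption about the input holds, the string passed to find_images could just as well be whatever reading an HTML file returns. The sketch collects the images in a list rather than printing inside the handler, so the results can be reused afterwards.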
What I don't see is how the handle_image method looks for images, since
nothing in the snippet calls it directly. I need to learn how this works in
order to modify it for my own dark purposes! Please help.
Charlie