[Tutor] HTMLParser problem unable to find all the IMG tags....
mlist-python at dideas.com
Thu Oct 28 14:34:37 CEST 2004
I'm trying to write a program that will locate the front page image at
CNN.com. [If this already exists, I want to do this anyway as it's a good
The problem is that, using HTMLParser, I'm not getting all the IMG
tags. I know this because I have another program that uses plain string
processing and finds 2.5 times as many IMG SRC tags. I also know this because
HTMLParser's handle_starttag is never called with the IMG that I'm after!
There is also an exception related to the close method and EOF.
Possibly my problem is in how I feed the data? Or related to nested tags?
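import urllib2
from HTMLParser import HTMLParser

class MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)  # sets up the buffer that feed() uses
        self.cnt = 0

    def handle_starttag(self, tag, attr):
        # print "Encountered the beginning of a %s tag" % tag
        # Tag names arrive lowercased; "in" would also match <i>, so use ==.
        if tag == "img":
            self.cnt = self.cnt + 1

    def close(self):
        HTMLParser.close(self)  # flush whatever is still buffered
        print "HTMLParse Found : ", self.cnt

mp = MyParser()
for line in urllib2.urlopen('http://www.cnn.com'):
    mp.feed(line)
mp.close()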
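
To rule out the feeding question, here is a minimal offline sketch that compares the parser count with a plain string-processing count on a small, well-formed sample. The names ImgCounter, count_with_parser, count_with_regex and SAMPLE are made up for illustration, and the sample is hand-written rather than taken from CNN.com. The import is written so that it should run on both Python 2 and Python 3. It only shows that line-by-line feeding and mixed-case tags are not a problem on clean input; it says nothing about how the parser copes with malformed markup on a real page.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module name


class ImgCounter(HTMLParser):
    """Hypothetical helper: records the src of every <img> start tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))

    # Self-closing <img ... /> goes through handle_startendtag, which by
    # default calls handle_starttag, so no extra handler is needed here.


def count_with_parser(chunks):
    """Feed the chunks one at a time and return the number of <img> tags."""
    parser = ImgCounter()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return len(parser.sources)


def count_with_regex(html):
    """Plain string-processing count, for cross-checking the parser."""
    return len(re.findall(r"<img\b", html, re.IGNORECASE))


# Hand-written sample (an assumption, not real CNN.com markup).
SAMPLE = (
    '<html><body>\n'
    '<IMG SRC="a.gif">\n'
    '<p><i>italic</i> <img src="b.jpg" /></p>\n'
    '<table><tr><td><img\n src="c.png"></td></tr></table>\n'
    '</body></html>\n'
)

whole = count_with_parser([SAMPLE])                    # one big feed()
by_line = count_with_parser(SAMPLE.splitlines(True))   # one feed() per line
by_regex = count_with_regex(SAMPLE)

print("%d %d %d" % (whole, by_line, by_regex))
```

On this sample all three counts agree, even though the third tag is split across two lines and the first is upper case. If the counts diverge on a real page, the difference is more likely to come from the markup itself than from feeding line by line.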