HTML Parser - beginner needs help

Bjorn Pettersen bjorn at roguewave.com
Thu Sep 14 21:05:37 EDT 2000


In this case the HTMLParser class in the htmllib module has a handle_image
method that does exactly what you want (see below). As Frederik points out,
though, it is generally easier to use sgmllib for extracting tags...

-- bjorn

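# Python 2 code: htmllib, formatter and urllib.urlopen are Python 2 modules/APIs.
import htmllib, formatter, urllib

class IMGParser(htmllib.HTMLParser):
	def __init__(self):
		# NullFormatter discards all rendered output; we only want the callbacks.
		htmllib.HTMLParser.__init__(self, formatter.NullFormatter())
		self.images = []
	def handle_image(self, src, alt, *args):
		# Called once per <img> tag; remember its src attribute.
		self.images.append(src)

parser = IMGParser()
parser.feed(urllib.urlopen("http://www.python.org").read())
parser.close()
print parser.images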
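
For anyone reading this on Python 3, where htmllib and sgmllib are no longer
in the standard library, roughly the same idea can be sketched with
html.parser.HTMLParser by overriding handle_starttag. This is only a sketch,
not the code from the post above: the class name ImgSrcParser and the sample
string are made up for illustration, and it parses a literal string instead of
fetching a page.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Hypothetical helper: collect the src of every <img> tag fed in."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.images.append(src)


# Made-up sample markup, used instead of a network fetch.
sample = (
    '<p><a href="/">home</a>'
    '<img src="logo.png" alt="logo">'
    '<IMG SRC="banner.gif"></p>'
)

parser = ImgSrcParser()
parser.feed(sample)
parser.close()
print(parser.images)
```

The anchor in the sample is skipped because only the "img" tag name is
checked; the same pattern should work for other tags by changing that test.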


zet wrote:
> 
> Can somebody provide a small piece of code that returns a list of img tags?
> I've been trying these lines:
> 
> class IMGParser(HTMLParser):
>  def end_img(arg):
>   return
> 
> but it returns only anchors; how do I get the IMGs?



