Subclass SGMLParser or HTMLParser?
skpeternospam at ucdavis.edu
Mon Sep 30 06:19:39 CEST 2002
Hello, I've just started doing some Python programming with
htmllib.HTMLParser to spider a website of mine, grab all of the
images and download them to disk, and collect reference counts
for my hyperlinks. It works pretty well, except on a few web
pages that were generated with Word. Most of those pages don't
contain images or anchor tags anyway, and I imagine the HTMLParser
module meant for XHTML documents will handle them just fine once I
get around to playing with it.
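A minimal sketch of the task described above: collecting image URLs and counting hyperlink references on one page. It assumes Python 3, where sgmllib and htmllib no longer exist (both were removed in Python 3.0), so it swaps in the standard library's html.parser.HTMLParser instead. The class name LinkImageCollector and the example.com base URL are hypothetical, and downloading the images is left out.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageCollector(HTMLParser):
    """Hypothetical helper: gather <img src> URLs and count <a href> targets."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.images = []              # absolute image URLs, in document order
        self.link_counts = Counter()  # absolute link URL -> number of references

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <IMG SRC=...> works too.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.link_counts[urljoin(self.base_url, attrs["href"])] += 1
        # XHTML-style <img ... /> goes through handle_startendtag, whose default
        # implementation calls handle_starttag, so no extra override is needed.


page = """<html><body>
<a href="index.html">Home</a> <a href="index.html">Again</a>
<img src="pics/cat.jpg"><IMG SRC="pics/dog.gif" />
<a href="about.html">About</a>
</body></html>"""

collector = LinkImageCollector("http://example.com/")
collector.feed(page)
collector.close()
print(collector.images)
print(collector.link_counts.most_common())
```

With sgmllib.SGMLParser (Python 2) the same idea was usually written as per-tag methods such as start_img and start_a; html.parser instead routes every opening tag through a single handle_starttag method, as above.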
My question is this: after looking around on the web for examples,
I've noticed that most people seem to use sgmllib.SGMLParser instead.
I know that htmllib.HTMLParser is just a subclass of SGMLParser,
so I was wondering what the pros and cons of using one or the
other are. Any recommendations? Thanks in advance.
s/nospam/son/ -- to email me.