Looking for a decent HTML parser for Python...
Just Another Victim of the Ambient Morality
ihatespam at hotmail.com
Tue Dec 5 23:10:14 EST 2006
"Just Another Victim of the Ambient Morality" <ihatespam at hotmail.com> wrote
in message news:qKqdh.303031$tl2.45967 at fe10.news.easynews.com...
> I'm trying to parse HTML in a very generic way.
> So far, I'm using SGMLParser in the sgmllib module. The problem is
> that it forces you to parse very specific tags through object methods like
> start_a(), start_p() and the like, forcing you to know exactly which tags
> you want to handle. I want to be able to handle the start tags of any and
> all tags, like how one would do in the Xerces C++ XML parser. In other
> words, I would like a simple start() method that is called whenever any
> tag is encountered. How may I do this?
> Thank you...
Okay, I think I've found what I'm looking for: the HTMLParser class in the
HTMLParser module.
Thanks...
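For anyone finding this thread later, here is a minimal sketch of the approach. HTMLParser calls a single handle_starttag(tag, attrs) method for every start tag, whatever its name, which is the generic start() hook asked about above. The sketch assumes Python 3, where the module is named html.parser. The subclass name TagCollector and its start_tags/end_tags lists are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser  # Python 3 name of the old HTMLParser module


class TagCollector(HTMLParser):
    """Hypothetical example subclass: records every start and end tag."""

    def __init__(self):
        super().__init__()
        self.start_tags = []  # (tag, attrs) pairs in document order
        self.end_tags = []    # tag names in document order

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; tag is lowercased and attrs is a
        # list of (name, value) pairs.
        self.start_tags.append((tag, attrs))

    def handle_endtag(self, tag):
        # Called for every end tag.
        self.end_tags.append(tag)


parser = TagCollector()
parser.feed('<p>Hello <a href="http://example.com/">link</a></p>')
parser.close()
print(parser.start_tags)
# [('p', []), ('a', [('href', 'http://example.com/')])]
print(parser.end_tags)
# ['a', 'p']
```

Self-closing tags such as `<br/>` go to handle_startendtag, which by default calls handle_starttag and then handle_endtag, so they are caught by the same two methods.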
More information about the Python-list mailing list