guettli at thomas-guettler.de
Fri Jul 9 15:02:26 CEST 2004
On Thu, 08 Jul 2004 17:04:24 +0100, C Gillespie wrote:
> Dear All,
> I have hopefully a very simple problem. I wish to parse an html page and
> extract everything between the <body> tags.
> Would give
> I've been playing about with htmllib with no success. Any suggestions?
HTML can be broken in many ways. If you want
a solution that can read most of the HTML found on the
web, you can run the page through tidy and have it output XML.
XML is much easier to handle with SAX or DOM.
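
A minimal sketch of the DOM half of that idea, assuming the page has already been converted to well-formed XML by tidy (for example with its -asxml option). The sample string and the helper name body_contents are only illustrative stand-ins, not real tidy output or an existing API:

```python
from xml.dom import minidom

# Hypothetical sample: stands in for the well-formed XML that tidy would emit.
SAMPLE_XHTML = """<html>
<head><title>Example</title></head>
<body><h1>Hello</h1><p>Some <b>text</b>.</p></body>
</html>"""


def body_contents(xhtml):
    """Return the markup between <body> and </body> as one string."""
    doc = minidom.parseString(xhtml)
    bodies = doc.getElementsByTagName("body")
    if not bodies:
        # No <body> element found: nothing to extract.
        return ""
    # Serialize each child node of <body>, leaving out the <body> tag itself.
    return "".join(child.toxml() for child in bodies[0].childNodes)


if __name__ == "__main__":
    print(body_contents(SAMPLE_XHTML))
```

This only works once the input is well-formed, which is the reason for running tidy first; minidom raises an error on the kind of broken HTML commonly found on the web.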
Thomas Güttler, http://www.thomas-guettler.de/