Parsing html

Thomas Guettler guettli at
Fri Jul 9 15:02:26 CEST 2004

On Thu, 08 Jul 2004 17:04:24 +0100, C Gillespie wrote:

> Dear All,
> I have hopefully a very simple problem. I wish to parse an html page and
> extract everything between the <body> tags.
> E.g.
> <head>
>     <body>
>         <b>afsdf</b>
>     </body>
> </head>
> Would give
> <body>
>     <b>afsdf</b>
> </body>
> I've been playing about with htmllib with no success. Any suggestions?

HTML can be broken in many ways. If you want
a solution that can read most of the HTML found on the
web, you can run the page through tidy and have it output XML.

XML is much easier to handle with SAX/DOM.
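
A minimal sketch of the DOM step, assuming the page has already been
cleaned up into well-formed XML (for example by running tidy with its
-asxml option). The TIDIED string below is only a stand-in for that
output, and extract_body is a hypothetical helper name, not part of
any library:

```python
from xml.dom import minidom

# Stand-in for tidy's output. In practice this string would come from
# running tidy on the original page (assumption, not shown here).
TIDIED = """<html>
<head><title>example</title></head>
<body>
<b>afsdf</b>
</body>
</html>"""


def extract_body(xhtml):
    """Return the <body> element of a well-formed XHTML string as markup."""
    doc = minidom.parseString(xhtml)
    bodies = doc.getElementsByTagName("body")
    if not bodies:
        raise ValueError("no <body> element found")
    # toxml() serializes the element together with its children.
    return bodies[0].toxml()


if __name__ == "__main__":
    print(extract_body(TIDIED))
```

This only works once the input is well-formed; feeding raw, broken HTML
straight to minidom would raise a parse error, which is why the tidy
step comes first.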


Thomas Güttler
