(htmllib) How to capture text that includes tags?

jennyw jennyw at dangerousideas.com
Wed Nov 5 19:24:51 CET 2003


On Wed, Nov 05, 2003 at 11:23:36AM +0100, Peter Otten wrote:
> I've found the parser in the HTMLParser module to be a lot easier to use.
> Below is the rough equivalent of your posted code. In the general case you
> will want to keep a stack of tags instead of the simple infont flag.

Thanks! What are the main advantages of HTMLParser over htmllib?

The code gives me something to think about ... it doesn't work right now
because the source turns out to contain nested font tags. That makes the
asserts fail, and if I comment them out, it generates a 53 MB file from
a < 1 MB source file. I'll try playing with it to see if I can get it to
do what I want.
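
For illustration, here is a minimal sketch of how nested font tags could
be handled by counting open tags, a simplified form of the "stack of
tags" idea from the quoted reply. It is not the code from that reply.
The class name FontCapture and its attributes are made up for this
example, and it uses the Python 3 module name html.parser (in 2003 the
module was called HTMLParser). It keeps the markup inside each outermost
font element, inner tags included, and assumes the font tags are
properly closed.

```python
from html.parser import HTMLParser  # Python 3 name; in Python 2 the module was HTMLParser


class FontCapture(HTMLParser):
    """Hypothetical helper: collect the markup inside each outermost <font>."""

    def __init__(self):
        # Keep entity and character references as written in the source.
        super().__init__(convert_charrefs=False)
        self.depth = 0      # number of <font> tags currently open
        self.current = []   # pieces of the chunk being collected
        self.chunks = []    # finished chunks, one per outermost <font>

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.depth += 1
            if self.depth == 1:
                self.current = []
                return      # leave out the outermost <font> tag itself
        if self.depth:
            # get_starttag_text() returns the tag exactly as it was written.
            self.current.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.current.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "font" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.chunks.append("".join(self.current))
                return      # leave out the outermost </font> as well
        if self.depth:
            self.current.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.current.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.current.append("&#%s;" % name)


def capture_font_chunks(html):
    """Return a list with the inner markup of each outermost <font> element."""
    parser = FontCapture()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Because only the outermost font element opens or closes a chunk, a
nested font tag ends up inside the captured text instead of starting a
second capture. An unclosed font tag would leave the counter above zero,
so real-world HTML may need extra handling.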

It would be easier if I could find a way to view the HTML as a tree.
As a side note, are there any good utilities for doing this?
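
One rough way to get a tree-like view with only the standard library is
to print each tag indented by its nesting depth. The sketch below is an
assumption about what such a helper could look like, not an existing
utility. The names OutlineBuilder, outline and VOID_TAGS are invented
here, and the module name is the Python 3 one (html.parser).

```python
from html.parser import HTMLParser  # Python 3 name; in Python 2 the module was HTMLParser

# Elements that never have a closing tag, so they must not add a nesting level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class OutlineBuilder(HTMLParser):
    """Hypothetical helper: build an indented outline of tags and text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.lines = []   # outline lines, indented two spaces per level

    def _emit(self, text):
        self.lines.append("  " * len(self.stack) + text)

    def handle_starttag(self, tag, attrs):
        self._emit("<%s>" % tag)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._emit("<%s/>" % tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self._emit(repr(text))


def outline(html):
    """Return the outline of an HTML string as one newline-joined string."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return "\n".join(builder.lines)
```

Calling print(outline(open("page.html").read())) on a file (the file
name is only a placeholder) shows where font tags nest inside each
other. Tags that the source never closes stay on the stack until an
enclosing tag closes, so the indentation of sloppy HTML is only
approximate.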

Thanks again!

Jen




