[XML-SIG] How does one process HTML with the DOM support in PyXML?

Uche Ogbuji uche.ogbuji@fourthought.com
Wed, 13 Jun 2001 06:52:27 -0600


> Well, it's "HTML as deployed" but also "Python as deployed", so I need
> something that's backward-compatible to 1.5.2, I think.  It looks like
> DOM will give me some of that; I'm not sure how well it copes with
> 'loose' HTML, but so far it looks good.
> 
> I'm looking for a faster (and cleaner) upgrade from sgmllib.SGMLParser.

Try the Sgmlop reader in 4DOM (comes with PyXML).  It works with Python 
1.5.2 and above, is fast, and deals fairly leniently with broken HTML.
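
To give a rough idea of what "lenient" parsing into a DOM looks like, here is
a minimal sketch.  Note that it is NOT the 4DOM Sgmlop reader: it is a
stand-in built on the standard library's html.parser and xml.dom.minidom, so
it assumes a current Python 3 rather than 1.5.2.  The names LooseHtmlToDom and
parse_loose_html, and the synthetic "root" wrapper element, are made up for
illustration.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never take a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class LooseHtmlToDom(HTMLParser):
    """Hypothetical helper: builds a minidom tree from sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.doc = Document()
        # Synthetic wrapper so stray top-level text and elements have a parent.
        root = self.doc.createElement("root")
        self.doc.appendChild(root)
        self.stack = [root]

    def handle_starttag(self, tag, attrs):
        node = self.doc.createElement(tag)
        for name, value in attrs:
            node.setAttribute(name, value or "")
        self.stack[-1].appendChild(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element;
        # a close tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].appendChild(self.doc.createTextNode(data))


def parse_loose_html(text):
    parser = LooseHtmlToDom()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return parser.doc


if __name__ == "__main__":
    doc = parse_loose_html("<ul><li>a<li>b</ul><p>tail")
    for li in doc.getElementsByTagName("li"):
        print(li.firstChild.data)
```

This sketch does not try to infer implied end tags, so the second <li> ends up
nested inside the first; the point is only that unclosed and mismatched tags
still produce a tree you can walk with the usual DOM calls.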


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
XML strategy, XML tools (http://4Suite.org), knowledge management