[XML-SIG] How to build a DOM from an HTML file?

Alexandre Fayolle Alexandre.Fayolle@logilab.fr
Mon, 26 Feb 2001 16:35:08 +0100 (CET)


I'm trying to parse HTML documents into DOMs, using the 4DOM version that
comes with 4Suite 0.10.2.

I first tried xml.dom.ext.reader.HtmlSax.HtmlDomGenerator with a
xml.dom.ext.reader.Sax.Reader, but it seems to be broken (see
bug #404072). Then I tried xml.dom.ext.reader.HtmlLib.FromHtmlUrl, which
uses the sgmlop parser. However, this parser looks only partially
implemented: it chokes on doctype directives, for example, which means
that the pages most likely to contain valid HTML won't be parsed.
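
To make the requirement concrete, here is a minimal sketch of the behaviour
I am after: a tolerant parse that accepts a doctype directive and still
yields a DOM. It does not use 4DOM at all; it assumes the html.parser and
xml.dom.minidom modules of a current Python standard library, and the names
DomBuilder and html_to_dom are hypothetical helpers made up for this
illustration. It is only a sketch, not a complete HTML tree builder.

```python
from html.parser import HTMLParser
from xml.dom.minidom import getDOMImplementation

# Elements that never take a closing tag in HTML (partial list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DomBuilder(HTMLParser):
    """Hypothetical helper: turns HTMLParser events into a minidom tree."""

    def __init__(self):
        super().__init__()
        # Empty document with no root element yet.
        self.document = getDOMImplementation().createDocument(None, None, None)
        self.current = self.document  # node that receives new children
        self.doctype_text = None

    def handle_decl(self, decl):
        # Doctype directives arrive here; record the text instead of failing.
        self.doctype_text = decl

    def handle_starttag(self, tag, attrs):
        if self.current is self.document and self.document.documentElement:
            return  # sketch limitation: ignore a second top-level element
        element = self.document.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value or "")
        self.current.appendChild(element)
        if tag not in VOID_TAGS:
            self.current = element

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        node = self.current
        while node is not self.document and node.tagName != tag:
            node = node.parentNode
        if node is not self.document:
            self.current = node.parentNode

    def handle_data(self, data):
        # A Document node cannot hold text, so skip text outside the root.
        if self.current is not self.document and data.strip():
            self.current.appendChild(self.document.createTextNode(data))


def html_to_dom(text):
    """Parse an HTML string and return a minidom Document."""
    builder = DomBuilder()
    builder.feed(text)
    builder.close()
    return builder.document
```

With this sketch, html_to_dom('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>...</html>')
returns a Document whose documentElement is the html element, rather than
stopping at the doctype.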

What is the current preferred way to do this?

Alexandre Fayolle
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).