[XML-SIG] How does one process HTML with the DOM support in PyXML?

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 13 Jun 2001 08:13:40 +0200


> I've been looking at the PyXML docs to see whether it could be
> used to parse HTML files.  There seems to be something interesting
> under section 4.5 of xml-howto, entitled "Processing HTML".  But the
> contents of that chapter say only "Intro to HTML builder".  Any
> further tips elsewhere?

I suggest looking at demo/dom/dom_from_html.py. Building a DOM tree from
an HTML document really takes no more than

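from xml.dom.ext.reader import HtmlLib
from xml.dom import ext

fileName = "page.html"   # hypothetical: path or URI of the HTML document
reader = HtmlLib.Reader()
dom_object = reader.fromUri(fileName)   # parse it into a DOM document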

Then you get all the DOM interfaces, including the ones defined only
for HTML.
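PyXML has long been unmaintained and does not run on Python 3, so the HtmlLib reader above may not be available today. As a rough stand-in, here is a minimal sketch that uses only the standard library: events from html.parser.HTMLParser are turned into an xml.dom.minidom tree, after which the usual DOM calls apply. This is not PyXML's API; DomBuilder and html_to_dom are hypothetical names. The sketch assumes the input has a single root element such as <html>. It knows about void elements like <br> but not HTML's other implied end tags (an unclosed <p> or <li> simply stays open). Also, minidom offers only DOM Core, not the HTML-specific interfaces mentioned above.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# HTML void elements: they never have content or an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class DomBuilder(HTMLParser):
    """Hypothetical helper: turn HTMLParser events into a minidom tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = Document()
        self.stack = [self.document]  # open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            # Boolean attributes such as <input disabled> arrive with value None.
            element.setAttribute(name, "" if value is None else value)
        self.stack[-1].appendChild(element)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name (and anything
        # still open inside it); end tags that match nothing are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # A minidom Document cannot hold text directly, so text outside
        # the root element (e.g. whitespace around <html>) is dropped.
        if self.stack[-1] is not self.document:
            self.stack[-1].appendChild(self.document.createTextNode(data))

    def handle_comment(self, data):
        self.stack[-1].appendChild(self.document.createComment(data))


def html_to_dom(text):
    """Parse HTML text and return an xml.dom.minidom Document."""
    builder = DomBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


doc = html_to_dom('<html><body><p class="intro">Hello<br>world</p>'
                  '<a href="/next">Next</a></body></html>')
# From here on, only standard DOM Core calls are needed.
for link in doc.getElementsByTagName("a"):
    print(link.getAttribute("href"), link.firstChild.data)  # prints: /next Next
```

A hand-rolled builder keeps the example dependency-free; for real-world, messy HTML a dedicated HTML5 parser that implements the full tree-construction rules would be the safer choice.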

Regards,
Martin