[XML-SIG] How does one process HTML with the DOM support in PyXML?
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Wed, 13 Jun 2001 08:13:40 +0200
> I've been looking at the PyXML docs to see whether it could be
> used to parse HTML files. There seems to be something interesting
> under section 4.5 of xml-howto, entitled "Processing HTML". But the
> contents of that chapter say only "Intro to HTML builder". Any
> further tips elsewhere?
I suggest looking at demo/dom/dom_from_html.py. Building a DOM tree from
an HTML document really takes no more than:
    from xml.dom.ext.reader import HtmlLib
    from xml.dom import ext

    # fileName is the path or URI of the HTML document to read
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri(fileName)
Then you get all the DOM interfaces, including the ones defined only
for HTML.
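
For experimenting without PyXML installed, the same idea can be roughly
sketched with only the standard library. This is not HtmlLib, and it is an
assumption of this sketch rather than anything taken from the demo: it feeds
the HTML through html.parser.HTMLParser and builds an xml.dom.minidom tree,
so you get the core DOM interfaces but none of the HTML-specific ones. The
names MiniHtmlBuilder and dom_from_html are made up for illustration.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MiniHtmlBuilder(HTMLParser):
    """Hypothetical helper: turns parser events into a minidom tree."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        self._stack = [self.document]   # currently open nodes, innermost last

    def _parent(self):
        # A DOM document may hold only one element, so anything that shows
        # up after the root was closed is attached to the root instead.
        parent = self._stack[-1]
        root = self.document.documentElement
        if parent is self.document and root is not None:
            return root
        return parent

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value if value is not None else "")
        self._parent().appendChild(element)
        if tag not in VOID_TAGS:
            self._stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray tags.
        for index in range(len(self._stack) - 1, 0, -1):
            if self._stack[index].tagName == tag:
                del self._stack[index:]
                break

    def handle_data(self, data):
        parent = self._parent()
        if parent is self.document:
            return  # text before the root element cannot be stored
        parent.appendChild(self.document.createTextNode(data))


def dom_from_html(text):
    """Parse an HTML string and return a minidom Document."""
    builder = MiniHtmlBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


dom_object = dom_from_html(
    '<html><body><p>Hi <a href="x.html">link</a><br>bye</p></body></html>')
for anchor in dom_object.getElementsByTagName("a"):
    print(anchor.getAttribute("href"))   # prints x.html
```

Note that html.parser reports tag and attribute names in lower case, so
lookups such as getElementsByTagName("a") have to use lower case here.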
Regards,
Martin