[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike
cs1spw at bath.ac.uk
Tue Dec 2 11:07:19 EST 2003
Stuart Langridge wrote:
> I don't see that tidy's ability to tidy HTML per se is useful, but I
> think that it's very useful in that it can take invalid HTML and
> convert it to valid XHTML. That way, we can get a DOM tree from invalid
> HTML, which is very useful...
Is there any way we could get a DOM tree from invalid HTML using pure
Python tools? The HTML tools in the Python standard library at the
moment are all pure Python. Could we even use the existing sgmllib
module (or an extension of it) to create our own DOM tree from invalid HTML?
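
A minimal sketch of how the pure-Python route might look, under some stated assumptions. sgmllib was removed in Python 3, so this swaps in its standard-library successor html.parser, which is also pure Python, and builds the tree with xml.dom.minidom. The names LenientDOMBuilder, parse_invalid_html and VOID_TAGS, and the synthetic "root" wrapper element, are hypothetical choices made for illustration, not an existing API. The recovery rule (ignore stray end tags, implicitly close anything still open) is one simple policy, not what tidy or a browser would do.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never take a closing tag in HTML (partial list, an assumption).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class LenientDOMBuilder(HTMLParser):
    """Hypothetical helper: builds a minidom tree from tag-soup HTML."""

    def __init__(self):
        super().__init__()
        self.document = Document()
        # Synthetic wrapper so fragments with several top-level
        # elements still fit inside a single document.
        root = self.document.createElement("root")
        self.document.appendChild(root)
        self.stack = [root]

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            # Valueless attributes arrive as None; store an empty string.
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element.
        # An end tag with no matching open element is ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tagName == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        # Skip whitespace-only runs to keep the tree small.
        if data.strip():
            self.stack[-1].appendChild(self.document.createTextNode(data))


def parse_invalid_html(text):
    """Return a minidom Document built from possibly invalid HTML."""
    builder = LenientDOMBuilder()
    builder.feed(text)
    builder.close()  # flush any buffered trailing text
    return builder.document
```

Calling parse_invalid_html("<p>one<b>bold<p>two</i>") returns a Document that can be queried with the usual DOM methods such as getElementsByTagName. Note that the unclosed elements end up nested inside one another rather than as siblings, which is where a smarter repair policy would be needed.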