Andrew M. Kuchling
Mon, 8 Mar 1999 15:17:01 -0500 (EST)
Fred L. Drake writes:
> Yes, I'm aware of HTML Tidy. I was thinking not so much of cleaning
>up the HTML for a site as being able to load it into an arbitrary
>program. I've no beef with Dan, but having it as a Python class can
>be useful as well; this would definitely be nice in the context of a
Indeed. In particular, it would be useful for Web discussion
forums and other applications where users can produce HTML to be
included. Some sites, such as slashdot.org, attempt to restrict the
tags users can use; you can't use <pre>, for example. But that
doesn't prevent a user from entering an unclosed <ul> tag, which will mess
up the rest of the page. It would be far more powerful to parse the
possibly-bogus HTML and produce a well-formed rendering of it.
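As a rough illustration of parsing possibly-bogus HTML and re-emitting it with every tag closed, here is a minimal sketch. It assumes the html.parser module from the Python standard library rather than any DOM builder, and the names TagBalancer, balance and VOID_TAGS are made up for this example. It drops comments and doctypes, and it only balances tags; it makes no attempt to judge whether the nesting is valid HTML.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, assumed for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalancer(HTMLParser):
    """Re-emit HTML, closing any tags the author left open (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the cleaned-up output
        self.stack = []  # names of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            # Void elements are written in XML-compatible form and never stacked.
            self.out.append("<%s%s />" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: silently drop it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; escape them again.
        self.out.append(escape(data, quote=False))


def balance(fragment):
    """Return `fragment` with every opened tag closed, innermost first."""
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()  # flushes any buffered trailing text
    while parser.stack:
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)


print(balance("<ul><li>one<li>two"))
# prints: <ul><li>one<li>two</li></li></ul>
```

Note that the unclosed <ul> no longer leaks into the rest of the page, but the second <li> ends up nested inside the first: the output is well-formed without necessarily being valid.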
Unclosed tags could be handled now; just use a forgiving
version of HTMLBuilder to get a DOM tree, and output well-formed HTML
from the tree. But that wouldn't handle invalid HTML (like using <li>
outside of <ul> or <ol>) or style that's bad but legal (images without
ALT attributes). I've been toying with the idea of converting my Web
pages to XML-compatible HTML for a while, and may play with this a bit.
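The two problems that simple tag balancing would miss, <li> outside of a list and images without ALT, could be flagged by a separate pass. The following is only a sketch under the same assumption (the standard html.parser module); HTMLLint, lint and the warning strings are hypothetical names, and a real checker would need to know far more of the HTML DTD than these two rules.

```python
from html.parser import HTMLParser

LIST_PARENTS = {"ul", "ol"}
# Elements that never take a closing tag (a partial list, assumed for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HTMLLint(HTMLParser):
    """Collect warnings for two sample problems (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the elements that are currently open
        self.warnings = []  # human-readable messages, in document order

    def handle_starttag(self, tag, attrs):
        # Invalid HTML: a list item with no enclosing <ul> or <ol>.
        if tag == "li" and not LIST_PARENTS.intersection(self.stack):
            self.warnings.append("<li> outside of <ul> or <ol>")
        # Legal but bad style: an image with no ALT text.
        if tag == "img" and "alt" not in dict(attrs):
            self.warnings.append("<img> without ALT attribute")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def lint(fragment):
    """Return the list of warnings found in `fragment`."""
    checker = HTMLLint()
    checker.feed(fragment)
    checker.close()
    return checker.warnings


for message in lint('<li>stray item</li><img src="logo.png">'):
    print(message)
```

Keeping the checks separate from the output step means a forum could choose to reject a post, repair it, or merely warn the author.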
A.M. Kuchling http://starship.python.net/crew/amk/
I am so scared. It's strange. For many thousand years I have prayed for death.
I have prayed to all the gods for peace and relief and... I have prayed for an
-- Orpheus, in SANDMAN #49: "Brief Lives:9"