[XML-SIG] Parsing malformed XHTML
ashearerw at shearersoftware.com
Tue May 23 22:38:39 CEST 2006
> Lars Kellogg-Stedman wrote:
> > I need to parse this document into a DOM, make some changes, and then
> > spit back out the modified file as (X?)HTML (ideally well-formed). Am
> > I going to be able to do this with PyXML? If not, I'd love to hear
> > your suggestions for the appropriate tools.
> > Thanks!
> > -- Lars
> You might want to look into Beautiful Soup. Another approach is to pass
> the document through HTML Tidy and then process the output.
Another possibility is HTMLFilter. It parses HTML 4 or
backward-compatible XHTML in a way that's more SAX-like than DOM-like,
though you could still use it to build a DOM. It's well suited for
modifying documents in place, because tags you don't need to modify
can pass straight through without risk of indigestion.
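To give a rough idea of what the pass-through style looks like, here is a minimal sketch. It does not use HTMLFilter's own API; it assumes the html.parser module from the Python standard library instead, and the class name, the rewrite() helper, and the b-to-strong substitution are all made up for illustration.

```python
from html.parser import HTMLParser


class PassThroughFilter(HTMLParser):
    """Echo markup as it arrives, rewriting only <b> into <strong>.

    Hypothetical example class; not part of HTMLFilter or any library.
    """

    def __init__(self):
        # convert_charrefs=False keeps entity and character references
        # as separate events so they can be echoed back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            # The one tag we modify (any attributes on <b> are dropped).
            self.out.append("<strong>")
        else:
            # Untouched tags are copied using their original source text.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser reports end tags in lower case, so their original
        # capitalization is not preserved here.
        self.out.append("</%s>" % ("strong" if tag == "b" else tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite(html):
    """Run the filter over a string and return the modified markup."""
    f = PassThroughFilter()
    f.feed(html)
    f.close()
    return "".join(f.out)


print(rewrite("<p class=x>Hello <b>world</b> &amp; more<br />\n<p>unclosed"))
```

Note that no tree is ever built: the unclosed <p> and the unquoted attribute are simply copied through, so the parts of the document you don't touch come out essentially as they went in.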