[XML-SIG] minidom w/ HTML
andrew at shearersoftware.com
Fri Jun 25 00:35:49 EDT 2004
You could use Python's HTMLParser module or my own HTMLFilter
module. Both present a SAX-like interface that calls back to your
code as tags fly by, rather than the DOM approach of handing you a
fully-formed, consistent data structure made from the document.
The DOM approach is complicated because of the non-well-formed nature
of typical HTML, while the SAX-like interface is a more natural fit.
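As a rough sketch of the callback style for a table-heavy page, something like the following might work. It assumes a current Python 3, where the HTMLParser class lives in the standard html.parser module; the TableCellCollector class and the table_rows function are hypothetical names invented for this illustration, not part of any library, and this is not how HTMLFilter itself works. It tolerates unclosed td and tr tags by treating the next cell or row start as an implicit close.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Hypothetical helper: collect table cell text, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        # Finish the open cell, if any, and add it to the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any, including a still-open cell.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row implicitly ends an unclosed previous row.
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            # A new cell implicitly ends an unclosed previous cell.
            self._close_cell()
            if self._row is None:
                self._row = []  # cell without an enclosing <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Only text inside a cell is of interest here.
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the table cells found in html as a list of rows."""
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


# Missing </td> and </tr> tags would stop an XML parser, but not this one.
sloppy = "<table><tr><td>Mon<td>Tue</tr><tr><td>1<br><td>2</table>"
print(table_rows(sloppy))
```

Because the callbacks only track "am I in a row" and "am I in a cell", sloppy markup elsewhere on the page is simply ignored rather than causing a parse error.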
> From: jennyw <jennyw at colorfulexpressions.com>
> Message-ID: <cb7co8$2cb$1 at sea.gmane.org>
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't
> valid XML -- mostly minor things, but enough so that minidom won't
> parse it.
> Is there something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?
Senior Analyst, Medical Computing
IS Applications Group