[XML-SIG] minidom w/ HTML
hostetlerm at gmail.com
Mon Jun 28 14:54:41 EDT 2004
On Mon, 21 Jun 2004 12:25:59 -0700, jennyw
<jennyw at colorfulexpressions.com> wrote:
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.
> Is there something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?
I've recently discovered BeautifulSoup, and it works wonderfully for
I've done the "run through Tidy and then use minidom" approach before.
It works fine, except that it can be quite slow, especially when the
HTML is nowhere near XHTML.
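
For illustration, here is a minimal sketch of the "clean it up first, then
hand it to minidom" idea. It does not use Tidy or BeautifulSoup; as a
stand-in for the clean-up step it uses Python's standard html.parser
module, which is a different and much cruder tool. The names XmlRepair,
html_to_dom, VOID and NO_SELF_NEST are hypothetical, and the sketch assumes
the input has a single root element (here a table). It only covers a few
common problems: unquoted attributes, unclosed cells, and empty tags
such as <br>.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

# Elements that never take a closing tag in HTML.
VOID = {"br", "hr", "img", "input", "meta", "link"}
# Elements that are implicitly closed when the same tag opens again.
NO_SELF_NEST = {"p", "li", "td", "th", "tr"}


class XmlRepair(HTMLParser):
    """Re-emit sloppy HTML as well-formed XML (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the repaired document
        self.stack = []  # names of the elements currently open

    def handle_starttag(self, tag, attrs):
        # An unclosed <td> directly followed by another <td> is closed first.
        if tag in NO_SELF_NEST and self.stack and self.stack[-1] == tag:
            self.out.append("</%s>" % self.stack.pop())
        # Quote every attribute; a valueless attribute gets its own name.
        text = "".join(
            " %s=%s" % (name, quoteattr(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, text))
        else:
            self.out.append("<%s%s>" % (tag, text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag (or a void element): drop it
        # Close anything still open inside this element, then the element.
        while True:
            top = self.stack.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        # Character references are already decoded; re-escape for XML.
        self.out.append(escape(data))


def html_to_dom(html):
    """Repair the markup, then parse the result with minidom."""
    parser = XmlRepair()
    parser.feed(html)
    parser.close()
    while parser.stack:  # close whatever was left open at end of input
        parser.out.append("</%s>" % parser.stack.pop())
    return minidom.parseString("".join(parser.out))


# Not well-formed XML: unquoted attribute, unclosed <td>, bare <br>.
sample = "<table border=1><tr><td>Mon<td>Tue &amp; Wed<br></tr></table>"

doc = html_to_dom(sample)
cells = [td.firstChild.data for td in doc.getElementsByTagName("td")]
print(cells)
```

Passing the same sample string straight to minidom.parseString fails on
the unquoted attribute, which is the situation described in the original
question. For real-world pages a dedicated tool (Tidy or BeautifulSoup, as
above) will handle far more cases than this sketch does.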