[XML-SIG] minidom w/ HTML
Mike Hostetler
hostetlerm at gmail.com
Mon Jun 28 14:54:41 EDT 2004
On Mon, 21 Jun 2004 12:25:59 -0700, jennyw
<jennyw at colorfulexpressions.com> wrote:
>
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.
> Is there something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?
>
I've recently discovered BeautifulSoup, and it works wonderfully for
parsing HTML:
http://www.crummy.com/software/BeautifulSoup/
I've done the "run through Tidy and then use minidom" approach before.
It works, but it can be quite slow, especially when the HTML is far
from well-formed XHTML.
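
For what it's worth, here is a minimal sketch of the underlying problem
and of what a lenient parser buys you. It is not BeautifulSoup itself:
as a stand-in it uses the standard library's html.parser, which is
similarly tolerant of unclosed tags. The calendar fragment and the names
SLOPPY_HTML, minidom_accepts, TableCells and extract_rows are all made up
for illustration.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical calendar fragment: unquoted attribute, unclosed <td>/<tr>,
# and a bare <br>. None of this is well-formed XML.
SLOPPY_HTML = """<table border=1>
<tr><td>Mon<td>Tue
<tr><td>Dentist<br>2pm<td>
</table>"""


def minidom_accepts(markup):
    """Return True if minidom can parse the markup as XML."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


class TableCells(HTMLParser):
    """Collect the text of each <td>, grouped by <tr>.

    Close tags are optional: a new <td> or <tr> simply starts the
    next cell or row.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False
        elif tag == "td":
            if not self.rows:  # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True
        elif tag == "br" and self._in_cell:
            self.rows[-1][-1] += " "  # treat a line break as a space

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract_rows(markup):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableCells()
    parser.feed(markup)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]
```

With this fragment, minidom_accepts(SLOPPY_HTML) is False because expat
stops at the unquoted attribute, while extract_rows(SLOPPY_HTML) still
recovers the two rows of cells. A real calendar page would need more
care (nested tables, <th> cells, entities in attributes), which is where
a dedicated library earns its keep.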
-- mikeh