[XML-SIG] xml.dom.ext.reader.HtmlLib memory leak?
Walter Dörwald
walter at livinglogic.de
Thu Aug 26 22:24:38 CEST 2004
Chuck Bearden wrote:
> [...]
> I haven't browsed through the dependencies to see what of the other
> Twisted pieces the microdom requires, so I can't say if it is extricable
> from the wider framework.
>
> One possibility I didn't try was to use tidy to generate real XHTML from
> the crappy HTML. It might then be possible to use something more
> common like the minidom implementation to navigate the HTML.
>
> For me, extracting data from malformed but consistent HTML is a
> necessary task, so I do sometimes have to make some compromises
> in my selection and use of tools.
There are already tools that make sense of broken HTML: browsers.
Is there any way to reuse that functionality from Python? I.e.
something like:
>>> import mozilla
>>> x = mozilla.parse("http://www.python.org")
I don't care whether I get a DOM or a string parsable by an
XML parser.
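
A minimal sketch of what such a call could look like using only the standard library. It is not a browser engine and recovers far less than Mozilla would. It assumes the tolerant tokenizer in html.parser is good enough for the input, and it does not check that tag or attribute names are legal XML names. The names TagSoupToXml, parse and the wrapper element "soup" are made up for illustration:

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class TagSoupToXml(HTMLParser):
    """Hypothetical helper: re-emit tolerant parse events as well-formed XML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # XML fragments produced so far
        self.stack = []  # names of elements that are still open

    def handle_starttag(self, tag, attrs):
        # Keep the first occurrence of each attribute; a valueless
        # attribute such as "checked" gets its own name as its value.
        seen = {}
        for name, value in attrs:
            seen.setdefault(name, name if value is None else value)
        attr_text = "".join(" %s=%s" % (n, quoteattr(v)) for n, v in seen.items())
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Close everything opened since the matching start tag.
        while True:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))


def parse(html_text):
    """Return a minidom Document built from possibly malformed HTML text."""
    parser = TagSoupToXml()
    parser.feed(html_text)
    parser.close()
    # Close whatever the input left open.
    while parser.stack:
        parser.out.append("</%s>" % parser.stack.pop())
    # Wrap in a synthetic root so the result always has exactly one.
    return minidom.parseString("<soup>%s</soup>" % "".join(parser.out))
```

parse() takes the HTML text rather than a URL, so fetching the page is left to the caller. Calling toxml() on the returned document gives the string form that any XML parser can read.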
Bye,
Walter Dörwald