[XML-SIG] xml.dom.ext.reader.HtmlLib memory leak?
cbearden at hal-pc.org
Wed Aug 25 22:56:39 CEST 2004
On Mon, Aug 23, 2004 at 10:31:11AM -0600, Uche Ogbuji wrote:
> Honestly, I don't think DOM is the way I would personally go about
> processing HTML, which is why I was trying to get at whether there was
> another way for you to meet your needs.
I think I understand what you are getting at, but personally I have
found twisted.web.microdom with 'beExtremelyLenient=True', perhaps
preceded by an mx.Tidying stage, to be invaluable for mining data from
database-generated web pages built with crappy HTML. Consider the pages
displaying individual patent records at the USPTO, e.g. . If you
need to treat such pages as if they were XML records to be parsed and
loaded into a database, something like twisted.web.microdom is a big