[XML-SIG] Re: xml.dom.ext.reader.HtmlLib memory leak?
fredrik at pythonware.com
Thu Aug 26 17:01:35 CEST 2004
<xmlsig at codeweld.com> wrote:
> Apart from that, I just think a "dom" is invaluable when there is a need to
> process a rather complex markup with all leaves, say for example when you
> implement a browser of sorts. Dom-view springs to mind. Use it on a few big
> websites for a while and the process starts to lag your computer because it
> grows in the hundreds of megabytes.
Does the leak have any relation to the size of the page you're parsing?
The sgmlop parser in PyXML is a fork of the pythonware/effbot.org version, and I don't
think it supports garbage collection (version 1.1 of the pythonware/effbot.org version does).
This means that code using it *must* make sure to explicitly kill the parse object when
parsing is done.
I don't have PyXML on this machine, but Google found this page:
which contains this initialization code:
    def initParser(self, parser):
        self._parser = parser
which creates a cycle: self contains a reference to the parser, which contains
references to bound methods, which contain references back to self.
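A minimal sketch of that cycle, assuming CPython's reference counting. `FakeParser`, its `register` method and the `callbacks` dict are hypothetical stand-ins for a parser that stores bound methods of its handler; they are not the sgmlop API. Because the real parser is a C type without garbage collection support, the sketch uses `gc.disable()` to mimic a cycle the collector cannot reclaim:

```python
import gc
import weakref


class FakeParser:
    """Hypothetical stand-in for an sgmlop-style parser (not its real API)."""

    def __init__(self):
        self.callbacks = {}

    def register(self, handler):
        # Storing a bound method keeps a strong reference to `handler`.
        self.callbacks["start"] = handler.start_tag


class Reader:
    def initParser(self, parser):
        self._parser = parser
        parser.register(self)  # assumed registration step that creates the cycle

    def start_tag(self, tag):
        pass


def survives_without_gc(break_cycle):
    """Return True if the Reader outlives its last external reference.

    The cycle collector is disabled to mimic a parser type that does not
    take part in garbage collection, so only reference counting frees objects.
    """
    gc.disable()
    try:
        reader = Reader()
        reader.initParser(FakeParser())
        probe = weakref.ref(reader)
        if break_cycle:
            reader._parser = None  # the fix: drop self -> parser
        del reader
        alive = probe() is not None
    finally:
        gc.enable()
        gc.collect()  # clean up anything the sketch leaked
    return alive


print(survives_without_gc(break_cycle=False))  # True: the cycle keeps it alive
print(survives_without_gc(break_cycle=True))   # False: refcounting frees it
```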
To break the cycle, you must arrange for the code to do e.g.
    self._parser = None
when you're done parsing.
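One way to make sure the cycle is broken even when parsing raises is a try/finally block around the parse. This is only a sketch: `RecordingParser`, `register`, `feed`, `close` and `handle_data` are hypothetical names chosen for illustration, not PyXML's or sgmlop's actual API:

```python
class RecordingParser:
    """Hypothetical parser stand-in: keeps a bound method of its handler."""

    def register(self, handler):
        # Holding handler.handle_data creates parser -> bound method -> handler.
        self._on_data = handler.handle_data

    def feed(self, data):
        self._on_data(data)

    def close(self):
        pass


class Reader:
    def __init__(self):
        self._parser = None
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def parse(self, text):
        self._parser = RecordingParser()
        self._parser.register(self)
        try:
            self._parser.feed(text)
            self._parser.close()
        finally:
            # Break the self -> parser -> bound method -> self cycle,
            # even if feed() or close() raised.
            self._parser = None
        return self.chunks
```

After `Reader().parse("<p>hi</p>")` returns, the reader no longer holds the parser, so plain reference counting can free both objects without help from the cycle collector.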
Alternatively, you could probably switch to the effbot.org version of sgmlop:
(I haven't tested this with PyXML, but it might work. Or not.)