[XML-SIG] Unexpected memory leak in Sax + _locator
Bernhard Herzog
bh@intevation.de
07 Feb 2002 12:14:36 +0100
I found an unexpected memory leak in the sax contenthandler +
expatparser (tested with Python 2.2 compiled with expat). The following
script demonstrates the leak:
import gc
gc.set_debug(gc.DEBUG_LEAK)
import xml.sax.handler
xmldata = """<?xml version="1.0" standalone="yes" ?>
<Inquisition>
<country>Spain</country>
</Inquisition>
"""
class Handler(xml.sax.handler.ContentHandler):
def endDocument(self):
# uncomment this to avoid the leak
#self._locator = None
pass
xml.sax.parseString(xmldata, Handler())
print "gc.collect() ->", gc.collect()
Typical output:
gc: collectable <dict 0x815c1fc>
gc: collectable <ExpatParser instance at 0x817444c>
gc: collectable <dict 0x816b50c>
gc: collectable <DTDHandler instance at 0x816b59c>
gc: collectable <dict 0x819bdd4>
gc: collectable <EntityResolver instance at 0x816b5c4>
gc: collectable <dict 0x816be04>
gc: collectable <InputSource instance at 0x816b6f4>
gc: collectable <list 0x81253d4>
gc: collectable <Handler instance at 0x814aeec>
gc: collectable <dict 0x812466c>
gc.collect() -> 11
The problem is that the ExpatParser (in xml.sax.expatreader) is both the
parser and locator. The parser has a reference to the contenthandler, of
course. The ExpatParser also passes itself to the handler's
setDocumentLocator method which by default stores it in self._locator,
creating a circular reference.
I can see several ways to fix it:
1. have the parse method call self._cont_handler.setDocumentLocator(None)
This would be a slight change of the interface, though. Existing
content handlers might assume that the parameter of
setDocumentLocator is always a valid locator
2. Create a separate locator class for the expatparser which only
references the actual parser C-object. The ExpatParser instance could
also hold a reference to it so that it can be updated on reset()
3. Use a weak reference proxy to the parser as the argument to
setDocumentLocator.
I could implement one of these and submit a patch. Once you know about
this, it's easily worked around, though, so it might not be worth
fixing, but it should at least be documented.
Bernhard
--
Intevation GmbH http://intevation.de/
Sketch http://sketch.sourceforge.net/
MapIt! http://mapit.de/