[Python-bugs-list] [ python-Bugs-535474 ] xml.sax memory leak with ExpatParser

noreply@sourceforge.net noreply@sourceforge.net
Wed, 03 Apr 2002 20:50:54 -0800


Bugs item #535474, was opened at 2002-03-26 18:24
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=535474&group_id=5470

Category: XML
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Danny Yoo (dyoo)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: xml.sax memory leak with ExpatParser

Initial Comment:
I've isolated a memory leak in the ExpatParser that
deals with the destruction of ContentHandlers.  I'm
including my test program test_memory_leak.py that
tests the behavior --- I generate a bunch of
ContentHandlers, and see if they get destroyed reliably.
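
The attached test_memory_leak.py is not reproduced in this message. As a rough illustration only, a test of this kind might look like the sketch below. Everything in it is an assumption apart from the TestParser name and its "destructed" message, which come from the output shown further down: the in-memory document, the run_test helper, and the destructed list are hypothetical, and it is written for a current Python rather than 2.1.

```python
import gc
import io
import xml.sax

destructed = []  # one entry per finalized handler (hypothetical bookkeeping)


class TestParser(xml.sax.ContentHandler):
    """A content handler that records when it is finalized."""

    def __del__(self):
        destructed.append("TestParser destructed.")


def run_test(count):
    """Parse a tiny document `count` times, each time with a fresh handler."""
    for _ in range(count):
        xml.sax.parse(io.BytesIO(b"<doc>text</doc>"), TestParser())
    # Force a cycle collection so the count below is deterministic.
    gc.collect()
    return len(destructed)
```

The explicit gc.collect() call masks exactly the kind of leak being reported; removing it shows whether reference counting alone releases the handlers.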


This appears to affect Python 2.1.1 and 2.1.2. 
Thankfully, the leak appears to be fixed in 2.2.1c. 
Here's some of the test runs:

### Python 2.1.1:
[dyoo@tesuque dyoo]$ /opt/Python-2.1.1/bin/python test_memory_leak.py
This is a test of an apparent XML memory leak.
Test1:



Test2:
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
###



### Python 2.1.2:
[dyoo@tesuque dyoo]$ /opt/Python-2.1.2/bin/python test_memory_leak.py
This is a test of an apparent XML memory leak.
Test1:
TestParser destructed.
TestParser destructed.



Test2:
###


### Python 2.2.1c
[dyoo@tesuque dyoo]$ /opt/Python-2.2.1c2/bin/python test_memory_leak.py
This is a test of an apparent XML memory leak.
Test1:
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.



Test2:
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
TestParser destructed.
###





----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-03 23:50

Message:
Logged In: YES 
user_id=3066

I don't remember if the cycle detector was enabled by
default in 2.1.* -- that all seems so long ago!

The content handler ends up being part of a circular
reference cycle, with the ExpatParser acting as its own
locator object.  This happens because the parser references
the content handler, and hands a reference to itself for the
content handler to squirrel away as the locator.
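
The shape of that cycle can be sketched with stand-in classes. This is not the actual xml.sax code: Handler, Parser, and make_cycle are hypothetical names used only to show the parser -> handler -> parser loop, and the sketch assumes a current CPython with the gc module.

```python
import gc
import weakref


class Handler:
    """Stand-in for a ContentHandler that keeps the locator it is given."""

    def setDocumentLocator(self, locator):
        self._locator = locator


class Parser:
    """Stand-in for a parser that acts as its own locator."""

    def __init__(self, handler):
        self._cont_handler = handler          # parser -> handler
        handler.setDocumentLocator(self)      # handler -> parser (cycle)


def make_cycle():
    handler = Handler()
    Parser(handler)
    # Return only a weak reference, so nothing outside the cycle keeps it alive.
    return weakref.ref(handler)


gc.disable()                      # rule out an automatic collection
ref = make_cycle()
alive_before_gc = ref() is not None   # reference counting alone cannot free it
gc.collect()                      # the cycle detector can
alive_after_gc = ref() is not None
gc.enable()
```

Without a cycle collection the handler stays alive after the last outside reference is gone, which matches the symptom in the report.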

I see two approaches to removing this dependency.  The first
is simply to call setDocumentLocator(None) after calling
endDocument(), but that's fragile; it assumes the parse gets
that far.  The second is to use a separate object to provide
the locator to the content handler; this seems more robust
as it doesn't assume that the parse succeeds.
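
A minimal sketch of the second approach, again with stand-in classes rather than the real ExpatParser. It assumes the separate locator holds only a weak reference back to the parser; a locator that held a strong reference would merely lengthen the cycle. The Locator, Handler, Parser, and parse_once names and the lineno attribute are hypothetical.

```python
import gc
import weakref


class Locator:
    """Separate locator object; refers to the parser only weakly."""

    def __init__(self, parser):
        self._parser = weakref.proxy(parser)

    def getLineNumber(self):
        return self._parser.lineno


class Handler:
    """Stand-in for a ContentHandler that keeps the locator it is given."""

    def setDocumentLocator(self, locator):
        self._locator = locator


class Parser:
    """Stand-in parser that hands out a Locator instead of itself."""

    def __init__(self, handler):
        self.lineno = 1
        self._cont_handler = handler
        handler.setDocumentLocator(Locator(self))


def parse_once():
    handler = Handler()
    parser = Parser(handler)
    line = handler._locator.getLineNumber()   # the locator still works
    return line, weakref.ref(handler), weakref.ref(parser)


gc.disable()                      # show that no cycle collection is needed
line, handler_ref, parser_ref = parse_once()
handler_freed = handler_ref() is None
parser_freed = parser_ref() is None
gc.enable()
```

Because no strong reference leads from the handler back to the parser, both objects are released by reference counting as soon as the parse returns, whether or not it ran to completion.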

I'll start on a patch that uses the second approach.

Martin, do you see any other alternatives?  There will be a
2.1.3 release for other reasons, BTW, so this might make it in.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-28 17:48

Message:
Logged In: YES 
user_id=31435

Assigned to Fred, after he begged me to <wink>.

----------------------------------------------------------------------

Comment By: Danny Yoo (dyoo)
Date: 2002-03-28 17:37

Message:
Logged In: YES 
user_id=49843

Hi Martin,

Yikes; sorry about that.  I've attached the file.

---


I did some more experimentation with xml.sax, and there does
appear to be a serious problem with object destruction, even
with Python 2.2.1c.

I'm working with a fairly large XML file located on the TIGR
(The Institute for Genomic Research) ftp site.  A sample
file would be something like:

ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/PSEUDOCHROMOSOMES/chr1.xml

(60 MB)

and I noticed that my scripts were leaking memory.  I've
isolated the problem to what looks like a garbage collection
problem: it looks like my ContentHandlers are not getting
recycled.  Here's a simplified program:

###
import xml.sax
import glob
from cStringIO import StringIO


class FooParser(xml.sax.ContentHandler):
    def __init__(self):
        self.bigcontent = StringIO()

    def startElement(self, name, attrs):
        pass

    def endElement(self, name):
        pass

    def characters(self, chars):
        self.bigcontent.write(chars)


filename = '/home/arabidopsis/bacs/20020107/PSEUDOCHROMOSOME/chr1.xml'
i = 0
while 1:
    print "Iteration %d" % i
    xml.sax.parse(open(filename), FooParser())
    i = i + 1
###

I've watched 'top', and the memory usage continues growing.
Any suggestions?  Thanks!

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-27 07:24

Message:
Logged In: YES 
user_id=21627

Also, what kind of action do you expect?  Chances are minimal
that there will be a 2.1.3 release, so why bother?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-27 07:23

Message:
Logged In: YES 
user_id=21627

There's no uploaded file!  You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file.

Please try again.

(This is a SourceForge annoyance that we can do
nothing about. :-( )

----------------------------------------------------------------------
