Memory problems (garbage collection)
Gerhard Häring
gh at ghaering.de
Thu Apr 23 03:25:30 EDT 2009
Carbon Man wrote:
> Very new to Python, running 2.5 on windows.
> I am processing an XML file (7.2MB). Using the standard library I am
> recursively processing each node and parsing it. The branches don't go
> particularly deep. What is happening is that the program is running really
> really slowly, so slow that even running it over night, it still doesn't
> finish.
> Stepping through it I have noticed that memory usage has shot up from 190MB
> to 624MB and continues to climb.
That does indeed sound like a problem in the code. But even though the
XML file is only 7.2 MB, the XML structures and whatever you build out
of them carry some overhead.
> If I set a break point and then stop the
> program the memory is not released. It is not until I shutdown PythonWin
> that the memory gets released.
Then you're apparently looking at VSIZE or whatever it's called on
Windows. It's the maximum memory the process ever allocated. And this
usually *never* decreases, no matter what the application (Python or
otherwise).
> [GC experiments]
Unless you have circular references, in my experience automatic garbage
collection in Python works fine. I never had to mess with it myself in
10 years of Python usage.
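
To illustrate the circular-reference case, here is a minimal sketch. The Node class and make_cycle function are made-up names for illustration only, and the behaviour described is that of CPython. Two objects that point at each other are not freed by reference counting alone, but the cycle collector in the gc module reclaims them:

```python
import gc
import weakref

class Node(object):
    """Hypothetical class, used only for illustration."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a    # a -> b -> a: a reference cycle
    return weakref.ref(a)      # weak reference: lets us observe a without keeping it alive

gc.disable()                   # switch off automatic collection so the effect is visible
probe = make_cycle()
alive_before = probe() is not None   # the cycle still keeps both objects alive
gc.collect()                         # the cycle detector frees them
alive_after = probe() is not None
gc.enable()
```

If gc.collect() does not bring memory down, as in your case, the objects are most likely still reachable from somewhere rather than stuck in a cycle.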
> If I have the program at a break and do gc.collect() it doesn't fix it, so
> whatever referencing is causing problems is still active.
> My program is parsing the XML and generating a Python program for
> SQLalchemy, but the program never gets a chance to run the memory problem is
> prior to that. It probably has something to do with the way I am string
> building.
Yes, you're apparently concatenating strings. A lot. Don't do that. At
least not this way:
s = ""
# each += can build a brand-new string, copying everything so far
s += "something"
s += "else"
Instead, do this:
from cStringIO import StringIO
s = StringIO()
s.write("something")
s.write("else")
...
s.seek(0)
print s.read()
or
lst = []
lst.append("something")
lst.append("else")
print "".join(lst)
> My apologies for the long post but without being able to see the code I
> doubt anyone can give me a solid answer so here it goes (sorry for the lack
> of comments): [...]
Code snipped.
Two tips: Use one of the above methods for concatenating strings. This
is a common problem in Python (and in other languages; Java and C# have
StringBuilder classes for the same reason).
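
The snippets above are written for Python 2. On Python 3, cStringIO no longer exists and io.StringIO takes its place. A rough equivalent of both approaches might look like this (the function names are made up for illustration):

```python
import io

def build_with_stringio(parts):
    """Accumulate pieces in an in-memory text buffer."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()      # the whole buffer, no seek(0)/read() needed

def build_with_join(parts):
    """Collect pieces in a list and join them once at the end."""
    pieces = []
    for part in parts:
        pieces.append(part)
    return "".join(pieces)

print(build_with_stringio(["something", "else"]))
print(build_with_join(["something", "else"]))
```

Either way, the final string is assembled once at the end instead of being rebuilt on every append.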
If you want to speed up your XML processing, use the ElementTree module
in the standard library. It's a lot easier to use and also faster than
what you're using currently. As a bonus, it can be swapped out for the
even faster lxml module (a third-party package, not in the standard
library) by changing a single import, for another noticeable performance
improvement.
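
As a rough sketch of the import swap: the XML layout below (table and column elements with a name attribute) is invented for illustration and is not taken from your file, and the try/except fallback is just one way to arrange it. The code assumes a current Python; on 2.5, use getiterator() in place of iter().

```python
try:
    from lxml import etree as ET          # third-party, used if installed
except ImportError:
    import xml.etree.ElementTree as ET    # standard library fallback

# Hypothetical sample document, for illustration only.
SAMPLE = (
    "<tables>"
    "<table name='users'><column name='id'/><column name='email'/></table>"
    "</tables>"
)

def column_names(xml_text):
    """Map each table name to the list of its column names."""
    root = ET.fromstring(xml_text)
    result = {}
    for table in root.iter("table"):      # walks all <table> elements, no manual recursion
        result[table.get("name")] = [c.get("name") for c in table.findall("column")]
    return result

print(column_names(SAMPLE))
```

The rest of the code only refers to ET, so it stays the same whichever module ends up being imported.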
HTH
-- Gerhard