Memory Leaks and Heapy

M.-A. Lemburg mal at
Fri Jan 4 17:10:27 CET 2008

On 2008-01-04 16:07, Yaakov Nemoy wrote:
> Hi list,
> Firstly, this is my first post here, so I hope I'm not breaking some
> unwritten etiquette rule about asking questions involving several
> different libraries.
> I'm trying to plug some memory leaks in a TurboGears program.  We (the
> Fedora Project) have a few apps in TurboGears in infrastructure that
> all seem to be running into the same issues in a variety of
> configurations.  Hopefully when I get to the cause of this in one app,
> Smolt, we can fix the others too.
> The app in question is Smolt, which uses TurboGears, SQLAlchemy with a
> MySQL backend, and simplejson for message passing between the server
> and client.  Smolt takes voluntary hardware reports from its clients,
> and generally is configured to submit around the beginning of the
> month.  Normally, our main data is cached by some separate processes
> that run short term, so we don't see any rapid memory growth, except
> for the beginning of each month, which makes isolating the problem to
> a few function calls fairly simple.  To watch for memory growth, I
> simply have a client hammer the server with 1-3 threads submitting
> information simultaneously, 100 times, with a few deletion operations
> in between.  To monitor for memory leaks, I'm using Heapy.
> To insert Heapy into the process, instead of calling 'start_server', a
> CherryPy method that does what you think it does and blocks, I'm using
> the module 'threading' to push it into a new thread.  Using the
> process in heapy's documentation, I find that after running a single
> thread, there are about 1200 bytes of leaked memory.  Despite this, the
> python process running the server has managed to grow from 16-18MB to
> something between 23-28MB each time I try this.  After a second
> iteration, heapy shows 1168 bytes leaked.  If heapy is correct, this
> means there are not many leaked objects in the python space.  Running
> a larger example, say 100 threads, for a total of 10k submissions
> takes about an hour, and in the process, python balloons up to about
> 48MB.  Still no signs of any missing objects.
> 48MB is not a lot relatively speaking, but no amount of waiting seems
> to show python giving back that memory afterwards.  On our production
> server, we have up to 200k machines all updating their information
> over a 3 day period, in which the server process manages to reach
> 600MB before we forcefully restart it.
> A couple of developers have mentioned that python might be fragmenting
> its memory space, and is unable to free up those pages.  How can I go
> about testing for this, and are there any known problems like this?
> If not, what else can I do to look for leaks?

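If you're using lots of small objects, you may be running into a
problem with Python's small-object allocator, pymalloc. It used
not to return memory to the system at all. In Python 2.5 this was
changed so that completely empty arenas (the 256 KB chunks pymalloc
requests from the system) are returned to the OS. An arena is only
released once every object in it has been freed, so a few long-lived
objects scattered across many arenas can still keep the process size
high. For details, see Objects/obmalloc.c.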
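One rough way to test the fragmentation theory, sketched here for a modern CPython (3.4+; tracemalloc did not exist in the Python 2.5/2.6 era this thread is about), is to compare how much memory Python objects still occupy with how much the process actually holds. The helper names rss_bytes and fragmentation_probe are hypothetical, rss_bytes assumes Linux's /proc/self/statm, and the allocation pattern is only an illustration, not Smolt's workload:

```python
import gc
import os
import tracemalloc


def rss_bytes():
    """Current resident set size in bytes.

    Assumption: Linux, where /proc/self/statm reports sizes in pages and
    the second field is the resident set.
    """
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def fragmentation_probe(n=200_000, keep_every=100):
    """Allocate many small strings, free most of them, and compare what
    Python still references (tracemalloc) with what the process holds (RSS)."""
    tracemalloc.start()
    rss_start = rss_bytes()
    data = [str(i) * 3 for i in range(n)]  # many small objects, served by pymalloc
    survivors = data[::keep_every]         # scattered survivors can keep their arenas in use
    del data
    gc.collect()
    traced_current, traced_peak = tracemalloc.get_traced_memory()
    rss_end = rss_bytes()
    tracemalloc.stop()
    return {
        "survivors": len(survivors),
        "traced_current": traced_current,  # bytes still allocated by Python right now
        "traced_peak": traced_peak,        # highest traced usage during the probe
        "rss_growth": rss_end - rss_start, # change in process resident memory
    }


if __name__ == "__main__":
    for key, value in fragmentation_probe().items():
        print(f"{key}: {value}")
```

If traced_current stays small while rss_growth remains large, the memory is being held below the Python object level: by pymalloc arenas that cannot be released, by the C library's malloc, or by allocations tracemalloc does not see (such as those made directly by C libraries). tracemalloc's own bookkeeping also adds to RSS, so treat the numbers as a comparison rather than an exact measurement.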

Memory growth could also be caused by interned strings, which are
kept in a special interpreter-wide dictionary to speed up string
comparisons.
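A minimal sketch of what interning does, using Python 3's sys.intern (the Python 2 spelling was the builtin intern()). The keys are hypothetical stand-ins for values a server might build from submitted data: equal strings created at runtime are separate objects, while interning maps each distinct value to a single shared copy held in the interpreter's table.

```python
import sys


def distinct_objects(strings):
    """Count how many separate string objects a list actually holds."""
    return len({id(s) for s in strings})


# Hypothetical keys built at runtime: 1000 strings with only 10 distinct values.
# Each join creates a new string object, so equal values are still distinct objects.
keys = ["-".join(["vendor", str(i % 10)]) for i in range(1000)]

# Interning returns one canonical object per distinct value.
interned = [sys.intern(k) for k in keys]

if __name__ == "__main__":
    print(distinct_objects(keys), distinct_objects(interned))  # 1000 10
```

Interning saves memory when the same few values recur, but every distinct string that gets interned takes an entry in that table, so interning many unique values only adds overhead.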

However, the first thing to check is whether any of the C extension
modules you are using is leaking memory. Python itself is usually
well tested for memory leaks, but this is less true of C extension
modules, and it's easy to miss a few Py_DECREFs (decrementing a
Python object's reference count), causing objects to live forever.
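A missing Py_DECREF on an argument shows up from Python as a reference count that climbs with every call. The sketch below checks for that with sys.getrefcount; refcount_growth is a hypothetical helper, and leaky is a pure-Python stand-in that simulates the C-level bug by storing a reference it never releases. With a real extension you would pass its function in place of leaky.

```python
import sys


def refcount_growth(func, arg, calls=1000):
    """Call func(arg) repeatedly and return how much arg's reference count grew.

    Growth of roughly one per call suggests func keeps a reference to arg
    that it never releases (the Python-level symptom of a missing Py_DECREF).
    """
    before = sys.getrefcount(arg)
    for _ in range(calls):
        func(arg)
    return sys.getrefcount(arg) - before


_leaked = []


def leaky(obj):
    # Simulated leak: stores a reference that is never dropped.
    _leaked.append(obj)


def well_behaved(obj):
    # Uses the argument without keeping any reference to it.
    return len(obj)


payload = ["report"]

if __name__ == "__main__":
    print("well_behaved:", refcount_growth(well_behaved, payload))  # 0
    print("leaky:", refcount_growth(leaky, payload))                # 1000
```

This only catches leaked references to the argument itself; objects that an extension creates internally and never frees need other tools, such as the heap inspection you are already doing.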

Marc-Andre Lemburg

