Memory Leaks and Heapy
andymac at bullseye.apana.org.au
Sat Jan 5 11:47:49 CET 2008
Yaakov Nemoy wrote:
> A couple of developers have mentioned that python might be fragmenting
> its memory space, and is unable to free up those pages. How can I go
> about testing for this, and are there any known problems like this?
> If not, what else can I do to look for leaks?
Marc-Andre brought up pymalloc, but it is worth clarifying a couple of
issues related to its use:
- pymalloc only manages allocations up to (and including) 256 bytes;
allocations larger than this are passed to the platform malloc.
- the work that was put in to allow return of empty arenas (in Python
2.5) was geared to handling the general case of applications that
create huge volumes of objects (usually at start up) and then destroy
most of them. There is no support that I'm aware of for any form of
arena rationalisation in the case of sparsely occupied arenas.
- it has been my experience that pymalloc is a significant benefit over
the platform malloc for the Python interpreter, both in terms of
performance and gross memory consumption. Prior to defaulting to
using pymalloc (as of 2.3) CPython had run into issues with the
platform malloc of just about every platform it had been ported to,
heap fragmentation being particularly notable on Windows (though other
platforms have also been subject to this).
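To make the "create huge volumes of objects, then destroy most of them" pattern concrete, here is a minimal sketch. It assumes a modern CPython with the tracemalloc module (Python 3.4 or later, so well after this post); the helper name, object count and sampling stride are purely illustrative. It only shows memory as seen by the Python allocators. The footprint reported by the operating system may stay higher if the few survivors are scattered across otherwise empty arenas.

```python
import tracemalloc


def build_objects(count):
    """Hypothetical workload: many small objects, the kind pymalloc serves."""
    return [[i] for i in range(count)]


tracemalloc.start()

objects = build_objects(200_000)
current_full, _ = tracemalloc.get_traced_memory()

# Keep only every 1000th object. The survivors were allocated far apart,
# so they are likely to sit in many different arenas.
survivors = objects[::1000]
del objects

current_sparse, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("traced while everything is alive:", current_full)
print("traced with sparse survivors:    ", current_sparse)
print("peak traced:                     ", peak)
```

Comparing these figures with the resident size your OS reports for the process (not shown here, as that is platform specific) is one rough way to judge whether sparsely occupied arenas are holding on to memory.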
While pymalloc is highly tuned for the general case behaviour of the
Python interpreter, just as platform malloc implementations have corner
cases so does pymalloc.
Be aware that ints and floats are managed via free lists with
memory allocation directly by the platform malloc() - these objects
are never seen by pymalloc, and neither type has support for
relinquishing surplus memory. Also be aware that many C extensions
don't use pymalloc even when they could.
In addition to Marc-Andre's suggestions, I would suggest paying
particular attention to the creation and retention of objects in your
code - if something's no longer required, explicitly delete it. It is
all too easy to lose sight of references to objects that hang around in
ways that defeat the gc support. Watch out for things that might be
sensitive to thread-ids for example.
Careful algorithm planning can also be useful, leveraging object
references to minimise duplicated data (and possibly get better
performance).
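One way to share references rather than hold many equal copies is sketched below. This is only an illustration under assumed names (`make_canonicaliser`, `canonical`, and the sample record fields are made up here); it works for hashable, immutable values such as strings and tuples.

```python
def make_canonicaliser():
    """Return a function that maps equal values to one shared object."""
    pool = {}

    def canonical(value):
        # setdefault returns the object already stored for an equal key,
        # or stores and returns this one if it is the first of its kind.
        return pool.setdefault(value, value)

    return canonical


canonical = make_canonicaliser()

# Equal strings built at run time are normally separate objects.
first = "-".join(["north", "region"])
second = "-".join(["north", "region"])

shared_first = canonical(first)
shared_second = canonical(second)

# Both records now refer to the same string object.
records = [{"region": shared_first}, {"region": shared_second}]
print(records[0]["region"] is records[1]["region"])
```

Note that the pool is itself a long-lived reference to every value it has seen, so it should be scoped to the job that needs it and dropped afterwards, otherwise it becomes exactly the kind of lingering reference discussed above.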
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370
andymac at pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
More information about the Python-list