[Python-Dev] Improving the Python Memory Allocator
Evan Jones
ejones at uwaterloo.ca
Sun Jan 23 23:19:22 CET 2005
This message is a follow up to a thread I started on python-dev back in
October, archived here:
http://mail.python.org/pipermail/python-dev/2004-October/049480.html
Basically, the problem I am trying to solve is that the Python memory
allocator never frees memory back to the operating system. I have
attached a patch against obmalloc.c for discussion. The patch still has
some rough edges and possibly some bugs, so I don't think it should be
merged as is. However, I would appreciate any feedback on the chances
for getting this implementation into the core. The rest of this message
lists some disadvantages to this implementation, a description of the
important changes, a benchmark, and my future plans if this change gets
accepted.
The patch works for any version of Python that uses obmalloc.c (which
includes Python 2.3 and 2.4), but I did my testing with Python 2.5 from
CVS under Linux and Mac OS X. This version of the allocator will
actually free memory. It has two disadvantages:
First, there is slightly more overhead with programs that allocate a
lot of memory, release it, then reallocate it. The original allocator
simply holds on to all the memory, allowing it to be efficiently
reused. This allocator will call free(), so it also must call malloc()
again when the memory is needed. I have a "worst case" benchmark which
shows that this cost isn't too significant, but it could be a problem
for some workloads. If it is, I have an idea for how to work around it.
Second, the previous allocator went out of its way to permit a module
to call PyObject_Free while another thread is executing
PyObject_Malloc. Apparently, this was a backwards compatibility hack
for old Python modules which erroneously call these functions without
holding the GIL. These modules will have to be fixed if this
implementation is accepted into the core.
Summary of the changes:
- Add an "arena_object" structure for tracking pages that belong to
each 256kB arena.
- Change the "arenas" array from an array of pointers to an array of
arena_object structures.
- When freeing a page (a pool), it is placed on a free pool list for
the arena it belongs to, instead of a global free pool list.
- When freeing a page, if the arena is completely unused, the arena is
deallocated.
- When allocating a page, it is taken from the arena that is the most
full. This gives arenas that are almost completely unused a chance to
be freed.
Benchmark:
The only benchmark I have performed at the moment is the worst case for
this allocator: A program that allocates 1 000 000 Python objects which
occupy nearly 200MB, frees them, reallocates them, then quits. I ran
the program four times, and discarded the initial time. Here is the
object:
class Obj:
def __init__( self ):
self.dumb = "hello"
And here are the average execution times for this program:
Python 2.5:
real time: 16.304
user time: 16.016
system: 0.257
Python 2.5 + patch:
real time: 16.062
user time: 15.593
system: 0.450
As expected, the patched version spends nearly twice as much system
time than the original version. This is because it calls free() and
malloc() twice as many times. However, this difference is offset by the
fact that the user space execution time is actually *less* than the
original version. How is this possible? The likely cause is because the
original version defined the arenas pointer to be "volatile" in order
to work when Free and Malloc were called simultaneously. Since this
version breaks that, the pointer no longer needs to be volatile, which
allows the value to be stored in a register instead of being read from
memory on each operation.
Here are some graphs of the memory allocator behaviour running this
benchmark.
Original: http://www.eng.uwaterloo.ca/~ejones/original.png
New: http://www.eng.uwaterloo.ca/~ejones/new.png
Future Plans:
- More detailed benchmarking.
- The "specialized" allocators for the basic types, such as ints, also
need to free memory back to the system.
- Potentially the allocator should keep some amount of free memory
around to improve the performance of programs that cyclically allocate
and free large amounts of memory. This amount should be "self-tuned" to
the application.
Thank you for your feedback,
Evan Jones
-------------- next part --------------
A non-text attachment was scrubbed...
Name: python-allocator.diff
Type: application/octet-stream
Size: 19080 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20050123/33ee017b/python-allocator-0001.obj
More information about the Python-Dev
mailing list