[Python-Dev] Improving the Python Memory Allocator

Evan Jones ejones at uwaterloo.ca
Sun Jan 23 23:19:22 CET 2005

This message is a follow up to a thread I started on python-dev back in 
October, archived here:


Basically, the problem I am trying to solve is that the Python memory 
allocator never frees memory back to the operating system. I have 
attached a patch against obmalloc.c for discussion. The patch still has 
some rough edges and possibly some bugs, so I don't think it should be 
merged as is. However, I would appreciate any feedback on the chances 
for getting this implementation into the core. The rest of this message 
lists some disadvantages to this implementation, a description of the 
important changes, a benchmark, and my future plans if this change gets 

The patch works for any version of Python that uses obmalloc.c (which 
includes Python 2.3 and 2.4), but I did my testing with Python 2.5 from 
CVS under Linux and Mac OS X. This version of the allocator will 
actually free memory. It has two disadvantages:

First, there is slightly more overhead with programs that allocate a 
lot of memory, release it, then reallocate it. The original allocator 
simply holds on to all the memory, allowing it to be efficiently 
reused. This allocator will call free(), so it also must call malloc() 
again when the memory is needed. I have a "worst case" benchmark which 
shows that this cost isn't too significant, but it could be a problem 
for some workloads. If it is, I have an idea for how to work around it.

Second, the previous allocator went out of its way to permit a module 
to call PyObject_Free while another thread is executing 
PyObject_Malloc. Apparently, this was a backwards compatibility hack 
for old Python modules which erroneously call these functions without 
holding the GIL. These modules will have to be fixed if this 
implementation is accepted into the core.

Summary of the changes:

- Add an "arena_object" structure for tracking pages that belong to 
each 256kB arena.
- Change the "arenas" array from an array of pointers to an array of 
arena_object structures.
- When freeing a page (a pool), it is placed on a free pool list for 
the arena it belongs to, instead of a global free pool list.
- When freeing a page, if the arena is completely unused, the arena is 
- When allocating a page, it is taken from the arena that is the most 
full. This gives arenas that are almost completely unused a chance to 
be freed.


The only benchmark I have performed at the moment is the worst case for 
this allocator: A program that allocates 1 000 000 Python objects which 
occupy nearly 200MB, frees them, reallocates them, then quits. I ran 
the program four times, and discarded the initial time. Here is the 

class Obj:
	def __init__( self ):
		self.dumb = "hello"

And here are the average execution times for this program:

Python 2.5:
real time: 16.304
user time: 16.016
system: 0.257

Python 2.5 + patch:
real time: 16.062
user time: 15.593
system: 0.450

As expected, the patched version spends nearly twice as much system 
time than the original version. This is because it calls free() and 
malloc() twice as many times. However, this difference is offset by the 
fact that the user space execution time is actually *less* than the 
original version. How is this possible? The likely cause is because the 
original version defined the arenas pointer to be "volatile" in order 
to work when Free and Malloc were called simultaneously. Since this 
version breaks that, the pointer no longer needs to be volatile, which 
allows the value to be stored in a register instead of being read from 
memory on each operation.

Here are some graphs of the memory allocator behaviour running this 

Original: http://www.eng.uwaterloo.ca/~ejones/original.png
New: http://www.eng.uwaterloo.ca/~ejones/new.png

Future Plans:

- More detailed benchmarking.
- The "specialized" allocators for the basic types, such as ints, also 
need to free memory back to the system.
- Potentially the allocator should keep some amount of free memory 
around to improve the performance of programs that cyclically allocate 
and free large amounts of memory. This amount should be "self-tuned" to 
the application.

Thank you for your feedback,

Evan Jones

-------------- next part --------------
A non-text attachment was scrubbed...
Name: python-allocator.diff
Type: application/octet-stream
Size: 19080 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20050123/33ee017b/python-allocator-0001.obj

More information about the Python-Dev mailing list