Improving the Python Memory Allocator
This message is a follow up to a thread I started on python-dev back in October, archived here: http://mail.python.org/pipermail/python-dev/2004-October/049480.html Basically, the problem I am trying to solve is that the Python memory allocator never frees memory back to the operating system. I have attached a patch against obmalloc.c for discussion. The patch still has some rough edges and possibly some bugs, so I don't think it should be merged as is. However, I would appreciate any feedback on the chances for getting this implementation into the core. The rest of this message lists some disadvantages to this implementation, a description of the important changes, a benchmark, and my future plans if this change gets accepted. The patch works for any version of Python that uses obmalloc.c (which includes Python 2.3 and 2.4), but I did my testing with Python 2.5 from CVS under Linux and Mac OS X. This version of the allocator will actually free memory. It has two disadvantages: First, there is slightly more overhead with programs that allocate a lot of memory, release it, then reallocate it. The original allocator simply holds on to all the memory, allowing it to be efficiently reused. This allocator will call free(), so it also must call malloc() again when the memory is needed. I have a "worst case" benchmark which shows that this cost isn't too significant, but it could be a problem for some workloads. If it is, I have an idea for how to work around it. Second, the previous allocator went out of its way to permit a module to call PyObject_Free while another thread is executing PyObject_Malloc. Apparently, this was a backwards compatibility hack for old Python modules which erroneously call these functions without holding the GIL. These modules will have to be fixed if this implementation is accepted into the core. Summary of the changes: - Add an "arena_object" structure for tracking pages that belong to each 256kB arena. - Change the "arenas" array from an array of pointers to an array of arena_object structures. - When freeing a page (a pool), it is placed on a free pool list for the arena it belongs to, instead of a global free pool list. - When freeing a page, if the arena is completely unused, the arena is deallocated. - When allocating a page, it is taken from the arena that is the most full. This gives arenas that are almost completely unused a chance to be freed. Benchmark: The only benchmark I have performed at the moment is the worst case for this allocator: A program that allocates 1 000 000 Python objects which occupy nearly 200MB, frees them, reallocates them, then quits. I ran the program four times, and discarded the initial time. Here is the object: class Obj: def __init__( self ): self.dumb = "hello" And here are the average execution times for this program: Python 2.5: real time: 16.304 user time: 16.016 system: 0.257 Python 2.5 + patch: real time: 16.062 user time: 15.593 system: 0.450 As expected, the patched version spends nearly twice as much system time than the original version. This is because it calls free() and malloc() twice as many times. However, this difference is offset by the fact that the user space execution time is actually *less* than the original version. How is this possible? The likely cause is because the original version defined the arenas pointer to be "volatile" in order to work when Free and Malloc were called simultaneously. Since this version breaks that, the pointer no longer needs to be volatile, which allows the value to be stored in a register instead of being read from memory on each operation. Here are some graphs of the memory allocator behaviour running this benchmark. Original: http://www.eng.uwaterloo.ca/~ejones/original.png New: http://www.eng.uwaterloo.ca/~ejones/new.png Future Plans: - More detailed benchmarking. - The "specialized" allocators for the basic types, such as ints, also need to free memory back to the system. - Potentially the allocator should keep some amount of free memory around to improve the performance of programs that cyclically allocate and free large amounts of memory. This amount should be "self-tuned" to the application. Thank you for your feedback, Evan Jones
participants (3)
-
"Martin v. Löwis"
-
Evan Jones
-
Rodrigo Dias Arruda Senra