[Python-Dev] Changing pymalloc behaviour for long running processes

Evan Jones ejones at uwaterloo.ca
Tue Oct 19 06:00:46 CEST 2004


I know that this has been discussed a bit in the past, but I was hoping  
that some Python gurus could shed some light on this issue, and maybe  
let me know if there are any plans for solving this problem. I know a  
hack that might work, but there must be a better way to solve this  
problem.

The short version of the problem is that obmalloc.c never frees memory.  
This is a great strategy if the application runs for a short time then  
quits, or if it has fairly constant memory usage. However, applications  
with very dynamic memory needs that run for a long time do not
perform well because Python hangs on to the peak amount of memory  
required, even if that memory is only required for a tiny fraction of  
the run time. In my application, a Python process occupies 1 GB of RAM
for ~20 hours, even though it only needs that 1 GB for about
5 minutes. This is a problem that needs to be addressed, as it  
negatively impacts the performance of Python when manipulating very  
large data sets. In fact, I found a mailing list post where the poster  
was looking for a workaround for this issue, but I can't find it now.
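To make the high-water-mark effect concrete, here is a minimal C sketch of a fixed-size block allocator that, like the behaviour described above, keeps freed blocks on a free list for reuse but never returns them to the system. The names (`toy_alloc`, `toy_free`, `TOY_BLOCK_SIZE`) are hypothetical, and this is not how obmalloc.c is actually structured. It only counts bytes in use versus bytes held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy allocator for fixed-size blocks. Freed blocks are cached
 * on a free list and reused, but never handed back with free(), so the
 * memory held only ever grows to the peak demand. */
#define TOY_BLOCK_SIZE 64

typedef struct toy_block {
    struct toy_block *next;  /* meaningful only while the block is free */
} toy_block;

static toy_block *toy_free_list = NULL;
static size_t toy_bytes_in_use = 0;  /* bytes currently handed out */
static size_t toy_bytes_held = 0;    /* bytes ever obtained from malloc */

void *toy_alloc(void)
{
    toy_block *b = toy_free_list;
    if (b != NULL) {
        toy_free_list = b->next;     /* reuse a cached block */
    } else {
        b = malloc(TOY_BLOCK_SIZE);  /* grow: ask the system for more */
        if (b == NULL)
            return NULL;
        toy_bytes_held += TOY_BLOCK_SIZE;
    }
    toy_bytes_in_use += TOY_BLOCK_SIZE;
    return b;
}

void toy_free(void *p)
{
    toy_block *b = p;
    assert(p != NULL);
    b->next = toy_free_list;         /* cache it; free() is never called */
    toy_free_list = b;
    toy_bytes_in_use -= TOY_BLOCK_SIZE;
}
```

Allocating 1000 blocks and then freeing all of them leaves `toy_bytes_in_use` at 0 while `toy_bytes_held` stays at 64000: the process keeps its peak even though it no longer needs it.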

Some posts to various lists [1] have stated that this is not a real  
problem because virtual memory takes care of it. This is fair if you  
are talking about a couple of megabytes. In my case, I'm talking about
~700 MB of wasted RAM, which is a problem. First, this is wasting space  
which could be used for disk cache, which would improve the performance  
of my system. Second, when the system decides to swap out the pages  
that haven't been used for a while, they are dirty and must be written  
to swap. If Python ever wants to use them again, they will be brought
back in from swap. This is much worse than informing the system that the
pages can be discarded, and allocating them again later. In fact, the  
other native object types (ints, lists) seem to realize that holding on  
to a huge amount of memory indefinitely is a bad strategy, because they  
explicitly limit the size of their free lists. So why is this not a  
good idea for other types?
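For contrast, a bounded free list in the spirit of the ones mentioned above might look like the following sketch. The cap (`BFL_MAX_FREE`, set to 80 here) and all names are illustrative assumptions, not taken from CPython's sources: up to the cap, freed blocks are cached for fast reuse, and beyond it they go straight back to `free()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bounded free list: at most BFL_MAX_FREE blocks are cached
 * for reuse; any block freed beyond that is released with free(). */
#define BFL_BLOCK_SIZE 64
#define BFL_MAX_FREE 80   /* illustrative cap, not a tuned value */

typedef struct bfl_block {
    struct bfl_block *next;
} bfl_block;

static bfl_block *bfl_head = NULL;
static size_t bfl_nfree = 0;  /* blocks currently cached on the list */
static size_t bfl_live = 0;   /* blocks obtained from malloc, not yet freed */

void *bfl_alloc(void)
{
    if (bfl_head != NULL) {   /* fast path: reuse a cached block */
        bfl_block *b = bfl_head;
        bfl_head = b->next;
        bfl_nfree--;
        return b;
    }
    void *p = malloc(BFL_BLOCK_SIZE);
    if (p != NULL)
        bfl_live++;
    return p;
}

void bfl_free(void *p)
{
    assert(p != NULL);
    if (bfl_nfree < BFL_MAX_FREE) {  /* room left: cache the block */
        bfl_block *b = p;
        b->next = bfl_head;
        bfl_head = b;
        bfl_nfree++;
    } else {                         /* list is full: give it back */
        free(p);
        bfl_live--;
    }
}
```

With this policy, allocating 1000 blocks and freeing them all leaves only 80 blocks held instead of all 1000. Whether `free()` then returns that memory to the operating system is still up to the C library.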

Does anyone else see this as a problem? Does anyone think this is not a  
problem?

Proposal:
- Python's memory allocator should occasionally free memory if the  
memory usage has been relatively constant, and has been well below the  
amount of memory allocated. This will incur additional overhead to free  
the memory, and additional overhead to reallocate it if the memory is  
needed again quickly. However, it will make Python co-operate nicely  
with other processes, and a clever implementation should be able to  
reduce the overhead.
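One way to express the proposed rule is as a small policy object that the cleanup pass feeds with a usage sample each time it runs. The sketch below is only an illustration under assumed thresholds: "relatively constant" is taken to mean the last `RP_WINDOW` samples vary by at most 10% of their maximum, and "well below" to mean that maximum is under half of the allocated bytes. None of these names or numbers come from obmalloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical release policy; all thresholds are illustrative. */
#define RP_WINDOW 4         /* number of recent samples considered */
#define RP_STABLE_FRAC 0.10 /* max - min <= 10% of max => "constant" */
#define RP_LOW_FRAC 0.50    /* max < 50% of allocated => "well below" */

typedef struct {
    size_t samples[RP_WINDOW]; /* ring buffer of bytes-in-use samples */
    size_t count;              /* samples recorded so far, saturating */
    size_t next;               /* next write position in the ring */
} release_policy;

/* Called once per cleanup pass with the current bytes in use. */
void rp_record(release_policy *rp, size_t bytes_in_use)
{
    rp->samples[rp->next] = bytes_in_use;
    rp->next = (rp->next + 1) % RP_WINDOW;
    if (rp->count < RP_WINDOW)
        rp->count++;
}

/* True if usage has been stable and well below what is allocated. */
bool rp_should_release(const release_policy *rp, size_t bytes_allocated)
{
    assert(rp->count <= RP_WINDOW);
    if (rp->count < RP_WINDOW || bytes_allocated == 0)
        return false; /* not enough history yet */
    size_t lo = rp->samples[0], hi = rp->samples[0];
    for (size_t i = 1; i < RP_WINDOW; i++) {
        if (rp->samples[i] < lo) lo = rp->samples[i];
        if (rp->samples[i] > hi) hi = rp->samples[i];
    }
    bool stable = (double)(hi - lo) <= RP_STABLE_FRAC * (double)hi;
    bool low = (double)hi < RP_LOW_FRAC * (double)bytes_allocated;
    return stable && low;
}
```

Requiring a full window of stable samples is what keeps a brief dip in usage from triggering a release that would immediately have to be undone.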

Problem:
- I do not completely understand Python's memory allocator, but from  
what I see, it will not easily support this.

Gross Hack:

I've been playing with the fact that the "collect" function in the gc  
module already gets called occasionally. Whenever it gets called for a  
level 2 collection, I've hacked it to call a cleanup function in  
obmalloc.c. This function walks the free pool list, reorganizes it to
reduce memory fragmentation, and decides, based on metrics collected
during the previous run, whether it should free some memory. It currently
works fine, except that it will permit the arena vector to grow  
indefinitely, which is also bad for a long-running process. These
cleanups are also relatively slow, since they touch every free
page that is currently allocated, so I'm trying to figure out a way to  
integrate them more cleanly into the allocator itself.
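A heavily simplified model of the cleanup step might look like this. It assumes a layout loosely inspired by obmalloc's arenas and pools (each arena is one `malloc`'d region split into fixed-size pools, and each pool counts its blocks in use), but the types, sizes, and functions are hypothetical. It frees arenas whose pools are all empty, and it also compacts and shrinks the arena vector, which is one possible answer to the vector growing indefinitely. It does not model reorganizing the free pool list or the metrics-based decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes chosen for illustration. */
#define POOL_SIZE 4096
#define POOLS_PER_ARENA 64

typedef struct {
    void *base;                      /* region obtained from malloc */
    unsigned nused[POOLS_PER_ARENA]; /* blocks in use in each pool */
} arena;

typedef struct {
    arena *arenas;  /* the "arena vector" */
    size_t narenas;
} heap;

/* Append a fresh, completely empty arena. Returns 0 on success. */
int heap_add_arena(heap *h)
{
    arena *grown = realloc(h->arenas, (h->narenas + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    h->arenas = grown;
    arena *a = &h->arenas[h->narenas];
    a->base = malloc((size_t)POOL_SIZE * POOLS_PER_ARENA);
    if (a->base == NULL)
        return -1;
    for (size_t i = 0; i < POOLS_PER_ARENA; i++)
        a->nused[i] = 0;
    h->narenas++;
    return 0;
}

static int arena_is_empty(const arena *a)
{
    for (size_t i = 0; i < POOLS_PER_ARENA; i++)
        if (a->nused[i] != 0)
            return 0;
    return 1;
}

/* Free every completely empty arena, then compact and shrink the arena
 * vector so it does not keep growing. Returns the number of arenas freed. */
size_t heap_cleanup(heap *h)
{
    assert(h != NULL);
    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < h->narenas; i++) {
        if (arena_is_empty(&h->arenas[i])) {
            free(h->arenas[i].base);
            freed++;
        } else {
            h->arenas[kept++] = h->arenas[i]; /* slide survivors down */
        }
    }
    h->narenas = kept;
    if (kept == 0) {
        free(h->arenas);
        h->arenas = NULL;
    } else {
        arena *shrunk = realloc(h->arenas, kept * sizeof *shrunk);
        if (shrunk != NULL)            /* on failure keep the old vector */
            h->arenas = shrunk;
    }
    return freed;
}
```

Like the hack described above, `heap_cleanup` visits every pool of every arena, so its cost grows with the total memory held rather than with the amount it actually frees.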

This also requires that nothing call the allocation functions while  
this is happening. I believe this is a reasonable assumption, since the
cleanup is called from the cyclic garbage collector, but I don't know
enough about Python internals to be sure.
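If it turns out that the allocator could be re-entered during the cleanup, one way to at least detect it is a flag that the cleanup sets and the allocation entry points check. The sketch below assumes single-threaded access and uses hypothetical wrapper names (`guarded_malloc`, `guarded_free`, `run_cleanup`); it is not how CPython's allocator is actually entered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Set while a cleanup pass is running. Assumes single-threaded access. */
static bool cleanup_running = false;

/* Allocation entry points refuse to run during a cleanup pass. */
void *guarded_malloc(size_t n)
{
    assert(!cleanup_running && "allocator re-entered during cleanup");
    return malloc(n);
}

void guarded_free(void *p)
{
    assert(!cleanup_running && "allocator re-entered during cleanup");
    free(p);
}

/* Runs `cleanup` with the flag set. Returns false without running it if a
 * cleanup is already in progress, so a nested trigger cannot start a
 * second pass. */
bool run_cleanup(void (*cleanup)(void))
{
    if (cleanup_running)
        return false;
    cleanup_running = true;
    cleanup();
    cleanup_running = false;
    return true;
}

/* Example cleanup used for demonstration; like a real pass, it must not
 * call guarded_malloc or guarded_free. */
static int cleanups_run = 0;
static void example_cleanup(void)
{
    cleanups_run++;
}
```

Because the checks are plain `assert`s, they vanish when `NDEBUG` is defined, so this is a debugging aid that turns silent heap corruption into an immediate failure, not a real lock.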

Eventually, I hope to do some benchmarks and figure out if this is  
actually a reasonable strategy. However, I was hoping to get some  
feedback before I waste too much time on this.

Evan Jones

[1]
http://groups.google.com/groups?selm=mailman.1053801468.4243.python-list%40python.org

--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso