I know that this has been discussed a bit in the past, but I was hoping that some Python gurus could shed some light on this issue, and maybe let me know if there are any plans for solving this problem. I know a hack that might work, but there must be a better way to solve this problem.
The short version of the problem is that obmalloc.c never releases memory back to the operating system. This is a great strategy if the application runs for a short time and then quits, or if it has fairly constant memory usage. However, long-running applications with very dynamic memory needs do not perform well, because Python hangs on to the peak amount of memory required, even if that memory is only required for a tiny fraction of the run time. With my application, I have a Python process which occupies 1 GB of RAM for ~20 hours, even though it only uses that 1 GB for about 5 minutes. This is a problem that needs to be addressed, as it negatively impacts the performance of Python when manipulating very large data sets. In fact, I found a mailing list post where the poster was looking for a workaround for this issue, but I can't find it now.
Some posts to various lists have stated that this is not a real problem because virtual memory takes care of it. This is fair if you are talking about a couple of megabytes. In my case, I'm talking about ~700 MB of wasted RAM, which is a problem. First, this is wasting space which could be used for disk cache, which would improve the performance of my system. Second, when the system decides to swap out the pages that haven't been used for a while, they are dirty and must be written to swap. If Python ever wants to use them again, they will be brought in from swap. This is much worse than informing the system that the pages can be discarded, and allocating them again later. In fact, the other native object types (ints, lists) seem to realize that holding on to a huge amount of memory indefinitely is a bad strategy, because they explicitly limit the size of their free lists. So why is this not a good idea for other types?
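To make the "limited free list" idea concrete, here is a minimal sketch in Python. It is not CPython's actual code: the class name, the factory argument, and the cap of 3 are all made up for illustration. It only shows the general technique of recycling objects while refusing to cache more than a fixed number of them.

```python
class BoundedFreeList:
    """Recycles objects, but never caches more than max_free of them."""

    def __init__(self, factory, max_free=80):
        self._factory = factory      # builds a fresh object when the cache is empty
        self._max_free = max_free    # hypothetical cap, chosen only for illustration
        self._free = []

    def acquire(self):
        # Reuse a cached object if one exists, otherwise allocate a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        # Cache the object only while under the cap; otherwise drop the
        # reference so the underlying memory can actually be reclaimed.
        if len(self._free) < self._max_free:
            self._free.append(obj)

    def cached(self):
        return len(self._free)


pool = BoundedFreeList(factory=list, max_free=3)
objs = [pool.acquire() for _ in range(10)]   # peak demand: 10 objects
for obj in objs:
    pool.release(obj)
print(pool.cached())   # 3: the cache does not remember the peak of 10
```

The point of the cap is that a brief spike in demand does not pin the peak amount of memory forever, which is exactly what the general-purpose allocator lacks.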
Does anyone else see this as a problem? Does anyone think this is not a problem?
Proposal: - Python's memory allocator should occasionally free memory if the memory usage has been relatively constant, and has been well below the amount of memory allocated. This will incur additional overhead to free the memory, and additional overhead to reallocate it if the memory is needed again quickly. However, it will make Python co-operate nicely with other processes, and a clever implementation should be able to reduce the overhead.
Problem: - I do not completely understand Python's memory allocator, but from what I see, it will not easily support this.
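As a rough illustration of the kind of decision the proposal describes, here is a sketch in Python rather than C. Everything in it is an assumption made for the example: the function name, the sampling window, and the tolerance and headroom percentages. It only answers the question "has usage been steady and well below what is allocated, and if so, how much could be given back?"

```python
import math


def amount_to_release(samples, allocated, window=4, tolerance=0.10, headroom=0.25):
    """Return how much memory could be handed back (0 means keep everything).

    samples   -- recent measurements of memory actually in use
    allocated -- memory currently held from the operating system
    All thresholds are hypothetical and would need tuning with benchmarks.
    """
    if len(samples) < window:
        return 0                          # not enough history to judge
    recent = samples[-window:]
    peak, low = max(recent), min(recent)
    if peak - low > tolerance * peak:
        return 0                          # usage is still changing, do nothing
    # Keep the recent peak plus some headroom so a small increase in demand
    # does not immediately force a reallocation.
    keep = math.ceil(peak * (1 + headroom))
    return max(0, allocated - keep)


# Usage in MB: a spike to 1000, then steady at about 300.
print(amount_to_release([1000, 300, 310, 305], allocated=1000))        # 0
print(amount_to_release([1000, 300, 310, 305, 300], allocated=1000))   # 612
```

The headroom term is the knob that trades the overhead of freeing and reallocating against the amount of memory held idle.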
I've been playing with the fact that the "collect" function in the gc module already gets called occasionally. Whenever it gets called for a level 2 collection, I've hacked it to call a cleanup function in obmalloc.c. This function goes through the free pool list, reorganizes it to decrease memory fragmentation, and decides, based on metrics collected from the last run, whether it should free some memory. It currently works fine, except that it permits the arena vector to grow indefinitely, which is also bad for a long-running process. The cleanups are also relatively slow, as they touch every free page that is currently allocated, so I'm trying to figure out a way to integrate them more cleanly into the allocator itself.
This also requires that nothing call the allocation functions while the cleanup is running. I believe that this is reasonable, considering that it is called from the cyclic garbage collector, but I don't know enough about Python internals to be sure.
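The hack itself lives in C, but the hook point can be illustrated from pure Python. The sketch below uses the gc.callbacks list that current CPython versions provide, and runs a placeholder after each full (generation 2) collection. The names allocator_cleanup and cleanup_runs are hypothetical stand-ins; nothing here touches obmalloc.c, and since the placeholder is ordinary Python code it does allocate, so it does not model the "no allocations during cleanup" requirement.

```python
import gc

cleanup_runs = []   # records each time the placeholder cleanup ran


def allocator_cleanup():
    # Hypothetical stand-in for the C cleanup function described above.
    # A real version would walk the free pools and decide what to release.
    cleanup_runs.append(len(cleanup_runs) + 1)


def on_gc_event(phase, info):
    # gc invokes each callback with phase "start" or "stop" and an info
    # dict that includes the generation being collected.
    if phase == "stop" and info["generation"] == 2:
        allocator_cleanup()


gc.callbacks.append(on_gc_event)
gc.collect()                      # an explicit full collection triggers the hook
gc.callbacks.remove(on_gc_event)
print(len(cleanup_runs) >= 1)     # True
```

Tying the cleanup to full collections keeps it infrequent, which matters because each pass is relatively slow.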
Eventually, I hope to do some benchmarks and figure out if this is actually a reasonable strategy. However, I was hoping to get some feedback before I waste too much time on this.
http://groups.google.com/groups?selm=mailman.1053801468.4243.python-list%40python.org
-- Evan Jones: http://evanjones.ca/ "Computers are useless. They can only give answers" - Pablo Picasso