
On Tue, 19 Oct 2004 12:02:14 +0200 (CEST), Evan Jones <ejones@uwaterloo.ca> wrote:
Subject: [Python-Dev] Changing pymalloc behaviour for long running processes
[ snip ]
The short version of the problem is that obmalloc.c never frees memory. This is a great strategy if the application runs for a short time then quits, or if it has fairly constant memory usage. However, applications with very dynamic memory needs and that run for a long time do not perform well because Python hangs on to the peak amount of memory required, even if that memory is only required for a tiny fraction of the run time. With my application, I have a Python process which occupies 1 GB of RAM for ~20 hours, even though it only uses that 1 GB for about 5 minutes. This is a problem that needs to be addressed, as it negatively impacts the performance of Python when manipulating very large data sets. In fact, I found a mailing list post where the poster was looking for a workaround for this issue, but I can't find it now.
Some posts to various lists [1] have stated that this is not a real problem because virtual memory takes care of it. This is fair if you are talking about a couple of megabytes. In my case, I'm talking about ~700 MB of wasted RAM, which is a problem. First, this is wasting space which could be used for disk cache, which would improve the performance of my system. Second, when the system decides to swap out the pages that haven't been used for a while, they are dirty and must be written to swap. If Python ever wants to use them again, they will be brought in from swap. This is much worse than informing the system that the pages can be discarded, and allocating them again later. In fact, the other native object types (ints, lists) seem to realize that holding on to a huge amount of memory indefinitely is a bad strategy, because they explicitly limit the size of their free lists. So why is this not a good idea for other types?
Does anyone else see this as a problem?
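To make the "limit the size of the free list" idea above concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not how obmalloc.c or any CPython type is actually implemented; the class name BoundedFreeList, its methods, and the max_free parameter are all hypothetical.

```python
class BoundedFreeList:
    """Cache released objects for reuse, but never more than max_free of them.

    Hypothetical illustration only; not CPython's implementation.
    """

    def __init__(self, factory, max_free):
        self._factory = factory    # callable that builds a fresh object
        self._max_free = max_free  # upper bound on cached objects
        self._free = []

    def acquire(self):
        # Reuse a cached object when one is available, else build a new one.
        if self._free:
            return self._free.pop()
        return self._factory()

    def release(self, obj):
        # Keep the object only while the cache is below its limit.
        # Past the limit the reference is dropped, so the memory can be
        # reclaimed instead of being held at the peak level forever.
        if len(self._free) < self._max_free:
            self._free.append(obj)

    def cached(self):
        return len(self._free)


if __name__ == "__main__":
    pool = BoundedFreeList(lambda: bytearray(1024), max_free=4)
    buffers = [pool.acquire() for _ in range(10)]
    for buf in buffers:
        pool.release(buf)
    print(pool.cached())
```

With an unbounded list, all ten buffers would stay cached after the peak; with the limit, only four do, and the rest are left for the allocator to reclaim.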
This is such a big problem for us that we had to rewrite some of our daemons to fork request handlers so that the memory would be freed. That's the only way we've found to deal with it, and it seems that's the preferred Python way of doing things: using processes, IPC, fork, etc. instead of threads.

In order to be able to release memory, the interpreter has to allocate memory in chunks bigger than the minimum that can be returned to the OS, e.g., in Linux with glibc that'd be 128 KB by default (iirc), so that libc's malloc would use mmap to allocate that chunk. Otherwise, if the memory was obtained with brk, then in virtually all OSes and malloc implementations it won't be returned to the OS even if the interpreter frees the memory.

For example, consider the following code in the interactive interpreter:

    for i in range(10000000):
        pass

That run will create a lot of little integer objects, and the virtual memory size of the interpreter will quickly grow to 155MB and then drop to 117MB. The 117MB left are all those little integer objects that are not in use any more and that the interpreter would reuse as needed.

When the system needs memory, it will page out to swap the pages where these objects have been allocated. In our application, paging to swap is extremely bad because sometimes we're running the OS booted from the net without swap. The daemon has to loop over a list of 20 to 40 thousand items at a time, and it quickly grows to 60MB on the first run and then continues to grow from there. When something else needs memory, it tries to swap and then crashes.

In the example above, the difference between 155MB and 117MB is 38MB, which I assume is the size of the list object returned by 'range()', which contains the references to the integers. The list goes away when the interpreter finishes running the loop, and because it was already known how big it was going to be, it was allocated as a big chunk using mmap (my speculation).
As a result, that memory was given back to the OS and the virtual memory size of the interpreter went down from 155MB to 117MB.

Regards,

--
Luis P Caamano
Atlanta, GA USA

PS I rarely post to python-dev, this is probably the first time, so please let me take this opportunity to thank all the Python developers for all your efforts, such a great language, and great tool. My respect and admiration to all of you.
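The fork-per-request workaround described in the reply above might look roughly like the following. This is a minimal sketch for Unix-like systems, not the daemons' actual code: run_in_forked_child and handle_request are hypothetical names, and the result is assumed to be picklable. The memory-hungry work happens in a short-lived child, so whatever the child's allocator holds on to is returned to the OS when the child exits.

```python
import os
import pickle


def handle_request(n):
    # Hypothetical memory-hungry handler: builds a large temporary list.
    items = list(range(n))
    return sum(items)


def run_in_forked_child(func, *args):
    """Run func(*args) in a forked child and return its (picklable) result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the work, send the result through the pipe, then exit
        # immediately so all of the child's memory goes back to the OS.
        os.close(read_fd)
        status = 0
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(*args), out)
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as src:
        data = src.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0 or not data:
        raise RuntimeError("child handler failed")
    return pickle.loads(data)


if __name__ == "__main__":
    print(run_in_forked_child(handle_request, 1000000))
```

The parent only ever holds the pickled result, so its own footprint stays flat no matter how much memory the handler needs at its peak. The cost is a fork and a pipe round trip per request.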