Python not giving free memory back to the OS gets me into real problems ...

Magnus Lycka lycka at carmen.se
Thu Apr 26 08:04:08 EDT 2007


leuchte at gmail.com wrote:
> Our (Python) macro uses massively nested loops which are unfortunately
> necessary. These loops perform complex calculations in this commercial
> tool. To give you a quick overview of how long this macro runs:
> 
> The outer loop takes 5-7 hours for one cycle. Each cycle creates one
> output file. So we would like to perform 3-5 outer cycles en bloc.
> Unfortunately one of our computers (768MB RAM) crashes after just ~10%
> of the first cycle with the following error message:
> 
> http://img2.freeimagehosting.net/uploads/7157b1dd7e.jpg
> 
> while another computer (1GB RAM) crashes after ~10% of the fourth
> cycle. While the virtual memory on the 1GB machine was full to the
> limit when it crashed, the memory usage of the 768MB machine looked
> like this:

While Python won't return memory to the OS, that is not
the same as a memory leak. When your Python program is
done using some objects, the memory they occupied can be
reused by other Python objects in the same process. The
process size should never grow beyond the peak amount of
memory it actually needed at any one point in time.
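A rough way to see this for yourself (on Unix, where the resource
module is available) is to watch the process's peak memory use while
building, dropping and rebuilding a big object; the sizes here are
arbitrary:

import resource

def peak_kb():
    # Peak resident set size so far; kilobytes on Linux, bytes on Mac.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

data = list(range(10 ** 7))            # grow the process
print('first list: %d' % peak_kb())
del data                               # freed to Python, not to the OS
data = list(range(10 ** 7))            # reuses the freed memory
print('second list: %d' % peak_kb())   # peak barely grows, if at all

The second peak being nearly the same as the first shows that the
freed memory really was recycled within the process.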

The problem (and not only for Python) is that the virtual
memory of your process will never shrink once it has
reached its maximum size. Since the unused parts of its
virtual memory will be paged out to disk, this is probably
not a big problem anyway.

If you have a long-running process that you want to keep
slim, it might be worth factoring out the memory-intensive
parts into separate processes in your macros, like this:

import os

for x, y, z in outer_loop_sequence:
    os.system('python the_real_worker.py %s %s %s' % (x, y, z))
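
For completeness, a hypothetical sketch of the the_real_worker.py
side, where do_heavy_calculation and the output file naming are just
placeholders for your actual macro code:

# the_real_worker.py -- hypothetical sketch
import sys

def do_heavy_calculation(x, y, z):
    # ... the memory-hungry nested loops go here ...
    return []

if __name__ == '__main__':
    x, y, z = sys.argv[1:4]
    out = open('output_%s_%s_%s.txt' % (x, y, z), 'w')
    for line in do_heavy_calculation(x, y, z):
        out.write('%s\n' % line)
    out.close()

When the worker exits, all the memory it used goes back to the OS.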

This should confine the big memory usage to short-running
processes. It should not solve your crash problem though:
if the crashes go away when you rig it like this, something
was broken, either in your macro or in the way the
application uses Python for macros. You won't use less
memory just because you divide the problem between two
processes.

To actually solve the problem, you need to look closer at
your algorithms and data structures. Are you storing data
efficiently? For instance, I imagine that numarray (or
whatever those packages are called these days) is much,
much more efficient at handling a vector of numerical
values than a Python list of e.g. ints. Lists are pretty
expensive: a list of ints is at least twice as big as an
efficient integer array. Are you copying data when you
should just share references? Are you hanging on to data
longer than you actually need it?
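
To make that concrete, here is a rough comparison using the standard
library's array module (numarray, or numpy as its successor is called
nowadays, works along the same lines):

import array
import sys

n = 10 ** 6
as_list = list(range(n))               # one pointer per element, plus
                                       # a separate int object each
as_array = array.array('l', range(n))  # raw machine ints, contiguous

# getsizeof() counts only the containers themselves, so this still
# understates how much bigger the list really is.
print('list:  %d bytes' % sys.getsizeof(as_list))
print('array: %d bytes' % sys.getsizeof(as_array))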

I don't see why there would be a big difference whether
you use Python or e.g. C or C++ to solve this kind of
problem. If you use lots of data, you need to understand
your data structures and use them effectively. With Python
you get shorter development time and less code to maintain
for a given problem. The price for that is typically longer
execution time, but disregarding small systems that can't
bear the size of the Python runtime, memory should not be
a big problem. Python isn't Java.


