memory management

Dave Angel davea at dejaviewphoto.com
Mon Nov 7 15:19:53 EST 2011


On 11/07/2011 02:43 PM, Juan Declet-Barreto wrote:
> Hi,
>
> Can anyone provide links or basic info on memory management, variable dereferencing, or the like? I have a script that traverses a file structure using os.walk and adds directory names to a list. It works for a small number of directories, but when I set it loose on a directory with thousands of dirs/subdirs, it crashes the DOS session and also the Python shell (when I run it from the shell).  This makes it difficult to figure out if the allocated memory or heap space for the DOS/shell session have overflown, or why it is crashing.
>
> Juan Declet-Barreto
I don't have any reference to point you to, but CPython's memory 
management is really pretty simple.  However, it's important to tell us 
which build of Python you're running, as there are several, with very 
different memory rules.  For example, Jython, which is Python running in 
a Java VM, lets the Java garbage collector handle things, and its 
behavior is entirely different.

Likewise, the OS may be relevant.  You're using Windows-style 
terminology, but that doesn't prove you're on Windows, nor does it say 
which version.
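To report those details precisely, something like this at the top of your script would tell us exactly what you're running (untested sketch, all standard library):

```python
import platform
import sys

# Print the interpreter build and OS details worth including in a report.
print(platform.python_implementation())  # e.g. CPython, Jython, PyPy
print(sys.version)                       # full version and build info
print(platform.platform())               # OS name and version
print(platform.architecture()[0])        # "32bit" or "64bit"
```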

Assuming 32-bit CPython 2.7 on XP, the principles are simple.  When an 
object is no longer accessible, it gets garbage collected*.  So if you 
build a list inside a function, and the only reference to it is a 
function's local variable, then the whole list will be freed when the 
function exits.  Common mistakes are using globals unnecessarily, and 
using lists where iterables would work just as well.
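For your os.walk case, the difference looks something like this (a sketch; the path you pass is whatever directory you're scanning):

```python
import os

def collect_dirs(top):
    """Builds the entire list in memory before returning -- every
    directory name stays alive until the list itself is freed."""
    result = []
    for dirpath, dirnames, filenames in os.walk(top):
        result.append(dirpath)
    return result

def iter_dirs(top):
    """Yields one directory at a time; only the current name needs to
    be in memory, no matter how many directories the tree contains."""
    for dirpath, dirnames, filenames in os.walk(top):
        yield dirpath

# Consuming the generator keeps memory flat even for huge trees:
#   for d in iter_dirs(some_top_dir):
#       print(d)
```

If you only need to process each name once (print it, write it to a file), the generator version never holds more than one name at a time.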

The tool on XP to tell how much memory is in use is Task Manager.  
As you point out, it's hard to catch a short-running app in the act.  So 
add a (global) counter to your code, and see how high it gets when the 
program crashes.  Then put a test in your code for that counter value, 
and do an input() somewhat earlier.

At that point, see how much memory the program is actually using.

Now, when an object is freed, a new one of the same size is likely to 
re-use the space immediately.  But if the objects are all different 
sizes, it's somewhat statistical, and you might get fragmentation, for 
example.  When Python's pool is full, it asks the OS for more memory 
(perhaps using swap space), but I don't think it ever gives it back.  So 
your memory use is a kind of high-water mark.  That's why it's 
problematic to build a huge data structure, walk through it, and then 
delete it: the script will probably continue to show the peak memory 
use, indefinitely.

* (Technically, this is reference counted: when an object's reference 
count reaches zero, the object is freed immediately.  The real garbage 
collector does lazier scanning, to catch reference cycles.)




