Python 2.6 still not giving memory back to the OS...

Dave Angel davea at ieee.org
Sat Aug 15 09:44:25 EDT 2009


Chris Withers wrote:
> Hi All,
>
> I thought this was fixed back in Python 2.5, but I guess not?
>
> So, I'm playing in an interactive session:
>
> >>> from xlrd import open_workbook
> >>> b = open_workbook('some.xls',pickleable=0,formatting_info=1)
>
> At this point, top shows the process usage for python to be about 500Mb.
> That's okay, I'd expect that, b is big ;-)
>
> >>> del b
>
> However, top still shows about 500Mb; maybe the garbage collector needs a kick?
>
> >>> import gc
> >>> gc.collect()
> 702614
>
> Nope, still 500Mb. What gives? How can I make Python give the memory
> it's no longer using back to the OS?
>
> Okay, so maybe this is something to do with it being an interactive 
> session? So I wrote this script:
>
> from xlrd import open_workbook
> import gc
> b = open_workbook('some.xls',pickleable=0,formatting_info=1)
> print 'opened'
> raw_input()
> del b
> print 'deleted'
> raw_input()
> gc.collect()
> print 'gc'
> raw_input()
>
> The raw inputs are there so I can check the memory usage in top.
> Even after the gc, Python still hasn't given the memory back to the OS 
> :-(
>
> What am I doing wrong?
>
> Chris
>
You're not doing anything wrong.  I don't know of any other environment 
that "gives the memory back" to the OS.

I don't know Unix/Linux memory management, but I do know Windows, and I 
suspect the others are quite similar.  There are a few memory allocators 
within Windows itself, and some more within the MSC runtime library.  
They work similarly enough that I can safely just choose one to 
explain.  I'll pick on malloc().

When malloc() is called for the first time (long before your module is 
loaded), it asks the operating system's  low-level mapping allocator for 
a multiple of 64k.  The 64k will always be aligned on a 64k boundary, 
and is in turn divided into 4k pages.  The 64k could be backed by one of
three things: the swapfile, an executable (or DLL), or a data file,
but there's not much real difference between those.  malloc() itself
will always use the swapfile.  Anyway, at this point my memory is a 
little bit fuzzy.  I think only 4k of the swapfile is actually mapped 
in, the rest being reserved.  But malloc() will then build some data 
structures for that 64k block, and as memory is requested, get more and 
more pieces of that 64k, till the whole thing is mapped in.  Then, 
additional multiples of 64k are allocated in the same way, and of course 
the data structures are chained together.  If an application "frees" a 
block, the data structure is updated, but the memory is not unmapped.  
Theoretically, if all the blocks within one 64k were freed, malloc() 
could release the 64k block to the OS, but to the best of my knowledge, 
malloc() never does.   Incidentally, there's a different scheme for 
large blocks, but that's changed several times, and I have no idea how 
it's done now.
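
A toy model in Python may make the bookkeeping above easier to follow.
This is only a sketch of the behavior as described, not how any real
malloc() is written: the ToyHeap class and its fields are hypothetical,
it just counts bytes, and it ignores fragmentation and the 4k page
commits entirely.

```python
BLOCK = 64 * 1024  # granularity assumed for requests to the OS


class ToyHeap(object):
    """Hypothetical byte-counting model of a heap that never shrinks."""

    def __init__(self):
        self.reserved = 0  # bytes obtained from the OS so far
        self.in_use = 0    # bytes currently handed out to the program

    def malloc(self, size):
        # Ask the OS for more 64k blocks only if what we hold is not enough.
        shortfall = self.in_use + size - self.reserved
        if shortfall > 0:
            blocks = -(-shortfall // BLOCK)  # round up to whole blocks
            self.reserved += blocks * BLOCK
        self.in_use += size

    def free(self, size):
        # Only the bookkeeping changes; nothing goes back to the OS.
        self.in_use -= size


heap = ToyHeap()
heap.malloc(500 * 1024)
after_alloc = heap.reserved
heap.free(500 * 1024)
after_free = heap.reserved
heap.malloc(100 * 1024)   # satisfied from memory the heap already holds
after_reuse = heap.reserved
```

In this model, reserved stands in for the figure a tool like top would
report: it grows on the first malloc() and stays put after free(), while
later allocations are served from what is already held.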

Now, C programmers sometimes write a custom allocator, and in C++, it's 
not hard to have a custom allocator manage all instances of a particular 
class.  This can be convenient for applications that know how their 
memory usage patterns are likely to work.  Photoshop for example can be 
configured to use "user swap space" (I forget what they call it) from 
files that Photoshop explicitly allocates.  And space from that 
allocator is not from the swapfile, so it's not constrained by other 
running applications, and wouldn't be counted by the Windows equivalent
of 'top' (e.g. the Windows Task Manager).

A custom allocator can also be designed to know when a particular set of 
allocations are all freed, and release the memory entirely back to the 
system.  For instance, if all temp data for a particular transaction is 
put into an appropriate custom allocator, then at the end of the 
transaction, it can safely be released.
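
A minimal sketch of such a per-transaction allocator, written in Python
for illustration.  It assumes one anonymous mmap region per transaction
and a simple "bump" pointer; the TransactionArena class and its method
names are hypothetical, not part of any library.  As far as I know,
closing the mapping is what unmaps the region, so that is the step that
hands the memory back.

```python
import mmap


class TransactionArena(object):
    """Hypothetical bump allocator over one anonymous memory mapping."""

    def __init__(self, size):
        self._map = mmap.mmap(-1, size)  # anonymous mapping, no named file
        self._size = size
        self._top = 0                    # next unused offset in the region

    def alloc(self, nbytes):
        # Hand out the next nbytes of the region and return their offset.
        if self._top + nbytes > self._size:
            raise MemoryError("arena exhausted")
        start = self._top
        self._top += nbytes
        return start

    def write(self, offset, data):
        self._map[offset:offset + len(data)] = data

    def read(self, offset, nbytes):
        return self._map[offset:offset + nbytes]

    def release(self):
        # End of the transaction: drop the whole region in one step.
        self._map.close()


arena = TransactionArena(64 * 1024)
slot = arena.alloc(5)
arena.write(slot, b"hello")
stored = arena.read(slot, 5)
arena.release()
```

Nothing is freed piece by piece here; every allocation made during the
transaction goes away together when release() is called, and any later
access to the arena fails.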

I would guess that Python doesn't use any custom allocators of this
kind, and therefore never releases the memory back to the system.  It
will however reuse it when you allocate more stuff.


DaveA  (author of the memory tracking subsystem of NuMega's BoundsChecker)



