Python 2.6 still not giving memory back to the OS...
Dave Angel
davea at ieee.org
Sat Aug 15 09:44:25 EDT 2009
Chris Withers wrote:
> Hi All,
>
> I thought this was fixed back in Python 2.5, but I guess not?
>
> So, I'm playing in an interactive session:
>
> >>> from xlrd import open_workbook
> >>> b = open_workbook('some.xls',pickleable=0,formatting_info=1)
>
> At this point, top shows the process usage for python to be about 500Mb.
> That's okay, I'd expect that, b is big ;-)
>
> >>> del b
>
> However, it still does now, maybe the garbage collector needs a kick?
>
> >>> import gc
> >>> gc.collect()
> 702614
>
> Nope, still 500Mb. What gives? How can I make Python give the memory
> it's no longer using back to the OS?
>
> Okay, so maybe this is something to do with it being an interactive
> session? So I wrote this script:
>
> from xlrd import open_workbook
> import gc
> b = open_workbook('some.xls',pickleable=0,formatting_info=1)
> print 'opened'
> raw_input()
> del b
> print 'deleted'
> raw_input()
> gc.collect()
> print 'gc'
> raw_input()
>
> The raw inputs are there so I can check the memory usage in top.
> Even after the gc, Python still hasn't given the memory back to the OS
> :-(
>
> What am I doing wrong?
>
> Chris
>

You're not doing anything wrong. I don't know of any other environment
that "gives the memory back" to the OS.

I don't know Unix/Linux memory management, but I do know Windows, and I
suspect the others are quite similar. There are a few memory allocators
within Windows itself, and some more within the MSC runtime library.
They work similarly enough that I can safely just choose one to
explain. I'll pick on malloc().

When malloc() is called for the first time (long before your module is
loaded), it asks the operating system's low-level mapping allocator for
a multiple of 64k. The 64k will always be aligned on a 64k boundary,
and is in turn divided into 4k pages. The 64k could be backed by one of
three things: the swapfile, an executable (or DLL), or a data file,
but there's not much real difference between those. malloc() itself
will always use the swapfile. Anyway, at this point my memory is a
little bit fuzzy. I think only 4k of the swapfile is actually mapped
in at first, the rest being reserved. But malloc() will then build some
data structures for that 64k block and, as memory is requested, map in
more and more pieces of that 64k, until the whole thing is mapped in.
Then additional multiples of 64k are allocated in the same way, and of
course the data structures are chained together. If an application
"frees" a block, the data structure is updated, but the memory is not
unmapped. Theoretically, if all the blocks within one 64k were freed,
malloc() could release the 64k block to the OS, but to the best of my
knowledge malloc() never does. Incidentally, there's a different scheme
for large blocks, but that's changed several times, and I have no idea
how it's done now.
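
As a rough illustration of the bookkeeping described above, here is a toy
model in Python. It is only a sketch under stated assumptions: ToyHeap and
its methods are made-up names, the model ignores fragmentation and the
4k page mapping, and it does not claim to match the real MSC runtime.

```python
# Toy model only: imitates the bookkeeping described above, not a real malloc().
# ToyHeap, BLOCK_SIZE and all method names are hypothetical.

BLOCK_SIZE = 64 * 1024  # the heap grows in 64k units, as in the description


class ToyHeap(object):
    def __init__(self):
        self.blocks_from_os = 0  # number of 64k blocks obtained from the "OS"
        self.free_bytes = 0      # bytes available for reuse inside those blocks
        self.live = {}           # handle -> size of each live allocation
        self._next_handle = 0

    def malloc(self, size):
        # Ask the "OS" for another 64k block only when free space runs short.
        # (Fragmentation is ignored: free space is treated as one pool.)
        while self.free_bytes < size:
            self.blocks_from_os += 1
            self.free_bytes += BLOCK_SIZE
        self.free_bytes -= size
        self._next_handle += 1
        self.live[self._next_handle] = size
        return self._next_handle

    def free(self, handle):
        # Only the bookkeeping changes; no block goes back to the "OS".
        self.free_bytes += self.live.pop(handle)

    def footprint(self):
        # Roughly what a tool like top would attribute to this heap.
        return self.blocks_from_os * BLOCK_SIZE


heap = ToyHeap()
handles = [heap.malloc(1000) for _ in range(500)]
peak = heap.footprint()

for h in handles:
    heap.free(h)
after_free = heap.footprint()   # unchanged: freed space stays in the heap

heap.malloc(1000)
after_reuse = heap.footprint()  # unchanged: the request is served from free space

print("peak=%d after_free=%d after_reuse=%d" % (peak, after_free, after_reuse))
```

In this model the footprint never shrinks after free(), yet later requests
are satisfied without growing it, which is the behaviour seen in top.
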

Now, C programmers sometimes write a custom allocator, and in C++ it's
not hard to have a custom allocator manage all instances of a particular
class. This can be convenient for applications that know how their
memory usage patterns are likely to work. Photoshop, for example, can be
configured to use "user swap space" (I forget what they call it) from
files that Photoshop explicitly allocates. Space from that allocator
does not come from the swapfile, so it's not constrained by other
running applications, and wouldn't be counted by the Windows equivalent
of 'top' (e.g. the Windows Task Manager).

A custom allocator can also be designed to know when a particular set of
allocations has been freed in full, and then release that memory entirely
back to the system. For instance, if all temp data for a particular
transaction is put into an appropriate custom allocator, then at the end
of the transaction, it can safely be released.
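
The per-transaction idea above can be sketched in Python as a toy arena.
This is an illustration under assumptions, not anyone's actual allocator:
TransactionArena, CHUNK_SIZE and the method names are hypothetical, and
bytearrays merely stand in for memory obtained from the OS.

```python
# Hypothetical sketch: TransactionArena is an illustrative name, not a real API.

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen to echo the 64k above


class TransactionArena(object):
    """Hands out space from private chunks and drops them all at once."""

    def __init__(self):
        self.chunks = []  # bytearrays standing in for memory taken from the OS
        self.used = 0     # bytes handed out from the newest chunk

    def alloc(self, size):
        if size > CHUNK_SIZE:
            raise ValueError("toy arena cannot serve requests above CHUNK_SIZE")
        # Start a new chunk when the newest one cannot hold the request.
        if not self.chunks or self.used + size > CHUNK_SIZE:
            self.chunks.append(bytearray(CHUNK_SIZE))
            self.used = 0
        offset = self.used
        self.used += size
        return (len(self.chunks) - 1, offset)  # (chunk index, offset in chunk)

    def footprint(self):
        # Total bytes currently held by the arena.
        return len(self.chunks) * CHUNK_SIZE

    def release(self):
        # End of transaction: every chunk is dropped together, so no stray
        # live allocation can keep a chunk pinned.
        self.chunks = []
        self.used = 0


arena = TransactionArena()
slots = [arena.alloc(1000) for _ in range(100)]  # temp data for one transaction
during = arena.footprint()
arena.release()
after = arena.footprint()
print("during=%d after=%d" % (during, after))
```

Here release() only drops the Python references to the chunks; a real
arena written in C would unmap or free the underlying memory at that point.
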

I would guess that Python doesn't use any custom allocators of that
kind, and therefore never releases the memory back to the system. It
will, however, reuse it when you allocate more stuff.

DaveA (author of the memory tracking subsystem of NuMega's BoundsChecker)