memory problem with list creation
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Wed Jan 13 11:13:27 EST 2010
On Wed, 13 Jan 2010 11:24:04 -0300, Allard Warrink
<allardwarrink at gmail.com> wrote:
> Within a Python script I'm using a couple of different lists
> containing a large number of floats (8M+). The execution of this
> script fails because of a memory error (insufficient memory).
> I thought this was strange because I delete all lists that are no
> longer necessary directly and my workstation theoretically has more
> than enough memory to run the script.
>
> So I did some investigation on the memory use of the script. I found
> out that when I populated the lists with floats using a for ... in
> range() loop, a lot of overhead memory is used, and that this memory
> is not freed after populating the list, nor after deleting the list.
>
> This way the memory keeps filling up after each newly populated list
> until the script crashes.
After reading my comments below, please revise your testing and this
conclusion.
If you build the *same* list several times and the memory usage keeps
growing, that may indicate a memory leak. But peak memory consumption
due to temporary objects is not enough evidence.
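One way to check (a rough sketch; watch the process size in Task Manager
or top while it runs):

import gc

for pass_no in range(5):
    big_list = [float(i) for i in xrange(2700 * 3250)]
    del big_list
    gc.collect()
    raw_input('pass %d done - check the process memory now' % pass_no)

If the figure grows on every pass, you may have a leak; if it peaks on
the first pass and then stays flat, you are only seeing CPython keeping
freed blocks around (more on that at the end).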
> I did a couple of tests and found that populating lists with range or
> xrange is responsible for the memory overhead.
> Does anybody know why this happens and if there's a way to avoid this
> memory problem?
>
> First the line(s) of Python code I executed.
> Then the memory usage of the process:
> Mem usage after creation/populating of big_list
> sys.getsizeof(big_list)
> Mem usage after deletion of big_list
Note that sys.getsizeof(big_list) should be roughly the same in all your
tests: the list object itself always takes the same space, depending only
on the number of contained items (and, secondarily, on how the list was
built). It does not account for the memory taken by the contained items
themselves.
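A small illustration (exact byte counts vary by build):

import sys

small = [0.0] * 10
# getsizeof() reports only the list object itself: a header plus
# one pointer per slot; the contained objects are not included.
print sys.getsizeof(small)
# Every slot here points at the same single 0.0 object, so a total
# that counts the contents only has to add one float:
print sys.getsizeof(small) + sys.getsizeof(0.0)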
> 1) big_list = [0.0] * 2700*3250
This involves the float object 0.0, an intermediate list of size 2700
(the expression groups as ([0.0] * 2700) * 3250), a couple of integers,
and nothing more.
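Both points are easy to verify:

big_list = [0.0] * 2700 * 3250
# The expression groups left to right: ([0.0] * 2700) * 3250, so a
# 2700-element list exists briefly before the final one is built.
print len(big_list)                  # 8775000
# Every slot holds a reference to the very same float object:
print big_list[0] is big_list[-1]    # True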
> 2) big_list = [0.0 for i in xrange(2700*3250)]
This involves creating an integer object for every value in the range,
but they exist one at a time: each is discarded almost as soon as it is
created.
> 3) big_list = [0.0 for i in range(2700*3250)]
This involves building a temporary list containing every integer in the
range. All of them must be available simultaneously (to exist in the list).
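The difference shows up directly in getsizeof (figures vary by build):

import sys

# xrange is a lazy object of small, constant size...
print sys.getsizeof(xrange(2700 * 3250))
# ...while range() materializes the whole integer list at once:
print sys.getsizeof(range(1000))
print sys.getsizeof(range(2000))    # roughly twice as big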
In all three scenarios above, the only "permanent" object is a big list
holding several million references to the single float object 0.0; on my
32-bit Windows build, this takes 35MB.
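That figure matches a back-of-the-envelope count, assuming 4-byte
pointers on a 32-bit build:

# 2700*3250 = 8775000 slots, 4 bytes per reference:
print 2700 * 3250 * 4    # 35100000 bytes, about 35MB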
> 4) big_list = [float(i) for i in xrange(2700*3250)]
Like 2) above, but now the final list contains several million distinct
objects. About 175MB would be required on my PC: getsizeof(big_list) +
len(big_list)*getsizeof(0.0).
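The arithmetic again, assuming the same 32-bit build where
getsizeof(0.0) is 16:

import sys

n = 2700 * 3250
# reference array (~35MB) plus n distinct 16-byte floats (~140MB):
print n * 4 + n * sys.getsizeof(0.0)    # about 175 million bytes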
> 5) big_list = [float(i) for i in range(2700*3250)]
Like 4), the final list requires the extra memory for millions of
distinct objects; and like 3), a temporary integer list is required as
well.
> 6) big_list = [i for i in xrange(2700*3250)]
Like 4), but the final list holds millions of distinct integers instead
of floats; float objects are slightly bigger than integers, so this one
takes a bit less memory.
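On a 32-bit build the per-object sizes compare like this:

import sys

print sys.getsizeof(0.0)    # 16 bytes for a float
print sys.getsizeof(0)      # 12 bytes for an int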
> 7) big_list = [i for i in range(2700*3250)]
Compared with 6), this requires building a temporary list with all those
integers, like 3) and 5).
> 8)
> big_list = []
> for i in range(2700*3250):
>     big_list.append(float(i))
> 285
> 36
> 250
>
> 9) same as 8) but using xrange.
As above, range() requires building an intermediate list.
In Python (CPython specifically), many types (like int and float)
maintain a pool of unused, freed objects, and the memory manager
maintains a pool of allocated memory blocks. If your program has a peak
memory load and later frees most of the involved objects, memory is not
always returned to the OS - it may be kept available for Python to use
again.
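The float free list can even be seen at a tiny scale (a CPython
implementation detail, not guaranteed behaviour; the floats are built at
runtime here so they don't live in the code object's constants):

a = float('3.14')
addr = id(a)
del a                  # refcount hits zero; the slot joins the free list
b = float('2.71')      # pops the free list: often the very same memory
print id(b) == addr    # frequently True on CPython 2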
--
Gabriel Genellina