memory problem with list creation
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Wed Jan 13 21:03:52 EST 2010
On Wed, 13 Jan 2010 06:24:04 -0800, Allard Warrink wrote:
> Within a python script I'm using a couple of different lists containing
> a large number of floats (+8M). The execution of this script fails
> because of a memory error (insufficient memory). I thought this was
> strange because I delete all lists that are no longer necessary
> directly and my workstation theoretically has more than enough memory
> to run the script.
Keep in mind that Python floats are rich objects, not bare C doubles,
and so take up more space: 16 bytes each on a 32-bit system, compared
to the 8 bytes of the C double that holds the actual value. (Both
figures may vary with hardware and operating system.)
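You can check the per-object overhead yourself (this is from my 32-bit
box; expect 24 bytes on most 64-bit builds):

>>> import sys
>>> sys.getsizeof(0.0)
16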
Also keep in mind that your Python process may not have access to all
your machine's memory -- some OSes default to relatively small per-
process memory limits. If you are using Unix or Linux, you may need to
look at ulimit.
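You can also inspect the address-space limit from inside Python with
the standard resource module (Unix only; -1 is RLIM_INFINITY, meaning
no limit, which is what I get here):

>>> import resource
>>> resource.getrlimit(resource.RLIMIT_AS)
(-1, -1)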
> so I did some investigation on the memory use of the script. I found out
> that when I populated the lists with floats using a for ... in range()
> loop a lot of overhead memory is used and that this memory is not freed
> after populating the list and is also not freed after deleting the list.
I would be very, very, very surprised if the memory truly wasn't freed
after deleting the lists. A memory leak of that magnitude is unlikely
to have remained undetected until now. More likely you're either
misdiagnosing the problem, or you have some sort of reference cycle.
Bear in mind also that CPython keeps freed ints and floats on internal
free lists for reuse, so the process size the OS reports can stay high
even after the objects themselves have been reclaimed.
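If you suspect reference cycles, the gc module will chase them down
for you. On a healthy interpreter you should see something like:

>>> import gc
>>> gc.collect()   # how many unreachable objects were found
0
>>> gc.garbage     # uncollectable cycles, e.g. with __del__ methods
[]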
> This way the memory keeps filling up after each newly populated list
> until the script crashes.
Can you post us the smallest extract of your script that crashes?
> I did a couple of tests and found that populating lists with range or
> xrange is responsible for the memory overhead.
I doubt it. Even using range with 8+ million integers only wastes 35 MB
or so for the intermediate list itself. That's wasteful, but not
excessively so.
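You can put a number on that throwaway list directly (32-bit box
again; note this counts only the list's pointer array, not the int
objects it points to):

>>> import sys
>>> sys.getsizeof(range(2700*3250))
35100032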
> Does anybody know why
> this happens and if there's a way to avoid this memory problem?
>
> First the line(s) of Python code I executed. Then the memory usage of the
> process: Mem usage after creation/populating of big_list
> sys.getsizeof(big_list)
> Mem usage after deletion of big_list
>
> big_list = [0.0] * 2700*3250
> 40
> 35
> 6
You don't specify what those three numbers are (the middle one is
evidently getsizeof of the list, but the other two are unknown). How
do you calculate memory usage? I don't believe that your memory usage
is 6 bytes! Nor do I believe that getsizeof(big_list) returns 35
bytes! If those numbers are megabytes, say so.
On my system:
>>> x = [0.0] * 2700*3250
>>> sys.getsizeof(x)
35100032
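If you want to report memory usage in a way the rest of us can
reproduce, a small Linux-only helper like this one (the name rss_mb is
mine; it reads VmRSS from /proc/self/status) does the job:

def rss_mb():
    # Resident set size of this process, in megabytes.
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1]) // 1024  # kB -> MB

Call it before and after each statement you are measuring.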
> big_list = [0.0 for i in xrange(2700*3250)]
> 40
> 36
> 6
This produces a lightweight xrange object, then iterates over it to
build a list of eight million references to a single float object 0.0
(the literal is stored once as a constant, not re-created on each
pass). The xrange object is then garbage collected automatically.
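You can confirm that every slot in the list shares one float object:

>>> big_list = [0.0 for i in xrange(2700*3250)]
>>> all(item is big_list[0] for item in big_list)
True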
> big_list = [0.0 for i in range(2700*3250)]
> 145
> 36
> 110
This produces a list containing the integers 0 through 8,774,999, then
iterates over it to build a second list of eight million references to
the float 0.0, before garbage collecting the first list. So at its
peak, you pay 35100032 bytes for the pointer array of a pointless
intermediate list, plus the int objects it refers to, roughly doubling
the memory needed to build the list you actually want.
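Back-of-the-envelope, on a 32-bit system (assuming 4-byte pointers and
12 bytes per int object, and ignoring the small cached ints):

>>> n = 2700*3250
>>> n * 4     # pointer array of the intermediate list
35100000
>>> n * 12    # the int objects themselves, roughly
105300000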
> big_list = [float(i) for i in xrange(2700*3250)]
> 180
> 36
> 145
This one really is different: the xrange object yields integers, and
float(i) turns each of them into its own 16-byte float object. Instead
of eight million references to a single 0.0, you now hold eight
million distinct floats: about 140 MB of float objects on top of the
35 MB pointer array, which is in the right ballpark for your 180 if
those numbers are megabytes. As for the memory that seems to stick
around after deletion, CPython parks freed floats on a free list for
reuse rather than returning them to the OS immediately, so the process
size need not shrink.
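If you genuinely need millions of distinct float values, a denser
container avoids the per-object overhead altogether. A sketch using
the standard array module (numpy would serve equally well):

>>> from array import array
>>> big_list = array('d', xrange(2700*3250))  # raw C doubles
>>> big_list.itemsize * len(big_list)
70200000

That's about 70 MB of data, instead of roughly 175 MB as a list of
Python floats.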
To really solve this problem, we need to see actual code that raises
MemoryError. Otherwise we're just wasting time.
--
Steven