Large list in memory slows Python

Benoit Thiell bthiell at cfa.harvard.edu
Wed Jan 4 09:57:56 EST 2012


On Tue, Jan 3, 2012 at 5:59 PM, Peter Otten <__peter__ at web.de> wrote:
> Benoit Thiell wrote:
>
>> I am experiencing a puzzling problem with both Python 2.4 and Python
>> 2.6 on CentOS 5. I'm looking for an explanation of the problem and
>> possible solutions. Here is what I did:
>>
>> Python 2.4.3 (#1, Sep 21 2011, 19:55:41)
>> IPython 0.8.4 -- An enhanced Interactive Python.
>>
>> In [1]: def test():
>>    ...:     return [(i,) for i in range(10**6)]
>>
>> In [2]: %time x = test()
>> CPU times: user 0.82 s, sys: 0.04 s, total: 0.86 s
>> Wall time: 0.86 s
>>
>> In [4]: big_list = range(50 * 10**6)
>>
>> In [5]: %time y = test()
>> CPU times: user 9.11 s, sys: 0.03 s, total: 9.14 s
>> Wall time: 9.15 s
>>
>> As you can see, after creating a list of 50 million integers, creating
>> the same list of 1 million tuples takes about 10 times longer than the
>> first time.
>>
>> I ran these tests on a machine with 144GB of memory, and it is not
>> swapping. Before creating the big list of integers, IPython used 111MB
>> of memory; after the creation, it used 1664MB of memory.
>
> In older Pythons the heuristic used to decide when to run the cyclic garbage
> collector is not well suited to the creation of many objects in a row.
> Try switching it off temporarily with
>
> import gc
> gc.disable()
> # create many objects that are here to stay
> gc.enable()
>
> You may also incorporate that into your test function:
>
> def test():
>    gc.disable()
>    try:
>        return [...]
>    finally:
>        gc.enable()

Thanks Peter, this is very helpful. Modifying my test according to
your directions produced much more consistent results.
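
For reference, here is roughly what the modified test looks like (a minimal
sketch; shown here timed with plain time.time() rather than IPython's %time
magic, and the exact numbers will of course vary between machines):

import gc
import time

def test():
    # Suspend automatic cyclic garbage collection while building the list,
    # so allocating a million new containers does not repeatedly trigger
    # full collections that scan the 50-million-element list.
    gc.disable()
    try:
        return [(i,) for i in range(10**6)]
    finally:
        # Always re-enable collection, even if the allocation fails.
        gc.enable()

big_list = range(50 * 10**6)   # the long-lived 50-million-integer list

start = time.time()
y = test()
print "test() took %.2f seconds" % (time.time() - start)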

Benoit.

-- 
Benoit Thiell
The SAO/NASA Astrophysics Data System
http://adswww.harvard.edu/


