Large list in memory slows Python

Peter Otten __peter__ at web.de
Tue Jan 3 17:59:14 EST 2012


Benoit Thiell wrote:

> I am experiencing a puzzling problem with both Python 2.4 and Python
> 2.6 on CentOS 5. I'm looking for an explanation of the problem and
> possible solutions. Here is what I did:
> 
> Python 2.4.3 (#1, Sep 21 2011, 19:55:41)
> IPython 0.8.4 -- An enhanced Interactive Python.
> 
> In [1]: def test():
>    ...:     return [(i,) for i in range(10**6)]
> 
> In [2]: %time x = test()
> CPU times: user 0.82 s, sys: 0.04 s, total: 0.86 s
> Wall time: 0.86 s
> 
> In [4]: big_list = range(50 * 10**6)
> 
> In [5]: %time y = test()
> CPU times: user 9.11 s, sys: 0.03 s, total: 9.14 s
> Wall time: 9.15 s
> 
> As you can see, after creating a list of 50 million integers, creating
> the same list of 1 million tuples takes about 10 times longer than the
> first time.
> 
> I ran these tests on a machine with 144GB of memory and it is not
> swapping. Before creating the big list of integers, IPython used 111MB
> of memory; after the creation, it used 1664MB of memory.

In older Pythons the heuristic used to decide when to run the cyclic garbage 
collector is not well suited to code that creates many objects in a row.
Try switching the collector off temporarily:

import gc
gc.disable()
# create many objects that are here to stay
gc.enable()
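
To see the effect in isolation, something along these lines should reproduce
both the slowdown and the speedup (a rough sketch in Python 2 syntax; the
exact timings will of course differ on your machine):

import gc
import time

def test():
    return [(i,) for i in range(10**6)]

big_list = range(50 * 10**6)    # many long-lived objects, as in your session

start = time.time()
x = test()                      # collector enabled
print "with gc:    %.2f s" % (time.time() - start)

gc.disable()
start = time.time()
y = test()                      # collector disabled
print "without gc: %.2f s" % (time.time() - start)
gc.enable()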

You may also incorporate that into your test function:

def test():
    gc.disable()
    try:
        # the same list comprehension as before
        return [(i,) for i in range(10**6)]
    finally:
        gc.enable()
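
If you need that pattern in more than one place, a small helper keeps the
try/finally in one spot (untested sketch; it relies on the with statement,
so it works on 2.6 but not on 2.4):

import gc
from contextlib import contextmanager

@contextmanager
def gc_disabled():
    # switch the cyclic collector off for the duration of the block
    gc.disable()
    try:
        yield
    finally:
        gc.enable()

def test():
    with gc_disabled():
        return [(i,) for i in range(10**6)]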




