Most efficient way to "pre-grow" a list?

Francis Carr coldtortuga at gmail.com
Wed Nov 11 17:13:54 CET 2009


Hmmm.  I am trying some algorithms, and timeit reports that a
list.extend variant is fastest... WTH?!  Really this seems like it
must be a "bug" in implementing the "[None]*x" idiom.


As expected, appending one-at-a-time is slowest (by an order of
magnitude):
  % python -m timeit -s "N=1000000" \
    "z=[2**32-1]" \
    "while len(z)<N: z.append(z[0])"
  10 loops, best of 3: 223 msec per loop

A method based on list.extend doubling is much better:
  % python -m timeit -s "N=1000000" \
    "z=[2**32-1]" \
    "while len(z)<N: z.extend(z[:N-len(z)])"
  10 loops, best of 3: 22.2 msec per loop

The straightforward "pythonic" solution is better yet:
  % python -m timeit -s "N=1000000" \
    "z=[2**32-1]*N"
  100 loops, best of 3: 8.81 msec per loop

And the winner for speed is... over-allocate using list.extend, then
delete!
  % python -m timeit -s "N=1000000" \
    "z=[2**32-1]" \
    "while len(z)<N: z.extend(z)" \
    "del z[N:]"
  100 loops, best of 3: 6.93 msec per loop


Using array.array('I') gives similar results:
  % python -m timeit -s "import array" -s "N=1000000" \
    "z=array.array('I', [2**32-1])" \
    "while len(z)<N: z.append(z[0])"
  10 loops, best of 3: 278 msec per loop

  % python -m timeit -s "import array" -s "N=1000000" \
    "z=array.array('I', [2**32-1])" \
    "while len(z)<N: z.extend(z[:N-len(z)])"
  100 loops, best of 3: 7.82 msec per loop

  % python -m timeit -s "import array" -s "N=1000000" \
    "z=array.array('I', [2**32-1])" \
    "z=z*N"
  100 loops, best of 3: 6.04 msec per loop

  % python -m timeit -s "import array" -s "N=1000000" \
    "z=array.array('I', [2**32-1])" \
    "while len(z)<N: z.extend(z)" \
    "del z[N:]"
  100 loops, best of 3: 4.02 msec per loop


These results appear to scale up.  I replaced N=1000000 < 2**20 with
N=2**25 and obtained the same relative ordering:
  % python -m timeit -s "N=2**25" \
    "z=[2**32-1]" \
    "while len(z)<N: z.append(z[0])"
  [Ctrl-C after a loooong wait]

  % python -m timeit -s "N=2**25" \
    "z=[2**32-1]" \
    "while len(z)<N: z.extend(z[:N-len(z)])"
  10 loops, best of 3: 872 msec per loop

  % python -m timeit -s "N=2**25" \
    "z=[2**32-1]*N"
  10 loops, best of 3: 434 msec per loop

  % python -m timeit -s "N=2**25" \
    "z=[2**32-1]" \
    "while len(z)<N: z.extend(z)" \
    "del z[N:]"
  10 loops, best of 3: 346 msec per loop


And just to see if array.array can help us when large amounts of
memory are in use:
  % python -m timeit -s "import array" -s "N=2**25" \
    "z=array.array('I', [2**32-1])" \
    "while len(z)<N: z.extend(z)" \
    "del z[N:]"
  10 loops, best of 3: 149 msec per loop
If speed is the overriding concern, then there's another factor of two
right there.


Python version and hardware info:
  Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) [GCC 4.3.3] on
linux2
  12Gb RAM on a quad-core Intel Xeon 3Ghz CPU


 -- FC



More information about the Python-list mailing list