Most efficient way to "pre-grow" a list?
Francis Carr
coldtortuga at gmail.com
Wed Nov 11 11:13:54 EST 2009
Hmmm. I am trying some algorithms, and timeit reports that a
list.extend variant is fastest... WTH?! Really this seems like it
must be a "bug" in implementing the "[None]*x" idiom.
As expected, appending one-at-a-time is slowest (by an order of
magnitude):
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.append(z[0])"
10 loops, best of 3: 223 msec per loop
A method based on list.extend doubling is much better:
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z[:N-len(z)])"
10 loops, best of 3: 22.2 msec per loop
The straightforward "pythonic" solution is better yet:
% python -m timeit -s "N=1000000" \
"z=[2**32-1]*N"
100 loops, best of 3: 8.81 msec per loop
And the winner for speed is... over-allocate using list.extend, then
delete!
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
100 loops, best of 3: 6.93 msec per loop
Using array.array('I') gives similar results:
% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.append(z[0])"
10 loops, best of 3: 278 msec per loop
% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.extend(z[:N-len(z)])"
100 loops, best of 3: 7.82 msec per loop
% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"z=z*N"
100 loops, best of 3: 6.04 msec per loop
% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
100 loops, best of 3: 4.02 msec per loop
These results appear to scale up. I replaced N=1000000 < 2**20 with
N=2**25 and obtained the same relative ordering:
% python -m timeit -s "N=2**25" \
"z=[2**32-1]" \
"while len(z)<N: z.append(z[0])"
[Ctrl-C after a loooong wait]
% python -m timeit -s "N=2**25" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z[:N-len(z)])"
10 loops, best of 3: 872 msec per loop
% python -m timeit -s "N=2**25" \
"z=[2**32-1]*N"
10 loops, best of 3: 434 msec per loop
% python -m timeit -s "N=2**25" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
10 loops, best of 3: 346 msec per loop
And just to see if array.array can help us when large amounts of
memory are in use:
% python -m timeit -s "import array" -s "N=2**25" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
10 loops, best of 3: 149 msec per loop
If speed is the overriding concern, then there's another factor of two
right there.
Python version and hardware info:
Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) [GCC 4.3.3] on
linux2
12Gb RAM on a quad-core Intel Xeon 3Ghz CPU
-- FC
More information about the Python-list
mailing list