Most efficient way to "pre-grow" a list?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Wed Nov 11 19:22:53 EST 2009


On Wed, 11 Nov 2009 08:13:54 -0800, Francis Carr wrote:

> Hmmm.  I am trying some algorithms, and timeit reports that a
> list.extend variant is fastest... WTH?!  Really this seems like it must
> be a "bug" in implementing the "[None]*x" idiom.

> The straightforward "pythonic" solution is better yet:
>   % python -m timeit -s "N=1000000" \
>     "z=[2**32-1]*N"
>   100 loops, best of 3: 8.81 msec per loop


You're possibly including the time taken to calculate 2**32-1 in each 
test, which is unnecessary. The peephole optimizer *may* be optimizing 
that away to "z=[4294967295L]*N", or it may not. In any case, why not 
make the test even simpler and use z=[None]*N? You're testing list 
methods, not arithmetic and longs.
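
Something like this keeps the measurement down to just the list 
repetition (exact timings will of course differ from machine to machine):

  % python -m timeit -s "N=1000000" "z=[None]*N"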

But regardless of what's inside the list, what CPython does internally 
for [x]*N is allocate an array of N pointer slots in a single step and 
fill them all with references to the same item. (Other Python 
implementations may behave differently.) And when a list grows by 
appending or extending, CPython over-allocates the underlying array by a 
proportional amount, so repeated appends are amortised constant time 
rather than a reallocation on every call. So growing a list as you go 
doesn't necessarily have the cost you might think it does.
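
If you want to watch the over-allocation happen, here is a rough sketch 
(it needs sys.getsizeof, so CPython 2.6 or later; the exact sizes are an 
implementation detail and will vary between versions and platforms):

  import sys

  z = []
  last = sys.getsizeof(z)
  for i in range(100):
      z.append(None)
      size = sys.getsizeof(z)
      if size != last:
          # The size jumps in chunks: CPython reserved spare slots for
          # future appends, not just the one we asked for.
          print len(z), size
          last = size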



> And the winner for speed is... over-allocate using list.extend, then
> delete!
>   % python -m timeit -s "N=1000000" \
>     "z=[2**32-1]" \
>     "while len(z)<N: z.extend(z)" \
>     "del z[N:]"
>   100 loops, best of 3: 6.93 msec per loop


I can't confirm that. I suspect you're seeing a hardware-specific cache 
effect or some other quirk of your particular setup.

For what it's worth, my timings compared to yours (msec per loop) are:

                            Me    : You
Append one-at-a-time:       520   : 223
list.extend doubling:       18.5  : 22.2
Multiplication:             9.36  : 8.81
Over-allocate then delete:  9.42  : 6.93
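
For the record, the four variants correspond to something like the 
following (reconstructed from the thread; I've used None in place of 
2**32-1, as suggested above):

  % python -m timeit -s "N=1000000" \
    "z=[]" \
    "for i in xrange(N): z.append(None)"

  % python -m timeit -s "N=1000000" \
    "z=[None]" \
    "while len(z)<N: z.extend(z)"

  % python -m timeit -s "N=1000000" \
    "z=[None]*N"

  % python -m timeit -s "N=1000000" \
    "z=[None]" \
    "while len(z)<N: z.extend(z)" \
    "del z[N:]"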


-- 
Steven


