Most efficient way to "pre-grow" a list?
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Wed Nov 11 19:22:53 EST 2009
On Wed, 11 Nov 2009 08:13:54 -0800, Francis Carr wrote:
> Hmmm. I am trying some algorithms, and timeit reports that a
> list.extend variant is fastest... WTH?! Really this seems like it must
> be a "bug" in implementing the "[None]*x" idiom.
> The straightforward "pythonic" solution is better yet:
> % python -m timeit -s "N=1000000" \
> "z=[2**32-1]*N"
> 100 loops, best of 3: 8.81 msec per loop
You're possibly including the time taken to calculate 2**32-1 in each
test, which is unnecessary. The peephole optimizer *may* be folding
that away to "z=[4294967295L]*N", or it may not. In any case, why not
make the test even simpler and use z=[None]*N? You're testing list
methods, not arithmetic and longs.
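That is, something along these lines (the same benchmark, minus the
arithmetic):

% python -m timeit -s "N=1000000" "z=[None]*N"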
But regardless of what's inside the list, what Python does internally is
create an array of some size M, at least as large as N, set the first N
slots to point at the same item, and keep any remaining slots as spare
capacity. (That's what CPython does; other Pythons may behave
differently.) Then each time the list outgrows its allocation, CPython
resizes it by a proportional amount rather than one element at a time.
So growing a list doesn't necessarily have the cost you think it does,
because the over-allocation is already being done for you.
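You can actually watch the over-allocation happen with sys.getsizeof,
which reports the bytes currently allocated for the list object. A
rough sketch (the exact sizes depend on your CPython version and
platform):

    import sys

    z = []
    last = sys.getsizeof(z)
    for i in range(50):
        z.append(None)
        size = sys.getsizeof(z)
        if size != last:
            # the allocation grows in jumps, not one slot per append
            print len(z), size
            last = size

The size jumps in chunks, so most appends don't touch the allocator at
all.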
> And the winner for speed is... over-allocate using list.extend, then
> delete!
> % python -m timeit -s "N=1000000" \
> "z=[2**32-1]" \
> "while len(z)<N: z.extend(z)" \
> "del z[N:]"
> 100 loops, best of 3: 6.93 msec per loop
I can't confirm that. I suspect you're running into a hardware-specific
cache effect or some other side-effect of your particular setup.
For what it's worth, my timings compared to your timings (msec per
loop) are:

                               Me    :  You
Append one-at-a-time:         520    :  223
list.extend doubling:          18.5  :   22.2
Multiplication:                 9.36 :    8.81
Over-allocate then delete:      9.42 :    6.93
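For reference, the append-one-at-a-time figure came from the obvious
loop, along these lines (my wording, not necessarily Francis' exact
command):

% python -m timeit -s "N=1000000" \
  "z=[]" \
  "for i in xrange(N): z.append(None)"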
--
Steven