sum for sequences?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Mar 28 10:57:08 EDT 2010


On Sun, 28 Mar 2010 07:16:10 -0700, Steve Howell wrote:

> The mildly surprising part of sum() is that is does add vs. add-in-
> place, which leads to O(N) vs. O(1) for the inner loop calls, for
> certain data structures, notably lists, even though none of the
> intermediate results get used by the caller.  For lists, you could make
> a more efficient variant of sum() that clones the start value and does
> add-in-place.

I have no doubt that you could make a version of sum for lists which is 
more efficient than the existing one. After all, join more or less does 
the same sort of thing, and it's very efficient. But don't think that add-
in-place is necessarily cheap. List appends are amortized O(1) each; if 
you are adding M lists of N items each, that gives you O(M*N).

It's possible to improve the performance a tad if you can make multiple 
appends in roughly constant time, which is what list.extend (probably?) 
does, but only up to a point. Lists are over-allocated, but if you try to 
add more items than there is room for, you need to make the list bigger, 
and that means reallocating memory, which could easily be O(N**2) or 
worse, depending on how good your OS's memory management is. Under Linux, 
at least by default, malloc will never fail, but there's no guarantee how 
long it will take to return. If the OOM killer has to start shutting down 
other applications, and paging more and more memory to disk, eventually 
malloc will return (or the system will core dump), but it could take a 
while...



-- 
Steven



More information about the Python-list mailing list