sum for sequences?

Sun Mar 28 12:21:59 EDT 2010

On Mar 28, 7:57 am, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Sun, 28 Mar 2010 07:16:10 -0700, Steve Howell wrote:
> > The mildly surprising part of sum() is that is does add vs. add-in-
> > place, which leads to O(N) vs. O(1) for the inner loop calls, for
> > certain data structures, notably lists, even though none of the
> > intermediate results get used by the caller.  For lists, you could make
> > a more efficient variant of sum() that clones the start value and does
> > add-in-place.
>
> I have no doubt that you could make a version of sum for lists which is
> more efficient than the existing one. After all, join more or less does
> the same sort of thing, and it's very efficient. But don't think that add-
> in-place is necessarily cheap. List appends are amortized O(1) each; if
> you are adding M lists of N items each, that gives you O(M*N).
>

O(M*N) is still cheaper than O(M*N*N).

> It's possible to improve the performance a tad if you can make multiple
> appends in roughly constant time, which is what list.extend (probably?)
> does, but only up to a point. Lists are over-allocated, but if you try to
> add more items than there is room for, you need to make the list bigger,
> and that means reallocating memory, which could easily be O(N**2) or
> worse, depending on how good your OS's memory management is. Under Linux,
> at least by default, malloc will never fail, but there's no guarantee how
> long it will take to return. If the OOM killer has to start shutting down
> other applications, and paging more and more memory to disk, eventually
> malloc will return (or the system will core dump), but it could take a
> while...
>

Even though extend() obviously has to do memory allocations along the
way itself, it is still more efficient than the alternative.  No need
to speculate, you can measure these methods on your platform:

    M = 10
    N = 1000

    def in_place(
        start = [],
        sublists = ([[None] * M]) * N
        ):
        accum = start[:]
        for sublist in sublists:
            accum.extend(sublist)
        return accum

    def with_intermediates(
        start = [],
        sublists = ([[None] * M]) * N
        ):
        accum = start
        for sublist in sublists:
            accum = accum + sublist
        return accum