restriction on sum: intentional bug?

Mark Dickinson dickinsm at gmail.com
Sat Oct 17 16:32:32 EDT 2009


On Oct 17, 3:27 pm, a... at pythoncraft.com (Aahz) wrote:
> In article <0062f568$0$26941$c3e8... at news.astraweb.com>,
> Steven D'Aprano  <st... at REMOVE-THIS-cybersource.com.au> wrote:
> > (For the record, summing lists is O(N**2), and unlike strings, there's no
> > optimization in CPython to avoid the slow behaviour.)
>
> Are you sure?

The O(N**2) claim surprised me too, but it certainly looks that
way.  Here's a script to produce some timings:

def concat1(list_of_lists):
    return sum(list_of_lists, [])

def concat2(list_of_lists):
    acc = []
    for l in list_of_lists:
        acc = acc + l
    return acc

def concat3(list_of_lists):
    acc = []
    for l in list_of_lists:
        acc += l
    return acc

def concat4(list_of_lists):
    acc = []
    for l in list_of_lists:
        acc.extend(l)
    return acc

test_list = [[i] for i in xrange(100000)]

from timeit import Timer

for fn in ["concat1", "concat2", "concat3", "concat4"]:
    t = Timer(fn + "(test_list)", "from __main__ import test_list, " +
fn)
    print fn, t.timeit(number=3)/3.0


On my machine (OS X 10.5/Core 2 Duo), under Python 2.7 svn I get:

newton:trunk dickinsm$ ./python.exe ~/time_list_sum.py
concat1 48.1459733645
concat2 48.4200883706
concat3 0.0146766503652
concat4 0.0184679826101


For some reason that I don't really understand, the CPython source
does
the equivalent of concat2 instead of concat3.  See the builtin_sum
function in

http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

and scroll past the special cases for ints and floats.  After a one-
line
source change, replacing the PyNumber_Add call with
PyNumber_InPlaceAdd,
I get the following results:

newton:trunk dickinsm$ ./python.exe ~/time_list_sum.py
concat1 0.0106019973755
concat2 48.0212899844
concat3 0.0138022899628
concat4 0.0179653167725

--
Mark



More information about the Python-list mailing list