Kindly show me a better way to do it

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun May 9 02:17:07 EDT 2010


On Sat, 08 May 2010 14:06:33 -0700, Oltmans wrote:

> On May 9, 1:53 am, superpollo <ute... at esempio.net> wrote:
> 
>> add = lambda a,b: a+b
>> for i in reduce(add,a):
>>      print i
> 
> This is very neat. Thank you. Sounds like magic to me. Can you please
> explain how does that work? Many thanks again.

Don't use this except for small lists, it is very inefficient and will be 
slow for large lists. It is a Shlemiel The Painter algorithm:

http://www.joelonsoftware.com/articles/fog0000000319.html


The most idiomatic solution is a simple, straightforward nested iteration:

for sublist in a:
    for item in sublist:
        do_something_with(item)

Say that there are 10 sublists with 10 items each. Then nested iteration 
will iterate 100 times in total. The solution with reduce will iterate:

10+10  # add the first sublist and the second sublist
20+10  # add the third sublist
30+10  # add the fourth sublist
40+10  # and so on... 
50+10
60+10
70+10
80+10
90+10  # add the last sublist
100  # and now iterate over the combined list

or 640 times in total.  If there are 100 sublists of 10 items each, the 
performance is even worse: 51,490 for the reduce solution, versus 1000 
for the nested iteration.

Admittedly those iterations will be in fast C code instead of slow Python 
code, which is why you might not notice the difference at first, but 
you're still doing a lot of unnecessary work which takes time. How much 
time? Python makes it easy to find out.

>>> from timeit import Timer
>>> setup = "data = [range(10) for i in range(10)]"
>>> t1 = Timer("""for sublist in data:
...     for item in sublist:
...             pass""", setup)
>>> t2 = Timer("""for item in reduce(lambda x,y: x+y, data):
...     pass""", setup)
>>>
>>> min(t1.repeat(number=100000))
0.94107985496520996
>>> min(t2.repeat(number=100000))
1.7509880065917969

So for ten sublists of ten items each, the solution using reduce is 
nearly twice as slow as the nested iteration. If we make the number of 
lists ten times larger, the nested for-loop solution takes ten times 
longer, as you would expect:

>>> setup = "data = [range(10) for i in range(100)]"
>>> t1 = Timer("""for sublist in data:
...     for item in sublist:
...             pass""", setup)
>>> min(t1.repeat(number=100000))
10.349304914474487


But the reduce solution slows down by a factor of thirty-two rather than 
ten:

>>> t2 = Timer("""for item in reduce(lambda x,y: x+y, data):
...     pass""", setup)
>>> min(t2.repeat(number=100000))
58.116463184356689


If we were to increase the number of sublists further, the reduce 
solution will perform even more badly.



-- 
Steven



More information about the Python-list mailing list