Speed-up for loops

Wed Sep 8 14:45:26 EDT 2010

"Steven D'Aprano" <steve at REMOVE-THIS-cybersource.com.au> wrote in message
news:4c878be5$0$11113$c3e8da3 at news.astraweb.com...
> On Tue, 07 Sep 2010 11:00:03 +0100, BartC wrote:
>
>>> for i in xrange(100000000):
>>>    a = a + f(i)

>> With Python 3 and def f(x): return x+1, unrolling this loop 4x improved
>> speed by 15%; 4.00 minutes reduces to 3.30 minutes.
>
>
> I'm afraid that I can't replicate those figures. In my test, unrolling
> the loop causes a massive SLOWDOWN of 37%, not a speed up. Here is my
> test code:

You're absolutely right. I completely forgot about modulating the i index
for each duplicated line.

> def test_unrolled(n):
>    a = 0
>    for i in range(n//4):
>        a += f(4*i)
>        a += f(4*i+1)
>        a += f(4*i+2)
>        a += f(4*i+3)
>    return a

Although tidying up your calculations (as already posted) gives code that is
neither faster nor slower.

I'd hoped that using the following would help, but did nothing in Python 3,
and gave only 8-10% speedup in Python 2:

for i in xrange(0,n,4):
        a=a+f(i)
        a=a+f(i+1)
        a=a+f(i+2)
        a=a+f(i+3)

(On my other example of setting list elements to 0, this did improve speed
by some 10% in Python 3, and 28% in '2'.)

So using manual unrolling for an indexed loop is not going to be the right
approach (it's also fiddly, and there is the problem of N not being a 
multiple of 4 or whatever).

We're trying to avoid the iteration overhead, yet we're adding it in the
code anyway, and as user-level code. But, I still think that internal
support for such a thing might be worthwhile, when it can make certain
assumptions about the loop range and index type.

-- 
Bartc