Speed-up for loops
bartc at freeuk.com
Tue Sep 7 12:00:03 CEST 2010
"Steven D'Aprano" <steve-REMOVE-THIS at cybersource.com.au> wrote in message
news:4c85adfe$0$11115$c3e8da3 at news.astraweb.com...
> On Mon, 06 Sep 2010 11:38:22 +0100, BartC wrote:
>> Manually unrolling such a loop four times (ie. 4 copies of the body, and
>> counting only to 25 million) increased the speed by between 16% and 47%
>> (ie. runtime reducing by between 14% and 32%).
> Or you could *really* optimize it, as well as simplifying the code, by
> n = 100000000
> a = 10*n
> and doing a single multiplication rather than pointless repeated addition.
It's not pointless; it's being used as a test of long it takes to do a=a+10
a hundred million times in a loop.
(Similarly, if we wanted to investigate the speed of function calls, then a
'pointless' recursive implementation of fibonacci(36) that needed 76 million
(or whatever) calls is far more useful than one that did it with just 1. But
a lot of people just don't get this, for example some of the comments here:
> Of course, if 10 (or 100) is not a constant but is just standing in for
> something which varies each iteration:
> for i in xrange(100000000):
> a = a + f(i)
> then unrolling the loop is even less useful. The overhead of the loop
> itself is likely to be trivial compared to the cost of calling f() 100
> million times -- the added complexity to shave 3 seconds off a four
> minute calculation simply isn't worth it.
With Python 3 and def f(x): return x+1, unrolling this loop 4x improved
speed by 15%; 4.00 minutes reduces to 3.30 minutes.
In this code (adapted from a real piece of Python code):
for i in xrange(1000000):
Unrolling 4x produced a 47% speedup. Now your program only takes 2:45
>(BTW why doesn't Python 3 just accept 'xrange' as a synonym for 'range'?)
> Why should it? But if you want it, you can do it:
> xrange = range
> There, that wasn't hard, was it?
I think I just learned more about Python than from months of reading this
So 'range' is just a class like any other. And that a class is something you
can blithely copy from one variable to another. And whenever you see 'range'
anywhere, you can't always be certain that someone hasn't done:
range = 42
at some point. That explains a lot about the difficulties of implementing
Python efficiently. (And the xrange=range trick works well thanks.)
>) Integer arithmetic seems to go straight from 32-bits to long
>> integers; why not use 64-bits before needing long integers?
> Why? That adds 50% more code, 50% more testing, 50% more places for bugs
> to hide, 50% more effort required to maintain it, for something which *at
> best* will be a trivial optimization which at best is of interest to a
> small minority of Python coders.
I would have suggested just using 64-bits anyway (on 32-bit hardware) as,
currently, the slight extra overhead would be insignificant, provided it
doesn't result in double the memory when there are many integers.
As for being of minority interest, with the whole world rapidly becoming
64-bits, I would dispute that...
>> repeat 100000000: # for example
>> a = a + 10
> Because now both the parser and all Python coders need to care about one
> more reserved word, all so that one Python program in ten thousand can
> save 0.1 ms occasionally.
That's a minor detail. You can call it whatever you like.
More information about the Python-list