Speed-up for loops
steve-REMOVE-THIS at cybersource.com.au
Tue Sep 7 05:14:06 CEST 2010
On Mon, 06 Sep 2010 11:38:22 +0100, BartC wrote:
> Modifying the OP's code a little:
> a = 0
> for i in xrange(100000000): # 100 million
>     a = a + 10 # add 10 or 100
> print a
> Manually unrolling such a loop four times (ie. 4 copies of the body, and
> counting only to 25 million) increased the speed by between 16% and 47%
> (ie. runtime reducing by between 14% and 32%).
Or you could *really* optimize it, as well as simplifying the code, by
n = 100000000
a = 10*n
and doing a single multiplication rather than pointless repeated addition.
(I assume that the number of loops is meant to be a variable rather than
a constant, otherwise a = 1000000000 would be the correct optimization.)
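(The equivalence is easy to check on a small n -- a quick sketch, written
as Python 3 here:)

```python
# Summing a constant k, n times, is just k * n -- no loop needed.
# Small n so the check runs instantly (the original post used 100 million).
n = 1000
k = 10

a = 0
for i in range(n):     # the repeated-addition version
    a = a + k

assert a == k * n      # the single-multiplication version agrees
```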
Of course, if 10 (or 100) is not a constant but is just standing in for
something which varies each iteration:
for i in xrange(100000000):
    a = a + f(i)
then unrolling the loop is even less useful. The overhead of the loop
itself is likely to be trivial compared to the cost of calling f() 100
million times -- the added complexity to shave 3 seconds off a four
minute calculation simply isn't worth it.
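(The relative costs are easy to measure with timeit -- an illustrative
sketch, with small iteration counts so it runs quickly; exact numbers
will vary by machine:)

```python
import timeit

# Cost of the bare loop vs. the same loop making a trivial function call.
loop_only = timeit.timeit("for i in range(1000): pass", number=200)
with_call = timeit.timeit(
    "for i in range(1000): f(i)",
    setup="def f(x): return x + 10",
    number=200,
)

# The call-making version is consistently the slower of the two:
# the function-call overhead dwarfs the loop overhead itself.
print(loop_only, with_call)
```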
Besides, loop unrolling is really only valuable for small loops,
otherwise the overhead caused by the increased code size makes it a
pessimization rather than an optimization. It's very easy for any gains
to be lost due to increased cache misses, the time needed to copy the
larger code into the cache, and so on.
There's a lot of subtlety in optimization, and what counts as an
optimization for low-level operations, and what is an optimization for
high-level languages like Python, are rarely the same.
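(For reference, the 4x manual unrolling BartC describes looks something
like this -- shown with a small count rather than 100 million:)

```python
# 4x manually unrolled version of `for i in xrange(n): a = a + 10`.
# n must be divisible by 4; small n here just to show the shape.
n = 1000
a = 0
for i in range(n // 4):   # count only to n/4 ...
    a = a + 10            # ... but run 4 copies of the body
    a = a + 10
    a = a + 10
    a = a + 10
assert a == 10 * n        # same result as the plain loop
```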
> This depended on whether I added +10 or +100 (ie. whether long integers
> are needed), whether it was inside or outside a function, and whether I
> was running Python 2 or 3 (BTW why doesn't Python 3 just accept 'xrange'
> as a synonym for 'range'?)
Why should it? But if you want it, you can do it:
xrange = range
There, that wasn't hard, was it?
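(Or, as a shim that runs unchanged under both Python 2 and 3 -- a common
idiom of the era, not something Python 3 does for you:)

```python
# Make `xrange` usable under both Python 2 and Python 3.
try:
    xrange            # exists in Python 2
except NameError:
    xrange = range    # Python 3: range is already lazy

total = 0
for i in xrange(5):
    total += i
assert total == 10
```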
> These are just some simple tests on my particular machine and
> implementations, but they bring up some points:
> (1) Loop unrolling does seem to have a benefit, when the loop body is
> small
> (2) Integer arithmetic seems to go straight from 32-bits to long
> integers; why not use 64-bits before needing long integers?
Why? That adds 50% more code, 50% more testing, 50% more places for bugs
to hide, 50% more effort required to maintain it, for something which at
best is a trivial optimization of interest to only a small minority of
Python coders.
There's already code to deal with 32 bit ints, code to deal with longints,
and code to deal with shifting transparently from one to the other.
Adding code to deal with 64 bit ints doesn't happen for free.
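(The transparent shift is visible from Python code. Python 3 is shown
here, where int and long have since been unified into a single
arbitrary-precision type, but the promotion beyond the native word size
is still observable:)

```python
import sys

a = sys.maxsize      # largest native-word-sized int value on this platform
b = a + 1            # silently promoted to arbitrary precision, no overflow
assert b > sys.maxsize
assert isinstance(b, int)
# In Python 2 the same step took you from int to long:
# type(sys.maxint + 1) was <type 'long'>.
```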
Besides, if you care about the difference between 32 and 64 bit ints,
chances are you don't want Python blithely swapping from one to the other
when you least expect it. So you'll be using a library that gives you
access to whichever ints you want, probably implemented natively rather
than as objects.
Sounds rather like numpy really :)
> (3) Since the loop variable is never used, why not have a special loop
> statement that repeats code so many times? This can be very fast, since
> the loop counter need not be a Python object, and probably there would
> be no need for unrolling at all:
> repeat 100000000: # for example
>     a = a + 10
Because now both the parser and all Python coders need to care about one
more reserved word, all so that one Python program in ten thousand can
save 0.1 ms occasionally.
Optimizations don't happen for free. If the optimization doesn't carry
its own weight, it's a pessimization. To me, it's more important to be
able to use repeat as a name than to save a millisecond or so.
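(And if what you actually want is "run this body n times as cheaply as
possible", the standard library already has a reasonable spelling, no
new keyword required:)

```python
import itertools

# itertools.repeat(None, n) yields n items without building a new
# Python int object for a loop counter on each iteration.
a = 0
for _ in itertools.repeat(None, 1000):
    a = a + 10
assert a == 10000
```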
Besides, if repeat were to become syntax, then I'd MUCH rather have it
used for repeat...until (or repeat...while) loops, to avoid the anti-
pattern:
x = f()
while not condition(x):
    x = f(x)
which would be better as:
x = f()
repeat:
    x = f(x)
until condition(x)
More information about the Python-list mailing list