[Tutor] Efficiency of while versus (x)range

Fri Mar 18 06:43:57 CET 2011

Shane O'Connor wrote:
> Hi,
> 
> First-time poster here. I've a question about loop efficiency - I was
> wondering whether this code:
> 
> i = 0
> while i < 1000:
>     do something
>     i+=1
> 
> is more efficient than:
> 
> for i in range(1000):
>     do something
> 
> or:
> 
> for i in xrange(1000):
>     do something

You can test this with the timeit module. xrange is about 50% faster 
than range for a million items, due to the overhead of building the full 
list:

[steve at sylar ~]$ python -m timeit "for n in range(1000000): pass"
10 loops, best of 3: 156 msec per loop
[steve at sylar ~]$ python -m timeit "for n in xrange(1000000): pass"
10 loops, best of 3: 106 msec per loop

(The "10 loops" part means that timeit repeats the entire code snippet 
ten times, in order to minimize the effect of other processes and 
background tasks.)

However, if you move building the list into setup code, which doesn't 
get timed, you will see that there's hardly any difference in time for 
iterating over a list and an xrange object:

[steve at sylar ~]$ python -m timeit -s "items = range(1000000)" "for n in 
items: pass"
10 loops, best of 3: 96.1 msec per loop
[steve at sylar ~]$ python -m timeit -s "items = xrange(1000000)" "for n in 
items: pass"
10 loops, best of 3: 99.3 msec per loop

As for the while loop, it doesn't even come close!

[steve at sylar ~]$ python -m timeit "n = 0
 > while n < 1000000: n += 1"
10 loops, best of 3: 282 msec per loop

The problem is that in a while loop, the looping itself is handled in 
slow Python code, while in the for loop, it's handled by fast C code (or 
whatever language the Python virtual machine is written in).

We can see this by disassembling the code into virtual machine instructions:

[steve at sylar ~]$ python
Python 2.5 (r25:51908, Nov  6 2007, 16:54:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>>
 >>> from dis import dis
 >>> dis(compile("for x in xrange(1000000): pass", "", "exec"))
   1           0 SETUP_LOOP              20 (to 23)
               3 LOAD_NAME                0 (xrange)
               6 LOAD_CONST               0 (1000000)
               9 CALL_FUNCTION            1
              12 GET_ITER
         >>   13 FOR_ITER                 6 (to 22)
              16 STORE_NAME               1 (x)
              19 JUMP_ABSOLUTE           13
         >>   22 POP_BLOCK
         >>   23 LOAD_CONST               1 (None)
              26 RETURN_VALUE

The for-loop is three virtual instructions: FOR_ITER, STORE_NAME and 
JUMP_ABSOLUTE. Compare that to the while loop:

 >>> dis(compile("n = 0\nwhile n < 1000000: n += 1", "", "exec"))
   1           0 LOAD_CONST               0 (0)
               3 STORE_NAME               0 (n)

   2           6 SETUP_LOOP              28 (to 37)
         >>    9 LOAD_NAME                0 (n)
              12 LOAD_CONST               1 (1000000)
              15 COMPARE_OP               0 (<)
              18 JUMP_IF_FALSE           14 (to 35)
              21 POP_TOP
              22 LOAD_NAME                0 (n)
              25 LOAD_CONST               2 (1)
              28 INPLACE_ADD
              29 STORE_NAME               0 (n)
              32 JUMP_ABSOLUTE            9
         >>   35 POP_TOP
              36 POP_BLOCK
         >>   37 LOAD_CONST               3 (None)
              40 RETURN_VALUE

So the for loop pushes more of the actual effort into the underlying 
engine, which is faster.

Of course, in real-life code, most of this is just theoretical. I see 
from your comment below that you understand this, but for the benefit of 
others reading, most of the time, the overhead of the loop body will be 
much bigger than the loop itself. Who cares whether a loop takes 30 
seconds + 100 milliseconds, or 30 seconds + 150 milliseconds? You aren't 
going to notice the difference.

The general advice is best summed up with a quote from the famous 
computer scientist Donald Knuth:

"We should forget about small efficiencies, say about 97% of the time: 
premature optimization is the root of all evil. Yet we should not pass 
up our opportunities in that critical 3%.A good programmer will not be 
lulled into complacency by such reasoning, he will be wise to look 
carefully at the critical code; but only after that code has been 
identified."

http://en.wikipedia.org/wiki/Program_optimization

Other quotes of note:

"More computing sins are committed in the name of efficiency (without 
necessarily achieving it) than for any other single reason - including 
blind stupidity." - W.A. Wulf

"The First Rule of Program Optimization: Don't do it. The Second Rule of 
Program Optimization (for experts only!): Don't do it yet." - Michael A. 
Jackson

> In my mind, the while loop should not allocate as much memory as range or
> have the overhead of the iterator of xrange (although aesthetically, I
> prefer the x/range style). Is this the case or does the compiler do
> something clever here?

As a general rule, Python's compiler rarely does anything clever. 
"Clever" optimizations are remarkable for the number of difficult to 
notice and even more difficult to fix bugs they introduce.

Nevertheless, the PyPy project is experimenting with optimizing 
just-in-time compilers, and so far are already running about twice as 
fast as the standard CPython compiler. Similarly, the experimental 
WPython compiler also does a lot more optimizations than CPython.

If you're interested in optimization, PyPy is good to look at: for a 
handful of carefully selected examples, Python code in PyPy is faster 
than optimized C code!

http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-carefully-crafted.html

> In particular, I'm using Python 2.4.3 on a web server which needs to run as
> fast as possible using as little memory as possible (no surprises there!).
> I'm aware that there are more significant optimizations than the above and I
> will profile the code rather than prematurely optimize loops at the sake of
> readability/errors but I'm still curious about the answer.

-- 
Steven