efficiency of range() and xrange() in for loops

Steven D'Aprano steve at REMOVETHIScyber.com.au
Wed Apr 5 19:08:45 EDT 2006


On Wed, 05 Apr 2006 15:02:33 -0700, Steve R. Hastings wrote:

> On Wed, 05 Apr 2006 22:24:24 +0200, Fredrik Lundh wrote:
>>> If Python actually allocates the list, then clearly we should all use
>>> "for i in xrange".
>> 
>> clearly we should all?  speak for yourself.
> 
> I apologize for my pithy turn of phrase.  Clearly what I should have
> written was:
> 
> "If Python actually allocates the list, then clearly 'for i in xrange'
> would be desirable in cases where memory allocation might possibly be a
> problem; of course if you have lots of memory, or you are only looping a
> small number of times, go ahead and use range() because why not."

How about this?

...because range() is simpler and actually an English word, it reads
better and won't confuse non-native English speakers who try looking
xrange up in a dictionary, it is aesthetically nicer, and at least some of
the time range() will actually be faster and maybe even consume less
memory (although if your list is that small, we're only talking about
trivial amounts of memory anyway). 

But yes, for many purposes, the difference between xrange and range is
quite minimal. But please try to recall that many != all, and any
optimization put to Python 2.x must work in all cases, not just many.


> I deeply apologize for inadvertently trying to tell you how to write a
> for loop.  I hope the offense isn't unforgivable.

Sarcasm aside, you will find many many people are deeply allergic to the
merest hint of premature optimization, especially when that premature
optimization is posed in such a way to hint that the person proposing it
hasn't really considered all the factors.

[snip]

> Actually, for many uses of "for i in (range|xrange)", you only need the
> value of i, and you aren't doing anything with the integer object.  You
> might not even be looking at the value of i:
> 
> 
> start_time = time.time()
> for i in xrange(10**6):
>     run_some_function()
> stop_time = time.time()
> secs = (stop_time - start_time) / 10**6 print "run_some_function() took
> on average, %f seconds." % secs
> 
> 
> In the above, clearly we should all use xrange()... oops, I meant, if
> you want to, you could use xrange() instead of allocating a list of a
> million integers and then doing nothing with the list itself and then
> deallocating the list of a million integers.

Yes, the above example is a good use case for xrange. Did you think that
anyone denied that there were good cases for it?

Here is a bad usage case for xrange:

L = []
for i in xrange(10**6):
    L.append(2*i+1)

This should likely be written as:

L = range(1, 2*10**6+1, 2)


I'll tell you what: I'll agree that the existence of bad usages for xrange
does NOT mean xrange should never be used, if you agree that the existence
of bad usages for range does NOT mean range should never be used.

In any case, if you look at the source code for the timeit module, you
will see that for cases where you genuinely don't need the values of i,
the optimal strategy is to not bother creating one million integers at all:

L = [None]*10**6 # don't time the creation of the list
start_time = time.time()
for _ in L:
    run_some_function()
stop_time = time.time()

This strategy (I think) trades off a little extra memory for a little
extra speed; in other applications, speed may be less important and memory
more so, so xrange or some other iterator would be preferred.


>>> If Python doesn't currently optimize this case, is there any chance
>>> this optimization could be added?
>> 
>> in Python 2.X, range is defined to return a list.  if you start
>> returning something else, you'll break stuff.
> 
> Perhaps I'm mistaken here, but I don't see how this optimization could
> possibly break anything.  range() makes a list, and for consumes it, and
> the list isn't seen anywhere else.

But of course you are forgetting that range is just a function and can be
used anywhere, not just in for loops. Here is a toy example:

L = range(1000)
random.shuffle(L)
while L:
    L.pop()

If range was optimized to return a xrange or iterator object, this code
would stop working.


-- 
Steven.




More information about the Python-list mailing list