On the possibility of "optimizing" range() calls in for-loops

I've been thinking about the possibility of an optimization of range() calls in for-loops, replacing the list generation with a lazy range object.
xrange(), in Python 2.3, now has an iterator, so it should be faster than range() in for-loops, which is an incentive to use it more. However, this does go against the overall desire to move away from using xrange(), just as we have moved away from xreadlines(), etc.
So, I've been considering possible ways in which to 'optimize' the following types of loops:
for i in range( 1000 ): pass
or
n = 1000
for i in range( n ): pass
The main issue, as I see it, is substituting some form of lazy range iterator for the range() function, "behind the curtains" as it were. People could gain the benefits of xrange() (less memory consumption, probably faster looping) without having to promote the continued use of xrange().
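By a "lazy range iterator" I mean, roughly, something like the sketch below (purely illustrative; this is essentially what xrange() already does in C):

def lazy_range(start, stop=None, step=1):
    # Produce the values on demand instead of building the whole list
    # up front; xrange() provides the same behavior, only implemented in C.
    if stop is None:
        start, stop = 0, start
    i = start
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        yield i
        i += step

for i in lazy_range(1000):
    pass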
There would have to be logic to determine that the range function being used is the builtin range, and not a shadowed version. I assume this can be done in a fairly straightforward manner at compile time. I could use advice on whether this is feasible (but see below, first).
A further issue is the substitution itself. As of 2.3b1 the range() builtin is able to handle long integer objects larger than sys.maxint. xrange() is not so equipped, so it cannot be substituted blindly for range() without examining the range() arguments.
However, as the second example above shows, the value of the arguments would have to be determined at runtime. In principle, a function could be made which checked the arguments, and if they could be handled by an xrange() object, it could be returned. Otherwise, the results of range() could be returned. However, it is unclear whether the results of this indirection (and double checking of the arguments) would be a speed win (although the reduced memory requirements would still exist).
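For concreteness, the runtime check I have in mind would amount to something like this (the helper name is made up, and the extra indirection is exactly the overhead I'm unsure about):

def _range_or_xrange(*args):
    # Hypothetical helper, not an existing patch: use the lazy object when
    # the arguments fit in a machine int, otherwise fall back to range().
    try:
        return xrange(*args)
    except OverflowError:
        return range(*args)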
Guido has already stated his opinion that xrange should NOT be extended to handle values larger than sys.maxint (ie. he doesn't want to give xrange() any more features). A related option is to make a kind of private lazy range-iterator object, which could be unconditionally substituted for the range() function in for-loops at compile time. Based on my understanding of how the code generation works, this could be made to work in principle. However, without introducing some new byte codes, it doesn't seem possible to have a hidden, "private" iterator generator function.
So, without going into much further detail it seems there would be only three feasible options.
1) Extend xrange() to handle long int arguments, bringing it to parity with range() and allowing unconditional substitution for range() in for-loops.
2) Change byte code generation to allow substituting a private range iterator (or range() iterator generator) that can handle all possible ranges.
3) Forget it. Not worth bothering about.
After thinking about it a lot, it seems number 3 has become the most attractive (and likely) option. :) Number 2 seems like a maintenance nightmare, and number 1 would require the BDFL and the community to change their stance on extending the feature set of xrange().
I admit I haven't done extensive testing to determine whether this optimization would really provide much benefit on the whole. Certainly a grep through any large Python code base (say the standard libraries) indicates that there are many uses of range() in for loops, but typically for fairly small ranges, so the memory issue may not be such a problem. Also, should there ever be a Python 3, where range() itself probably becomes lazy, the porting headaches, for this particular issue, should be minimal compared to others. (ie. xrange() would become a synonym, of sorts, for the new range())
Anyway, I only bring it up because it is an issue that does get discussed from time to time, usually with the idea that it "could" possibly be done in the future. I think if it should ever be done, it should probably be done soon, just so that using xrange() will grow less attractive, rather than more. I'm interested to hear other thoughts about this.

On Saturday 14 June 2003 06:09 am, Chad Netzer wrote:
I've been thinking about the possibility of an optimization of range() calls in for-loops, replacing the list generation with a lazy range object.
xrange(), in Python 2.3, now has an iterator, so it should be faster than range() in for-loops, which is an incentive to use it more.
Measuring is better than wondering:
with Python's current CVS:
[alex@lancelot Lib]$ python -O timeit.py 'for x in range(2000): pass'
1000 loops, best of 3: 257 usec per loop
[alex@lancelot Lib]$ python -O timeit.py 'for x in xrange(2000): pass'
1000 loops, best of 3: 220 usec per loop
with 2.2.2:
[alex@lancelot Lib]$ python2.2 -O timeit.py 'for x in range(2000): pass'
1000 loops, best of 3: 315 usec per loop
[alex@lancelot Lib]$ python2.2 -O timeit.py 'for x in xrange(2000): pass'
1000 loops, best of 3: 440 usec per loop
So, yes: xrange has accelerated SO much (by a factor of 2) as to become measurably faster than range (by a little, about 15%) for this specific usage, while it used to be substantially slower.
However, these days, if you're really in a hurry to repeat some action 2000 times (and don't need the iteration index, which is a reasonably frequent case), there IS an alternative...:
[alex@lancelot Lib]$ python -O timeit.py -s'import itertools' 'for x in itertools.repeat(None,2000): pass'
10000 loops, best of 3: 184 usec per loop
[alex@lancelot Lib]$ python -O timeit.py -s'import itertools' 'for x in itertools.repeat(None,2000): pass'
1000 loops, best of 3: 183 usec per loop
As you see, this is as much faster than xrange, as xrange is faster than range.
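Spelled out, the idiom being timed is simply (assuming the loop index itself isn't needed):

import itertools

# repeat(None, 2000) yields the same object 2000 times, so the loop never
# has to create any integer objects at all.
for x in itertools.repeat(None, 2000):
    pass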
Alex

Chad Netzer <cnetzer@sonic.net> writes:
There would have to be logic to determine that the range function being used is the builtin range, and not a shadowed version. I assume this can be done in a fairly straightforward manner at compile time. I could use advice on whether this is feasible (but see below, first)
It is not. Shadowing may occur well after the module import, by inserting a name in the globals. If it were feasible, it would have been done long ago.
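For example (module and function names invented for illustration):

import mymodule
# Nothing in mymodule.py itself mentions a shadowed range, yet from now on
# every range() call inside mymodule sees this replacement, so a decision
# taken when mymodule was compiled would already be wrong.
mymodule.range = lambda n: [0] * n
mymodule.count_to(10)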
Regards, Martin

On Sat, 2003-06-14 at 02:56, Martin v. Löwis wrote:
It is not. Shadowing may occur well after the module import, by inserting a name in the globals. If it were feasible, it would have been done long ago.
Somehow, when I dreamed all this up, I convinced myself that the byte-compiler could know whether the range() in globals was the builtin, and only do the optimization if it was.
Now, I can't even recall why I thought that. :)
Oh well. Thanks.
Chad

[Chad Netzer]
The main issue, as I see it, is substituting some form of lazy range iterator, for the range() function, "behind the curtains" as it were. People could gain the benefits of xrange() (less memory consumption, probably faster looping), without having to promote the continued use of xrange().
The SF patch manager has a workable implementation of your idea:
www.python.org/sf/738094 ("for i in range(N) optimization")
The jury is still out on whether it is an elegant, brilliant patch or a horrendous hack.
Raymond Hettinger

The SF patch manager has a workable implementation of your idea:
www.python.org/sf/738094 ("for i in range(N) optimization")
The jury is still out on whether it is an elegant, brilliant patch or a horrendous hack.
And why couldn't it be both? :-)
--Guido van Rossum (home page: http://www.python.org/~guido/)

On Sun, 2003-06-15 at 20:59, Raymond Hettinger wrote:
The SF patch manager has a workable implementation of your idea:
www.python.org/sf/738094 ("for i in range(N) optimization")
The jury is still out on whether it is an elegant, brilliant patch or a horrendous hack.
Thanks for pointing out this patch to me. It was a bit of a revelation, since I had been looking at this issue from more of a byte code generation and special-casing standpoint, whereas this approach peeks ahead at the calling frame from right inside the C implementation of the range() builtin. That makes it quite straightforward, doesn't change the bytecode generation at all, and nicely gets around all the issues I brought up in my original post (ie. overriding the range() builtin, and handling Python 2.3 PyLong-valued ranges (by not optimizing in those cases)).
Of course, I can see why it might cause some people to cringe, just because it is snooping ahead in the frame. But, it doesn't seem that the value of 3 is likely to change often, so it probably won't add much of a maintenance burden.
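For anyone curious, the bytecode pattern being snooped for is presumably the one dis shows for a plain for-loop (an illustrative sketch, not taken from the patch):

import dis

def demo(n):
    # A range() call made directly in a for-statement header compiles to
    # CALL_FUNCTION immediately followed by GET_ITER / FOR_ITER.
    for i in range(n):
        pass

dis.dis(demo)   # ... CALL_FUNCTION 1, GET_ITER, FOR_ITER ...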
Hopefully this patch can be discussed further once 2.4 starts to be talked about. I think I'll try to gather some statistics on how often this optimization would be employed in existing code bases, and thus, how much of a speed/memory gain it is likely to give on average. My intuition is that the small overhead it introduces will be more than made up for by the speed and memory use improvements it can bring.

www.python.org/sf/738094 ("for i in range(N) optimization")
The jury is still out on whether it is an elegant, brilliant patch or a horrendous hack.
Isn't this flawed in the case where it's called by a C function rather than directly from the bytecode VM? If the C function was called in the spot marked X below:
for ... in X(...): ...
(where X is the C function that calls range(), probably through some PyObject_Call() variation), then the check in the patch would trigger incorrectly.
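A hypothetical case along those lines (assuming the check inspects the topmost Python frame): map() is C code that calls range() via the abstract call machinery, yet the Python frame executing the line below really is about to run GET_ITER / FOR_ITER, so the peek-ahead could fire even though range()'s result is consumed by map(), not by the for-loop.

# Hypothetical illustration; map() calls range() from C, not from the VM.
for seq in map(range, [5, 10]):
    print seq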
--Guido van Rossum (home page: http://www.python.org/~guido/)