I've been thinking about the possibility of an optimization of range() calls in for-loops, replacing the list generation with a lazy range object.
xrange(), in Python 2.3, now has an iterator, so it should be faster than range() in for-loops, which is an incentive to use it more. However, this does go against the overall desire to move away from using xrange(), just as we have moved away from xreadlines(), etc.
So, I've been considering possible ways in which to 'optimize' the following types of loops:
for i in range( 1000 ): pass
or
n = 1000 for i in range( n ): pass
The main issue, as I see it, is substituting some form of lazy range iterator, for the range() function, "behind the curtains" as it were. People could gain the benefits of xrange() (less memory consumption, probably faster looping), without having to promote the continued use of xrange().
There would have to be logic to determine that the range function being used is the builtin range, and not a shadowed version. I assume this can be done in a fairly straightforward manner at compile time. I could use advice on whether this is feasible (but see below, first)
A further issue is the substitution itself. As of 2.3b1 the range() builtin is able to handle long integer objects larger than sys.maxint. xrange() is not so equipped, so it cannot be substituted blindly for range() without examining the range() arguments.
However, as the second example above shows, the value of the arguments would have to be determined at runtime. In principle, a function could be made which checked the arguments, and if they could be handled by an xrange() object, it could be returned. Otherwise, the results of range() could be returned. However, it is unclear whether the results of this indirection (and double checking of the arguments) would be a speed win (although the reduced memory requirements would still exist).
Guido has already stated his opinion that xrange should NOT be extended to handle values larger than sys.maxint (ie. he doesn't want to give xrange() any more features). A related option is to make a kind of private lazy range-iterator object, would could be unconditionally substituted for the range function, in for loops, as compile time. Based on my understanding of how the code generation works, this could be made to work in principle. However, without introducing some new byte codes, it doesn't seem possible to have a hidden, "private" iterator generator function.
So, without going into much further detail it seems there would be only three feasible options.
1) Extend xrange() to handle long int arguments, to bring parity with range(), and allowing unconditional substitution with range, in for-loops.
2) Change byte code generation to allow substituting a private range iterator (or range() iterator generator) that can handle all possible ranges.
3) Forget it. Not worth bothering about.
After thinking about it a lot, it seems number 3 has become the most attractive (and likely) option. :) Number 2 seems like a maintenance nightmare, and number 1 would require the BDFL and community to change the stance on extending the feature set of xrange().
I admit I haven't done extensive testing to determine whether this optimization would really provide much benefit on the whole. Certainly a grep through any large Python code base (say the standard libraries) indicates that there are many uses of range() in for loops, but typically for fairly small ranges, so the memory issue may not be such a problem. Also, should there ever be a Python 3, where range() itself probably becomes lazy, the porting headaches, for this particular issue, should be minimal compared to others. (ie. xrange() would become a synonym, of sorts, for the new range())
Anyway, I only bring it up because it is an issue that does get discussed from time to time, usually with the idea that it "could" possibly be done in the future. I think if it should ever be done, it should probably be done soon, just so that using xrange() will grow less attractive, rather than more. I'm interested to hear other thoughts about this.