[Python-Dev] xrange identity crisis

Wed, 5 Jun 2002 00:52:16 -0400

RDH> Xrange was given its own tp_iter slot and now runs as fast as range.
RDH> In single pass timings, it runs faster.  In multiple passes, range
RDH> is still quicker because it only has to create the PyNumbers once.
RDH>
RDH> Being immutable, xrange had the advantage that it could serve as its
RDH> own iterator and did not require the extra code needed for list
RDH> iterators and dict iterators.
>
GvR> Did you write the patch that Martin checked in?
GvR>
GvR> It's broken.
GvR>
GvR> >>> a = iter(xrange(10))
GvR> >>> for i in a:
GvR>         print i
GvR>         if i == 4: print '*', a.next()

Okay, here's the distilled analysis:

Given x=xrange(10),
1. Oren notes that id(iter(x)) == id(x), which is atypical of objects that
have special iterator types or get wrapped by the generic iterobject.
2. GvR notes that id(iter(x)) != id(iter(iter(x))), which is inconsistent
with range().
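
To make the two identity checks concrete, here is a minimal sketch using a plain list, which is what range() returns in this discussion. The variable names are illustrative only, and the code is spelled for a current Python 3 interpreter (next(it) rather than it.next()).

```python
# What range() returned at the time of this thread: a plain list.
seq = list(range(10))

it = iter(seq)

# Check #1: a list hands out a separate iterator object, not itself.
list_is_own_iterator = it is seq

# Check #2: calling iter() on the iterator returns that same iterator,
# so the position in the sequence is shared rather than reset.
iterator_returns_itself = iter(it) is it

next(it)                          # consume 0
resumed_value = next(iter(it))    # continues at 1 instead of restarting at 0

print(list_is_own_iterator, iterator_returns_itself, resumed_value)
```

This is the baseline that points 1 and 2 compare xrange against.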

#1 should not be a requirement.  A call to iter should simply return
something that has an iterable interface whether it be a new object or the
current object.  In examples of user defined classes with their own
__iter__() method, we show the object returning itself.  At the same time,
we allow the __iter__ method to possibly be defined with a generator which
returns a new object.  In short, the object identity of iter(x) has not been
promised to be either equal or not equal to x.
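
As a small illustration of the two user-defined styles just described, the sketch below defines one class that returns itself from __iter__ and one whose __iter__ is a generator. Countdown and Squares are hypothetical names invented for this example, and the code uses Python 3 spelling (__next__ instead of next).

```python
class Countdown:
    """Hypothetical iterator class whose __iter__ returns the object itself."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # spelled next() in the Python of this thread
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Squares:
    """Hypothetical container whose __iter__ is a generator function."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Each call builds a brand-new generator object.
        for i in range(self.n):
            yield i * i


c = Countdown(3)
s = Squares(3)

same_object = iter(c) is c       # the object is its own iterator
new_object = iter(s) is not s    # a generator, distinct from s

print(same_object, new_object, list(c), list(s))
```

Both classes satisfy the iteration protocol, yet they give opposite answers to the identity question.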

If we decide that #1 is required (for consistency with the way other
iterables are currently implemented), the most straightforward solution is
to add an xrange iteratorobject to rangeobject.c just like we did for
listobject.c.  I'll be happy to do this if it is what everyone wants.
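
A rough Python-level sketch of that design follows. The real change would be C code in rangeobject.c; LazyRange and LazyRangeIterator are hypothetical names used only to show the shape of a sequence object paired with a separate iterator object, written with Python 3 spelling.

```python
class LazyRange:
    """Hypothetical stand-in for xrange: immutable, holds no position."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        # Every call returns a new iterator object, so id(iter(x)) != id(x).
        return LazyRangeIterator(self.stop)


class LazyRangeIterator:
    """Hypothetical stand-in for a dedicated xrange iterator object."""

    def __init__(self, stop):
        self.index = 0
        self.stop = stop

    def __iter__(self):
        # An iterator returns itself, so iter(iter(x)) keeps its position.
        return self

    def __next__(self):
        if self.index >= self.stop:
            raise StopIteration
        value = self.index
        self.index += 1
        return value


x = LazyRange(10)
a = iter(x)
seen = []
for i in a:
    seen.append(i)
    if i == 4:
        # Modelled on the test quoted above: pull one extra item mid-loop.
        seen.append(('*', next(a)))

print(seen)
```

With this split, the extra next(a) consumes 5 and the loop resumes at 6, which is how a list iterator behaves under the same test.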

For #2, the most compelling argument is that xrange should be a drop-in
replacement for range in *every* circumstance including the weird use case
of iter(iter(xrange(10))).  This is easily accomplished and I've
attached a simple patch to the bug report that restores this behavior.
However, before accepting the patch, I think we ought to consider whether
the current xrange() behavior is more rational than the range() behavior.

PEP 234 says:  """Some folks have requested the ability to restart an
iterator.  This should be dealt with by calling iter() on a sequence
repeatedly, not by the iterator protocol itself. """

Maybe the right way to go is to ensure that iter(x) returns a freshly
loaded iterator instead of the same iterator in the same state.  Right now
(with xrange different from range), we get what I think is weirder behavior
from range():

>>> a = iter(range(3))
>>> for i in a:
 for j in a:
  print i,j

0 1
0 2
>>> a = iter(xrange(3))
>>> for i in a:
 for j in a:
  print i,j

0 0
0 1
0 2
1 0
1 1
1 2
2 0
2 1
2 2
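
The difference between the two sessions can be reproduced with a pair of toy iterators. This is only a sketch under the assumption that the behavior comes down to what __iter__ returns; SharedState, FreshOnIter and nested_pairs are hypothetical names, the code is Python 3, and it is not the C implementation.

```python
class SharedState:
    """Hypothetical iterator that returns itself from __iter__ (list-iterator style)."""

    def __init__(self, stop):
        self.index = 0
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= self.stop:
            raise StopIteration
        self.index += 1
        return self.index - 1


class FreshOnIter(SharedState):
    """Hypothetical iterator whose __iter__ hands back a freshly loaded copy."""

    def __iter__(self):
        return FreshOnIter(self.stop)


def nested_pairs(a):
    # Same shape as the sessions above: both loops are driven by `a`.
    return [(i, j) for i in a for j in a]


shared = nested_pairs(SharedState(3))
fresh = nested_pairs(FreshOnIter(3))
print(shared)   # two pairs, like the range() session
print(fresh)    # nine pairs, like the xrange() session
```

In the first case the inner loop drains the one shared iterator, so the outer loop ends after its first item. In the second, every for statement gets its own starting position.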

BTW, I'm happy to do whatever you guys think best:
(a)  Adding an xrangeiteratorobject fixes #1 and #2, resulting in an xrange()
identical to range() with no cost to performance during the loop (creation
performance suffers just a bit).
(b)  Adding my other patch (attached to the bug report
www.python.org/sf/564601) fixes #2 only (again with no cost to loop
performance).
(c)  Leaving it the way it is gives xrange a behavior that is identical to
range for the common use cases, and arguably superior abilities for the
weird cases.

Raymond Hettinger