It seems that the xrange object in the current CVS can't make up its mind whether it's an iterator or an iterable:

>>> iterables = ["", (), [], {}, file('/dev/null'), xrange(10)]
>>> iterators = [iter(x) for x in iterables]
>>> for x in iterables + iterators:
...     print hasattr(x, 'next'), x is iter(x), type(x)
...
False False
False False
False False
False False
False False
True False
True True
True True
True True
True True
True True
True False

Generally, iterables don't have a next() method and return a new object each time they are iter()ed. Iterators do have a next() method and return themselves on iter(). xrange is a strange hybrid.

In Python 2.2.0/1 xrange behaved just like the other iterables:

>>> iterables = ["", (), [], {}, file('/dev/null'), xrange(10)]
>>> iterators = [iter(x) for x in iterables]
>>> for x in iterables + iterators:
...     print hasattr(x, 'next'), x is iter(x), type(x)
...
0 0
0 0
0 0
0 0
0 0
0 0
1 1
1 1
1 1
1 1
1 1
1 1

What's the rationale behind this change?

Oren
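
The distinction drawn above (iterables lack a next method and hand out a new object on iter(); iterators have one and return themselves) can be checked mechanically. The following is only a sketch for a modern Python 3 interpreter, which is an assumption: there the method is spelled `__next__`, `range` stands in for `xrange`, and the `file(...)` entry is dropped. The helper name `classify` is made up for illustration.

```python
def classify(obj):
    """Return (has_next, iter_returns_self) for obj."""
    has_next = hasattr(obj, "__next__")   # Python 3 spelling of next()
    returns_self = iter(obj) is obj       # iterators return themselves
    return has_next, returns_self


# Assumption: range() plays the role of xrange() from the thread.
iterables = ["", (), [], {}, range(10)]
iterators = [iter(x) for x in iterables]

for x in iterables + iterators:
    print(classify(x), type(x).__name__)
```

Under these assumptions every iterable should report `(False, False)` and every iterator `(True, True)`, with no hybrid row.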
Xrange was given its own tp_iter slot and now runs as fast as range. In
single pass timings, it runs faster. In multiple passes, range is still
quicker because it only has to create the PyNumbers once.
Being immutable, xrange had the advantage that it could serve as its own
iterator and did not require the extra code needed for list iterators and
dict iterators.
Raymond Hettinger
[Raymond Hettinger]
> Xrange was given its own tp_iter slot and now runs as fast as range. In
> single pass timings, it runs faster. In multiple passes, range is still
> quicker because it only has to create the PyNumbers once.
>
> Being immutable, xrange had the advantage that it could serve as its own
> iterator and did not require the extra code needed for list iterators and
> dict iterators.

Did you write the patch that Martin checked in?

It's broken.

>>> a = iter(xrange(10))
>>> for i in a:
...     print i
...     if i == 4: print '*', a.next()
...
0
1
2
3
4
* 0
5
6
7
8
9

Compare to:

>>> a = iter(range(10))
>>> for i in a:
...     print i
...     if i == 4: print '*', a.next()
...
0
1
2
3
4
* 5
6
7
8
9

--Guido van Rossum (home page: http://www.python.org/~guido/)
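
The difference between the two runs above can be reproduced with a small stand-in. This is a sketch only: `HybridRange` is a hypothetical Python 3 class written from the description in this thread (an object that is its own iterator and uses a flag to hand out a fresh object on later iter() calls). It is not the actual rangeobject.c code, and `trace` is an invented helper.

```python
class HybridRange:
    """Hypothetical emulation of the hybrid xrange described in the thread."""

    def __init__(self, stop):
        self.stop = stop
        self.index = 0            # iteration state kept on the object itself
        self.handed_out = False   # flag: has iter() already returned self?

    def __iter__(self):
        if not self.handed_out:
            self.handed_out = True
            return self           # first iter() call: act as own iterator
        return HybridRange(self.stop)   # later calls: fresh object from 0

    def __next__(self):
        if self.index >= self.stop:
            raise StopIteration
        value = self.index
        self.index += 1
        return value


def trace(a):
    """Replay the loop from the message, recording values instead of printing."""
    out = []
    for i in a:
        out.append(i)
        if i == 4:
            out.append(("*", next(a)))
    return out


print(trace(iter(HybridRange(10))))  # [0, 1, 2, 3, 4, ('*', 0), 5, 6, 7, 8, 9]
print(trace(iter(range(10))))        # [0, 1, 2, 3, 4, ('*', 5), 6, 7, 8, 9]
```

In the emulation the for loop silently iterates over a second object, so the explicit `next(a)` call advances a different counter than the loop does.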
RDH> Xrange was given its own tp_iter slot and now runs as fast as range.
RDH> In single pass timings, it runs faster. In multiple passes, range
RDH> is still quicker because it only has to create the PyNumbers once.
RDH>
RDH> Being immutable, xrange had the advantage that it could serve as its
RDH> own iterator and did not require the extra code needed for list
RDH> iterators and dict iterators.

GvR> Did you write the patch that Martin checked in?
GvR>
GvR> It's broken.
GvR>
GvR> >>> a = iter(xrange(10))
GvR> >>> for i in a:
GvR>         print i
GvR>         if i == 4: print '*', a.next()

Okay, here's the distilled analysis:

Given x=xrange(10),

1. Oren notes that id(iter(x)) == id(x) which is atypical of objects
   that have special iterator types or get wrapped by the generic
   iterobject.

2. GvR notes that id(iter(x)) != id(iter(iter(x))) which is
   inconsistent with range().

#1 should not be a requirement. A call to iter should simply return something that has an iterable interface whether it be a new object or the current object. In examples of user defined classes with their own __iter__() method, we show the object returning itself. At the same time, we allow the __iter__ method to possibly be defined with a generator which returns a new object. In short, the object identity of iter(x) has not been promised to be either equal or not equal to x.

If we decide that #1 is required (for consistency with the way other iterables are currently implemented), the most straightforward solution is to add an xrange iteratorobject to rangeobject.c just like we did for listobject.c. I'll be happy to do this if it is what everyone wants.

For #2, the most compelling argument is that xrange should be a drop-in replacement for range in *every* circumstance including the weird use case of iter(iter(xrange(10))). This is easily accomplished and I've attached a simple patch to the bug report that restores this behavior.

However, before accepting the patch, I think we ought to consider whether the current xrange() behavior is more rational than the range() behavior. PEP 234 says:

"""Some folks have requested the ability to restart an iterator. This
should be dealt with by calling iter() on a sequence repeatedly, not by
the iterator protocol itself. """

Maybe, the right way to go is to assure that iter(x) returns a freshly loaded iterator instead of the same iterator in the same state.

Right now (with xrange different from range), we get what I think is weirder behavior from range():

>>> a = iter(range(3))
>>> for i in a:
...     for j in a:
...         print i,j
...
0 1
0 2

>>> a = iter(xrange(3))
>>> for i in a:
...     for j in a:
...         print i,j
...
0 0
0 1
0 2
1 0
1 1
1 2
2 0
2 1
2 2

BTW, I'm happy to do whatever you guys think best:

(a) Adding an xrangeiteratorobject fixes #1 and #2 resulting in an xrange() identical to range() with no cost to performance during the loop (creation performance suffers just a bit).

(b) Adding my other patch (attached to the bug report www.python.org/sf/564601), fixes #2 only (again with no cost to loop performance).

(c) Leaving it the way it is gives xrange a behavior that is identical to range for the common use cases, and arguably superior abilities for the weird cases.

Raymond Hettinger
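
The nested-loop comparison above can be restated for a current interpreter. A minimal sketch, assuming Python 3, where `range` is a restartable iterable and `iter(range(...))` is a one-shot iterator; `nested_pairs` is a hypothetical helper name.

```python
def nested_pairs(source):
    """Run two nested for-loops over the same object and collect (i, j)."""
    pairs = []
    for i in source:
        for j in source:   # calls iter(source) again on every outer pass
            pairs.append((i, j))
    return pairs


# A shared iterator: the inner loop consumes what the outer loop left over.
print(nested_pairs(iter(range(3))))   # [(0, 1), (0, 2)]

# An iterable: each loop gets its own fresh iterator.
print(nested_pairs(range(3)))         # all nine (i, j) pairs
```

Which of the two results counts as the "weird" one is exactly the point under debate in this message.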
> Maybe, the right way to go is to assure that iter(x) returns a freshly
> loaded iterator instead of the same iterator in the same state.

That would be a change to the semantics of all iterators, not worth it just to fix a small oddity with xrange.

I think it's fairly clear that xrange is to be thought of as a lazy list, *not* an iterator. The best way to fix it (if it needs fixing) is to have iter(xrange(...)) always return a new object, I think.

It wouldn't be possible for all iterators to behave the way you suggest, anyway, because some kinds of iterator don't have an underlying sequence that can be restarted (e.g. file iterators).

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
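
The point about iterators with nothing to restart can be illustrated without touching the filesystem. As an assumption made for portability, the sketch below swaps the file iterator for a Python 3 generator, which is likewise an iterator with no underlying sequence; `countdown` is an invented example.

```python
def countdown(n):
    """A generator: an iterator with no underlying sequence to rewind."""
    while n > 0:
        yield n
        n -= 1


it = countdown(3)
first_pass = list(it)          # consumes the generator
second_pass = list(iter(it))   # iter() hands back the same exhausted object

print(iter(it) is it)   # True
print(first_pass)       # [3, 2, 1]
print(second_pass)      # []
```

A rule that iter(x) must always return a freshly loaded iterator has nothing to reload in a case like this.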
On Tue, Jun 04, 2002 at 04:57:58PM -0400, Raymond Hettinger wrote:
> Being immutable, xrange had the advantage that it could serve as its own
> iterator and did not require the extra code needed for list iterators and
> dict iterators.

In its current form, xrange is no longer immutable. It has state information and calling the next() method of an xrange object modifies it.

I guess the difference between us is that you are concerned with what works while I am irrationally obsessed with semantics :-)

Oren
On Tue, Jun 04, 2002 at 04:08:08PM -0400, Oren Tirosh wrote:
> It seems that the xrange object in the current CVS can't make up its mind
> whether it's an iterator or an iterable:

In 2.2, xrange had no "next" method, so it got wrapped by a generic iterator object. It was desirable for performance to have xrange also act as an iterator. See
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/rangeobject.c.diff?r1=2.35&r2=2.36
for the change. See http://www.python.org/sf/551410 for the sf patch this comes from.

However, the following code would give different results if 'iter(x) is x' for xrange objects:

    x = xrange(5)
    for a in x:
        for b in x:
            print a,b

(it'd print "0 1" "0 2" "0 3" "0 4" if they were the same iterator, just as for 'x = iter(range(5))') so, it's necessary to return a *different* xrange object from iter(x) so it can start iterating from the beginning again.

I think there's an optimization that *the first time*, iter(x) is x for an xrange object. Hm, the python cvs I have here is too old to have this optimization ... so I can't really tell you how it works now for sure.

Jeff
> On Tue, Jun 04, 2002 at 04:08:08PM -0400, Oren Tirosh wrote:
> > It seems that the xrange object in the current CVS can't make up its
> > mind whether it's an iterator or an iterable:
>
> In 2.2, xrange had no "next" method, so it got wrapped by a generic
> iterator object. It was desirable for performance to have xrange also
> act as an iterator.

This seems to propagate the confusion. To avoid being wrapped by a generic iterator object, you need to define an __iter__ method, not a next method.

The current xrange code (from SF patch #551410) uses the xrange object as both an iterator and iterable, and has an extra flag to make things work right when the same object is iterated over more than once. Without doing more of a review, I can only say that I'm a bit uncomfortable with that approach. Something like the more recent code that Raymond H added to listobject.c to add a custom iterator makes more sense. But perhaps it is defensible.

--Guido van Rossum (home page: http://www.python.org/~guido/)
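
The remark that it is `__iter__`, not a next method, that decides whether iter() falls back to a generic wrapper can be tried out directly. A sketch under the assumption of Python 3 (where next is spelled `__next__`); the three class names are invented for illustration.

```python
class OnlyGetitem:
    """Old-style sequence: no __iter__, so iter() wraps it generically."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


class OnlyNext:
    """Has a next method but no __iter__: iter() refuses it."""

    def __next__(self):
        return 0


class OwnIter:
    """Defines __iter__, so iter() returns whatever that method returns."""

    def __iter__(self):
        return iter((0, 1, 2))


# Falls back to the generic sequence iterator built on __getitem__.
print(list(iter(OnlyGetitem())))

# A next method alone does not make an object iterable.
try:
    iter(OnlyNext())
except TypeError as exc:
    print("TypeError:", exc)

# __iter__ takes over completely; no generic wrapper is involved.
print(list(iter(OwnIter())))
```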
Guido van Rossum wrote:
> The current xrange code (from SF patch #551410) uses the xrange object as
> both an iterator and iterable, and has an extra flag to make things work
> right when the same object is iterated over more than once. Without doing
> more of a review, I can only say that I'm a bit uncomfortable with that
> approach. Something like the more recent code that Raymond H added to
> listobject.c to add a custom iterator makes more sense. But perhaps it is
> defensible.

The main defense is that the typical use case is

    for i in xrange(len(some_list))

In that case, it is desirable not to create an additional object, and nobody will notice the difference.

Regards,
Martin
martin@v.loewis.de (Martin v. Loewis):
> The main defense is that the typical use case is
>
>     for i in xrange(len(some_list))

How about deprecating xrange, and introducing a new function such as indexes(sequence) that returns a proper iterator? That would clear up all the xrange confusion and make for nicer looking code as well.

Greg Ewing
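
For concreteness, the proposed function might look like the sketch below. `indexes` was only floated in this message and is not a real builtin; the body is a guess at the intent, written for Python 3.

```python
def indexes(sequence):
    """Hypothetical helper from the proposal: an iterator over valid indexes."""
    return iter(range(len(sequence)))


letters = ["a", "b", "c"]
idx = indexes(letters)

print(iter(idx) is idx)            # True: a proper iterator returns itself
print([letters[i] for i in idx])   # ['a', 'b', 'c']
```

Because the result is a plain iterator, it is single-pass: a second loop over the same `idx` object would find it exhausted.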
> How about deprecating xrange,

Deprecating xrange has about as much chance as deprecating the string module.

--Guido van Rossum (home page: http://www.python.org/~guido/)
> Deprecating xrange has about as much chance as deprecating the string
> module.

Well, discouraging it then, or whatever *is* being done with the string module.

Greg Ewing
> The main defense is that the typical use case is
>
>     for i in xrange(len(some_list))
>
> In that case, it is desirable not to create an additional object, and
> nobody will notice the difference.

Is it really so bad if this allocates *two* objects instead of one? I think that's the only way to get my example to work correctly. And it *has* to work correctly.

If two objects are created anyway, I agree with Oren that it's better to have a separate range-iterator object type.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
> Is it really so bad if this allocates *two* objects instead of one?

When accepting the patch, I assumed that the observed speed difference between xrange and range originated from the fact that xrange iteration allocates iterator objects. I'm not so sure anymore that this is the real cause; more likely, it is again the exception handling when exhausting the range.

> I think that's the only way to get my example to work correctly. And it
> *has* to work correctly.
>
> If two objects are created anyway, I agree with Oren that it's better to
> have a separate range-iterator object type.

I agree. I wouldn't mind if somebody would review Raymond's patch to introduce such a thing.

Regards,
Martin
On Tue, Jun 04, 2002 at 04:01:24PM -0500, Jeff Epler wrote:
> On Tue, Jun 04, 2002 at 04:08:08PM -0400, Oren Tirosh wrote:
> > It seems that the xrange object in the current CVS can't make up its
> > mind whether it's an iterator or an iterable:
>
> In 2.2, xrange had no "next" method, so it got wrapped by a generic
> iterator object. It was desirable for performance to have xrange also
> act as an iterator.

I understand the performance issue. But it is possible to improve the performance of iterating over xranges without creating this unholy chimera.
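
>>> type([]), type(iter([]))
(<type 'list'>, <type 'listiterator'>)

... lists have a listiterator

>>> type({}), type(iter({}))
(<type 'dict'>, <type 'dictionary-iterator'>)

... dictionaries have a dictionary-iterator

>>> type(xrange(10)), type(iter(xrange(10)))
(<type 'xrange'>, <type 'xrange'>)
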
... why shouldn't an xrange have an xrangeiterator? It's the only way to make xrange behave consistently with other iterables.
> However, the following code would give different results if 'iter(x) is x'
> for xrange objects:
>
>     x = xrange(5)
>     for a in x:
>         for b in x:
>             print a,b

xrange is currently stuck halfway between an iterable and an iterator. If it was made 100% iterator you would be right, it would break this code. What I'm saying is that it should be 100% iterable.

I know it works just fine the way it is. But I see a lot of confusion on the python list around the semantics of iterators and this behavior might make it just a little bit worse.

Oren
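
What "100% iterable" with a separate iterator type could look like is sketched below. `LazyRange` and `LazyRangeIterator` are hypothetical Python 3 classes written for illustration; they are not the patch discussed in this thread, and they handle only a stop value to stay short.

```python
class LazyRange:
    """Hypothetical pure iterable: immutable, holds no iteration state."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        return LazyRangeIterator(self.stop)   # new iterator on every call


class LazyRangeIterator:
    """Hypothetical companion iterator; all mutable state lives here."""

    def __init__(self, stop):
        self.index = 0
        self.stop = stop

    def __iter__(self):
        return self                           # iterators return themselves

    def __next__(self):
        if self.index >= self.stop:
            raise StopIteration
        value = self.index
        self.index += 1
        return value


x = LazyRange(3)
it = iter(x)
print(hasattr(x, "__next__"), iter(x) is x)      # False False
print(hasattr(it, "__next__"), iter(it) is it)   # True True
print([(a, b) for a in x for b in x])            # all nine pairs
```

The cost is the one extra object per loop that the other side of the debate objects to. For what it is worth, CPython 3's built-in `range` follows the same split: `iter(range(3))` is a separate `range_iterator` object.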
Oren Tirosh wrote:
> ... why shouldn't an xrange have an xrangeiterator?

Because that would create an additional object.

> It's the only way to make xrange behave consistently with other iterables.

Why does it have to be consistent?

> I know it works just fine the way it is. But I see a lot of confusion on
> the python list around the semantics of iterators and this behavior might
> make it just a little bit worse.

Why do you think people will get confused? Most people will use it in the canonical form

    for i in range(maxvalue)

in which case they cannot experience any difference (except for the performance boost).

Regards,
Martin
participants (6)
- Greg Ewing
- Guido van Rossum
- Jeff Epler
- martin@v.loewis.de
- Oren Tirosh
- Raymond Hettinger