data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
I just finished debugging some code that broke after upgrading to Python 2.4 (from 2.3). Turns out the code was testing list iterators for their boolean value (to distinguish them from None). In 2.3, a list iterator (like any iterator) is always true. In 2.4, an exhausted list iterator is false; probably by virtue of having a __len__() method that returns the number of remaining items. I realize that this was a deliberate feature, and that it exists in 2.4 as well as in 2.4.1 and will in 2.4.2; yet, I'm not sure I *like* it. Was this breakage (which is not theoretical!) considered at all? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/0887d/0887d92e8620e0d2e36267115257e0acf53206d2" alt=""
On Tuesday 20 September 2005 17:49, Guido van Rossum wrote:
> I realize that this was a deliberate feature, and that it exists in 2.4 as well as in 2.4.1 and will in 2.4.2; yet, I'm not sure I *like*
I wasn't paying any attention at the time, so I don't know what was discussed. Some discussion here just now leads me to believe that at least two of us here (including myself) think iterators shouldn't have length at all: they're *not* containers and shouldn't act that way. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
data:image/s3,"s3://crabby-images/0e44e/0e44e71002b121953844f91d449442aafa9cfd14" alt=""
[Guido]
It was not considered. AFAICT, 2.3 code that assumed the Boolean value of an iterator to be true was relying on an accidental implementation detail that may not also be true in Jython, PyPy, etc. Likewise, it is not universally true for arbitrary class-based iterators, which may have other methods including __nonzero__ or __len__. The Boolean value of an iterator is certainly not promised by the iterator protocol as specified in the docs or the PEP. The code, bool(it), is not really clear about its intent and seems a little weird to me. The reason it wasn't considered was that it wasn't on the radar screen as even a possible use case. On a tangential note, I think in 2.2 or 2.3, we found a number of bugs related to None testing. IIRC, the outcome of that conversation was a specific recommendation to NOT determine Noneness by Boolean tests. That recommendation ended up making it into PEP 290: http://www.python.org/peps/pep-0290.html#testing-for-none
[Fred]
> think iterators shouldn't have length at all: they're *not* containers and shouldn't act that way.
Some iterators can usefully report their length with the invariant: len(it) == len(list(it)). There are some use cases for having the length when available. Also, there has been plenty of interest in being able to tell, when possible, if an iterator is empty without having to call it. AFAICT, the only downside was Guido's bool(it) situation. FWIW, the origin of the idea came from reading a comp-sci paper about ways to overcome the limitations of linking operations together using only iterators (the paper's terminology talked about map/fold operations). The issue was that decoupling benefits were partially offset by the loss of useful information about the input to an operation (i.e. the supplier may know and the consumer may want to know the input size, the input type, whether the elements are unique, whether the data is sorted, its provenance, etc.) Raymond
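For comparison, present-day Python 3 keeps this kind of optional size information off `__len__`: built-in iterators such as list iterators expose `__length_hint__`, read through `operator.length_hint()` (PEP 424, Python 3.4+), and a length hint does not affect an object's truth value. A minimal sketch of the invariant described above, checked with that API rather than with `len()`; `remaining_hint` is a hypothetical helper name:

```python
import operator


def remaining_hint(it):
    """Return the iterator's length hint, or 0 if it offers none."""
    return operator.length_hint(it, 0)


it = iter([10, 20, 30])
print(remaining_hint(it))  # 3
next(it)
print(remaining_hint(it))  # 2

# The invariant len(it) == len(list(it)), phrased with the hint.
# The hint is read first; list(it) then consumes the iterator.
print(remaining_hint(it) == len(list(it)))  # True

gen = (x for x in range(5))
print(remaining_hint(gen))  # 0: generators give no hint, so the default is used
print(bool(iter([])))       # True: the hint plays no part in truth testing
```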
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On 9/20/05, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
That's too bad.
That's bullshit, and you know it -- you're just using this to justify that you didn't think of this. Whether an object is true or not is well-defined by the language and not by an accident of the implementation. Apart from None, all objects are always true unless they define either __nonzero__() or (in its absence) __len__(). The iterators for builtin sequences were carefully designed to have the minimal API required of iterators -- i.e., next() and __iter__() and nothing more.
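The truth-value rule stated above can be checked directly. This is a minimal sketch in present-day Python 3, where `__nonzero__` is spelled `__bool__`; the class names are hypothetical and exist only for illustration:

```python
class Plain:
    """Defines neither __bool__ nor __len__, so instances are always true."""


class EmptySized:
    """Defines only __len__; a zero length makes the object false."""

    def __len__(self):
        return 0


class BoolFirst:
    """Defines both; __bool__ is consulted and __len__ is not."""

    def __bool__(self):
        return True

    def __len__(self):
        return 0


exhausted = iter([])  # Python 3 list iterators define no __bool__ or __len__

print(bool(None))          # False
print(bool(Plain()))       # True
print(bool(EmptySized()))  # False
print(bool(BoolFirst()))   # True
print(bool(exhausted))     # True, even though nothing is left to iterate
```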
And those are the *only* ones that affect the boolean value.
it was implied by not specifying a __nonzero__ or __len__.
> The code, bool(it), is not really clear about its intent and seems a little weird to me.
Of course that's not what the broken code actually looked like. It was something like

    if ...:
        iter1 = iter(...)
    else:
        iter1 = None
    if ...:
        iter2 = iter(...)
    else:
        iter2 = None
    ...
    if iter1 and iter2:
        ...

where the arguments to the iter() functions were known to be lists.
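The failure mode, and the PEP 290-style fix, can be reproduced in Python 3 with a small stand-in. Because today's list iterators are always true, this sketch uses a hypothetical `LenIter` class that, like the 2.4 list iterator, reports its remaining items through `__len__`; the hypothetical `make_iter` helper and its `flag` argument stand in for the elided conditions:

```python
class LenIter:
    """Hypothetical stand-in for a 2.4-style list iterator with __len__."""

    def __init__(self, items):
        self._it = iter(items)
        self._left = len(items)

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # raises StopIteration when exhausted
        self._left -= 1
        return item

    def __len__(self):
        return self._left  # number of items not yet produced


def make_iter(flag, items):
    # Mirrors "if ...: it = iter(...) else: it = None".
    return LenIter(items) if flag else None


iter1 = make_iter(True, [])      # present, but has nothing left
iter2 = make_iter(True, [1, 2])

# Truth test: conflates "is None" with "is exhausted".
print(bool(iter1 and iter2))                    # False
# Identity test, as PEP 290 recommends: checks only for None.
print(iter1 is not None and iter2 is not None)  # True
```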
Could you at least admit that this was an oversight and not try to pretend it was intentional breakage?
And I agree with that one in general (I was bitten by this in Zope once). But it bears a lot more weight when the type of the object is unknown or partially unknown. In my case, there was no possibility that the iter() argument was anything except a list, so the type of the iterator was fully known.
I still consider this an erroneous hypergeneralization of the concept of iterators. Iterators should be pure iterators and not also act as containers. Which other object type implements __len__ but not __getitem__?
There is plenty of interest in broken features all the time. IMO giving *some* iterators discoverable length (and other properties like reversibility) but not all of them makes the iterator protocol more error-prone -- we're back to the situation where someone codes an algorithm for use with arbitrary iterators but only tests it with list and tuple iterators, and ends up breaking in the field. I know you can work around it, but that requires introspection which is not a great match for this kind of application.
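The hazard can be sketched with two hypothetical helpers: `count_fragile` assumes every iterator supports `len()` and passes when tested only with sized iterators, while `count_robust` uses the introspection workaround mentioned above and falls back to consuming the iterator. `SizedIter` is likewise a hypothetical iterator that defines `__len__`:

```python
class SizedIter:
    """Hypothetical iterator that reports how many items remain via __len__."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        self._pos += 1
        return self._items[self._pos - 1]

    def __len__(self):
        return len(self._items) - self._pos


def count_fragile(it):
    # Works for SizedIter; raises TypeError for generators and most iterators.
    return len(it)


def count_robust(it):
    # Introspection workaround: use __len__ if present, otherwise count
    # by consuming the iterator (which exhausts it).
    if hasattr(it, "__len__"):
        return len(it)
    return sum(1 for _ in it)


def gen():
    yield "a"
    yield "b"


print(count_fragile(SizedIter("xyz")))  # 3
print(count_robust(SizedIter("xyz")))   # 3
print(count_robust(gen()))              # 2
try:
    count_fragile(gen())
except TypeError as exc:
    print("fragile version failed:", exc)
```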
Not every idea written up in a comp-sci paper is worth implementing (as acquisition in Zope 2 has amply proved). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On 9/21/05, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
Thanks; spoken like a man. I strongly feel that this needs to be corrected in 2.5. Iterators should have neither __len__ nor __nonzero__. I see mostly agreement that this is a misfeature. We don't really want to start writing code like this:

    while it:
        x = it.next()
        ...process x...

when we can already write it like this:

    for x in it:
        ...process x...

do we? Keeping a special API to allow a more efficient implementation of __reversed__ is fine. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Raymond Hettinger wrote:
But if the docs don't mention anything about true or false values for some particular type, one tends to assume that all values are true, as is the default for user-defined classes. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand, greg.ewing@canterbury.ac.nz ("A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.")
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
"Guido van Rossum" <guido@python.org> wrote in message news:ca471dc20509201449652f11d@mail.gmail.com...
This seems unnecessarily indirect. Why not 'it != None' or, now, 'it is not None'?
According to 2.4 News, dict iterators also got __len__ method, though I saw no mention of list.
> I realize that this was a deliberate feature,
I presume there were two reasons: internal efficiency of preallocations (list(some_it) for example) and letting people differentiate an iterator with something left to return from one with nothing, just as we can differentiate collections with something versus nothing.
Searching the gmane archive with the link given on the python site for 'iterator len', I could not find any discussion of the __len__ method addition itself. Terry J. Reedy
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On 9/20/05, Terry Reedy <tjreedy@udel.edu> wrote:
> I presume there were two reasons: internal efficiency of preallocations (list(some_it) for example)
This could have been implemented without making the implementation details public.
But this is against the very idea of the iterator protocol -- for general iterators, there may be no way to determine whether there is a next item without actually producing the item, so the proper approach is to code with that in mind. When this type of look-ahead is required, a buffering iterator should be inserted, so that the algorithm can work with all iterators rather than only with iterators over built-in containers. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
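A buffering iterator of the kind described above can be sketched as a small wrapper that pulls at most one item ahead. The `Peekable` name and its `has_next()` method are hypothetical, not a standard-library API. The design choice is that emptiness is answered by an explicit method that may produce (and buffer) the next item, so it works for any iterator, including generators, without giving the wrapper a truth value:

```python
_MISSING = object()  # sentinel meaning "nothing buffered"


class Peekable:
    """Hypothetical one-item look-ahead wrapper around any iterable."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buf = _MISSING

    def __iter__(self):
        return self

    def __next__(self):
        if self._buf is not _MISSING:
            item, self._buf = self._buf, _MISSING
            return item
        return next(self._it)

    def has_next(self):
        """Return True if another item exists, buffering it if needed."""
        if self._buf is _MISSING:
            try:
                self._buf = next(self._it)
            except StopIteration:
                return False
        return True


def numbers():
    yield 1
    yield 2


p = Peekable(numbers())
print(p.has_next())  # True (the 1 is now buffered)
print(list(p))       # [1, 2]
print(p.has_next())  # False
```

Because `has_next()` may take an item from the underlying iterator, code should keep reading through the wrapper afterwards rather than through the original iterator.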
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Guido van Rossum wrote:
This seems like a misfeature to me. In fact I think an iterator having a len() method at all is a misfeature -- the concept doesn't make sense. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand, greg.ewing@canterbury.ac.nz ("A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.")
data:image/s3,"s3://crabby-images/0887d/0887d92e8620e0d2e36267115257e0acf53206d2" alt=""
On Tuesday 20 September 2005 17:49, Guido van Rossum wrote:
I realize that this was a deliberate feature, and that it exists in 2.4 as well as in 2.4.1 and will in 2.4.2; yet, I'm not sure I *like*
I wasn't paying any attention at the time, so I don't know what was discussed. Some discussion here just now leads me to believe that at least two of us here (including myself) think iterators shouldn't have length at all: they're *not* containers and shouldn't act that way. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
data:image/s3,"s3://crabby-images/0e44e/0e44e71002b121953844f91d449442aafa9cfd14" alt=""
[Guido]
It was not considered. AFAICT, 2.3 code assuming the Boolean value of an iterator being true was relying on an accidental implementation detail that may not also be true in Jython, PyPy, etc. Likewise, it is not universally true for arbitrary class based iterators which may have other methods including __nonzero__ or __len__. The Boolean value of an iterator is certainly not promised by the iterator protocol as specified in the docs or the PEP. The code, bool(it), is not really clear about its intent and seems a little weird to me. The reason it wasn't considered was that it wasn't on the radar screen as even a possible use case. On a tangential note, I think in 2.2 or 2.3, we found a number of bugs related to None testing. IIRC, the outcome of that conversation was a specific recommendation to NOT determine Noneness by Boolean tests. That recommendation ended-up making it into PEP 290: http://www.python.org/peps/pep-0290.html#testing-for-none [Fred]
think iterators shouldn't have length at all: they're *not* containers and shouldn't act that way.
Some iterators can usefully report their length with the invariant: len(it) == len(list(it)). There are some use cases for having the length when available. Also, there has been plenty of interest in being able to tell, when possible, if an iterator is empty without having to call it. AFAICT, the only downside was Guido's bool(it) situation. FWIW, the origin of the idea came from reading a comp-sci paper about ways to overcome the limitations of linking operations together using only iterators (the paper's terminology talked about map/fold operations). The issue was that decoupling benefits were partially offset by the loss of useful information about the input to an operation (i.e. the supplier may know and the consumer may want to know the input size, the input type, whether the elements are unique, whether the data is sorted, its provenance, etc.) Raymond
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On 9/20/05, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
That's too bad.
That's bullshit, and you know it -- you're just using this to justify that you didn't think of this. Whether an object is true or not is well-defined by the language and not by an accident of the implementation. Apart from None, all objects are always true unless they define either __nonzero__() or (in its absence) __len__(). The iterators for builtin sequences were carefully designed to have the minimal API required of iterators -- i.e., next() and __iter__() and nothing more.
And those are the *only* ones that affect the boolean value.
it was implied by not specifying a __nonzero__ or __len__.
The code, bool(it), is not really clear about its intent and seems a little weird to me.
Of course that's not what the broken code actually looked like. If was something like if ...: iter1 = iter(...) else: iter1 = None if ...: iter2 = iter(...) else: iter2 = None ... if iter1 and iter2: ... Where the arguments to the iter() functions were known to be lists.
Could you at least admit that this was an oversight and not try to pretend it was intentional breakage?
And I agree with that one in general (I was bitten by this in Zope once). But it bears a lot more weight when the type of the object is unknown or partially unknown. In my case, there was no possibility that the iter() argument was anything except a list, so the type of the iterator was fully known.
I still consider this an erroneous hypergeneralization of the concept of iterators. Iterators should be pure iterators and not also act as containers. Which other object type implements __len__ but not __getitem__?
Theer is plenty of interest in broken features all the time. IMO giving *some* iterators discoverable length (and other properties like reversability) but not all of them makes the iterator protocol more error-prone -- we're back to the situation where someone codes an algorithm for use with arbitrary iterators but only tests it with list and tuple iterators, and ends up breaking in the field. I know you can work around it, but that requires introspection which is not a great match for this kind of application.
Not every idea written up in a comp-sci paper is worth implementing (as acquisition in Zope 2 has amply proved). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On 9/21/05, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
Thanks; spoken like a man. I strongly feel that this needs to be corrected in 2.5. Iterators should have neither __len__ nor __nonzero__. I see mostly agreement that this is a misfeature. We don't really want to start writing code like this: while it: x = it.next() ...process x... when we can already write it like this: for x in it: ...process x... do we? Keeping a special API to allow a more efficient implementation of __reversed__ is fine. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Raymond Hettinger wrote:
But if the docs don't mention anything about true or false values for some particular type, one tends to assume that all values are true, as is the default for user-defined classes. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
"Guido van Rossum" <guido@python.org> wrote in message news:ca471dc20509201449652f11d@mail.gmail.com...
This seem unnecessarily indirect. Why not 'it != None' or now, 'it is not None'?
According to 2.4 News, dict iterators also got __len__ method, though I saw no mention of list.
I realize that this was a deliberate feature,
I presume there were two reasons: internal efficiency of preallocations (list(some_it) for example) and letting people differentiate iterator with something left to return versus nothing, just as we can differentiate collections with something versus nothing.
Searching the gmane archive with the link given on the python site for 'iterator len', I could not find any discussion of the __len__ method addition itself. Terry J. Reedy
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On 9/20/05, Terry Reedy <tjreedy@udel.edu> wrote:
I presume there were two reasons: internal efficiency of preallocations (list(some_it) for example)
This could have been implemented without making the implementation details public.
But this is against the very idea of the iterator protocol -- for general iterators, there may be no way to determine whether there is a next item without actually producing the item, so the proper approach is to code with that in mind. When this type of look-ahead is required, a buffering iterator should be inserted, so that the algorithm can work with all iterators rather than only with iterators over built-in containers. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Guido van Rossum wrote:
This seems like a misfeature to me. In fact I think an iterator having a len() method at all is a misfeature -- the concept doesn't make sense. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+
participants (6)
- Christos Georgiou
- Fred L. Drake, Jr.
- Greg Ewing
- Guido van Rossum
- Raymond Hettinger
- Terry Reedy