
On 9/20/05, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
[Guido]
I just finished debugging some code that broke after upgrading to Python 2.4 (from 2.3). Turns out the code was testing list iterators for their boolean value (to distinguish them from None). In 2.3, a list iterator (like any iterator) is always true. In 2.4, an exhausted list iterator is false; probably by virtue of having a __len__() method that returns the number of remaining items.
I realize that this was a deliberate feature, and that it exists in 2.4 as well as in 2.4.1 and will in 2.4.2; yet, I'm not sure I *like* it. Was this breakage (which is not theoretical!) considered at all?
It was not considered.
That's too bad.
AFAICT, 2.3 code assuming the Boolean value of an iterator being true was relying on an accidental implementation detail that may not also be true in Jython, PyPy, etc.
That's bullshit, and you know it -- you're just using this to justify that you didn't think of this. Whether an object is true or not is well-defined by the language and not by an accident of the implementation. Apart from None, all objects are always true unless they define either __nonzero__() or (in its absence) __len__(). The iterators for builtin sequences were carefully designed to have the minimal API required of iterators -- i.e., next() and __iter__() and nothing more.
Likewise, it is not universally true for arbitrary class based iterators which may have other methods including __nonzero__ or __len__.
And those are the *only* ones that affect the boolean value.
The Boolean value of an iterator is certainly not promised by the iterator protocol as specified in the docs or the PEP.
it was implied by not specifying a __nonzero__ or __len__.
The code, bool(it), is not really clear about its intent and seems a little weird to me.
Of course that's not what the broken code actually looked like. If was something like if ...: iter1 = iter(...) else: iter1 = None if ...: iter2 = iter(...) else: iter2 = None ... if iter1 and iter2: ... Where the arguments to the iter() functions were known to be lists.
The reason it wasn't considered was that it wasn't on the radar screen as even a possible use case.
Could you at least admit that this was an oversight and not try to pretend it was intentional breakage?
On a tangential note, I think in 2.2 or 2.3, we found a number of bugs related to None testing. IIRC, the outcome of that conversation was a specific recommendation to NOT determine Noneness by Boolean tests. That recommendation ended-up making it into PEP 290:
And I agree with that one in general (I was bitten by this in Zope once). But it bears a lot more weight when the type of the object is unknown or partially unknown. In my case, there was no possibility that the iter() argument was anything except a list, so the type of the iterator was fully known.
[Fred]
think iterators shouldn't have length at all: they're *not* containers and shouldn't act that way.
Some iterators can usefully report their length with the invariant: len(it) == len(list(it)).
I still consider this an erroneous hypergeneralization of the concept of iterators. Iterators should be pure iterators and not also act as containers. Which other object type implements __len__ but not __getitem__?
There are some use cases for having the length when available. Also, there has been plenty of interest in being able to tell, when possible, if an iterator is empty without having to call it. AFAICT, the only downside was Guido's bool(it) situation.
Theer is plenty of interest in broken features all the time. IMO giving *some* iterators discoverable length (and other properties like reversability) but not all of them makes the iterator protocol more error-prone -- we're back to the situation where someone codes an algorithm for use with arbitrary iterators but only tests it with list and tuple iterators, and ends up breaking in the field. I know you can work around it, but that requires introspection which is not a great match for this kind of application.
FWIW, the origin of the idea came from reading a comp-sci paper about ways to overcome the limitations of linking operations together using only iterators (the paper's terminology talked about map/fold operations). The issue was that decoupling benefits were partially offset by the loss of useful information about the input to an operation (i.e. the supplier may know and the consumer may want to know the input size, the input type, whether the elements are unique, whether the data is sorted, its provenance, etc.)
Not every idea written up in a comp-sci paper is worth implementing (as acquisition in Zope 2 has amply proved). -- --Guido van Rossum (home page: http://www.python.org/~guido/)