[Python-Dev] bool(iter([])) changed between 2.3 and 2.4

Wed Sep 21 02:07:35 CEST 2005

On 9/20/05, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:
> [Guido]
> > I just finished debugging some code that broke after upgrading to
> > Python 2.4 (from 2.3). Turns out the code was testing list iterators
> > for their boolean value (to distinguish them from None). In 2.3, a
> > list iterator (like any iterator) is always true. In 2.4, an exhausted
> > list iterator is false; probably by virtue of having a __len__()
> > method that returns the number of remaining items.
> >
> > I realize that this was a deliberate feature, and that it exists in
> > 2.4 as well as in 2.4.1 and will in 2.4.2; yet, I'm not sure I *like*
> > it. Was this breakage (which is not theoretical!) considered at all?
> 
> It was not considered.

That's too bad.

> AFAICT, 2.3 code assuming the Boolean value of
> an iterator being true was relying on an accidental implementation
> detail that may not also be true in Jython, PyPy, etc.

That's bullshit, and you know it -- you're just using this to justify
that you didn't think of this.

Whether an object is true or not is well-defined by the language and
not by an accident of the implementation. Apart from None, all objects
are always true unless they define either __nonzero__() or (in its
absence) __len__().

The iterators for builtin sequences were carefully designed to have
the minimal API required of iterators -- i.e., next() and __iter__()
and nothing more.

> Likewise, it is
> not universally true for arbitrary class based iterators which may have
> other methods including __nonzero__ or __len__.

And those are the *only* ones that affect the boolean value.

> The Boolean value of an
> iterator is certainly not promised by the iterator protocol as specified
> in the docs or the PEP.

it was implied by not specifying a __nonzero__ or __len__.

> The code, bool(it), is not really clear about
> its intent and seems a little weird to me.

Of course that's not what the broken code actually looked like. If was
something like

if ...:
    iter1 = iter(...)
else:
    iter1 = None
if ...:
    iter2 = iter(...)
else:
    iter2 = None
...
if iter1 and iter2:
    ...

Where the arguments to the iter() functions were known to be lists.

> The reason it wasn't
> considered was that it wasn't on the radar screen as even a possible use
> case.

Could you at least admit that this was an oversight and not try to
pretend it was intentional breakage?

> On a tangential note, I think in 2.2 or 2.3, we found a number of bugs
> related to None testing.  IIRC, the outcome of that conversation was a
> specific recommendation to NOT determine Noneness by Boolean tests.
> That recommendation ended-up making it into PEP 290:
> 
>     http://www.python.org/peps/pep-0290.html#testing-for-none

And I agree with that one in general (I was bitten by this in Zope once).

But it bears a lot more weight when the type of the object is unknown
or partially unknown. In my case, there was no possibility that the
iter() argument was anything except a list, so the type of the
iterator was fully known.

> [Fred]
> > think iterators shouldn't have length at all:
> > they're *not* containers and shouldn't act that way.
> 
> Some iterators can usefully report their length with the invariant:
>    len(it) == len(list(it)).

I still consider this an erroneous hypergeneralization of the concept
of iterators. Iterators should be pure iterators and not also act as
containers. Which other object type implements __len__ but not
__getitem__?

> There are some use cases for having the length when available.  Also,
> there has been plenty of interest in being able to tell, when possible,
> if an iterator is empty without having to call it.  AFAICT, the only
> downside was Guido's bool(it) situation.

Theer is plenty of interest in broken features all the time. IMO
giving *some* iterators discoverable length (and other properties like
reversability) but not all of them makes the iterator protocol more
error-prone -- we're back to the situation where someone codes an
algorithm for use with arbitrary iterators but only tests it with list
and tuple iterators, and ends up breaking in the field.

I know you can work around it, but that requires introspection which
is not a great match for this kind of application.

> FWIW, the origin of the idea came from reading a comp-sci paper about
> ways to overcome the limitations of linking operations together using
> only iterators (the paper's terminology talked about map/fold
> operations).  The issue was that decoupling benefits were partially
> offset by the loss of useful information about the input to an operation
> (i.e. the supplier may know and the consumer may want to know the input
> size, the input type, whether the elements are unique, whether the data
> is sorted, its provenance, etc.)

Not every idea written up in a comp-sci paper is worth implementing
(as acquisition in Zope 2 has amply proved).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)