[Python-Dev] Single- vs. Multi-pass iterability

Ka-Ping Yee ping@zesty.ca
Tue, 16 Jul 2002 18:18:05 -0700 (PDT)


On Mon, 15 Jul 2002, Andrew Koenig wrote:
> However, the purpose of my suggestion of __multiter__ was not to use it to
> test for multiple iteration, but to enable a container to be able to
> yield either a single or a multiple iterator on request.

I see what you want, though i have a hard time imagining a situation
where it's really necessary to have both (as opposed to just the
multiple iterator, which is strictly more capable).  I can certainly
see how you might want to be able to ask for a breadth-first or
depth-first iterator on a tree, though.

> > Or, what if there is no container to begin with, but the iterator is still
> > copyable?  You can't flag that by putting __multiter__ on anything; again
> > it makes more sense to just provide __copy__ on the iterator.
>
> You could flag it by putting __multiter__ on the iterator, just as iterators
> presently have __iter__.

Ugh.  I don't like this, for the reasons i outlined in another message:
an iterator is not the same as a container.  Iterators always mutate;
containers usually do not (at least not as a result of looking at the
elements).
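The container/iterator distinction can be seen with a plain list. This minimal sketch uses only built-ins. Iterating the list twice yields the same elements both times, while a single iterator over it is consumed by the first pass.

```python
# A container (here a list) can be iterated any number of times;
# each call to iter() returns a fresh iterator over the same elements.
items = [1, 2, 3]
first_pass = list(items)
second_pass = list(items)

# An iterator mutates as it is advanced: a second pass sees nothing.
it = iter(items)
first_from_it = list(it)
second_from_it = list(it)

# Asking an iterator for an iterator just returns the iterator itself.
same_object = iter(it) is it

print(first_pass, second_pass)        # [1, 2, 3] [1, 2, 3]
print(first_from_it, second_from_it)  # [1, 2, 3] []
```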

> > All that's really necessary here is to document the convention about what
> > __copy__ is supposed to mean if it's available on an iterator.  If we
> > all agree that __copy__ should preserve an independent copy of the
> > current state of the iterator, we're all set.
>
> Not quite.  We also need an agreement that calling __iter__ on a container
> is not a destructive operation unless you call next() on the iterator that
> you get back.

What i'd like is an agreement that calling __iter__ on a container is
not a destructive operation at all.  If it's destructive, then what you
started with is not really a container, and we should encourage people
to call attention to this irregularity in their documentation.
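A minimal sketch of both agreements discussed above, written in Python 3 (where the method this thread calls "next" is spelled "__next__"). The classes Countdown and CountdownIterator are hypothetical. The iterator's __copy__ returns an independent copy of its current state, and the container's __iter__ never modifies the container.

```python
import copy


class CountdownIterator:
    """Hypothetical iterator whose __copy__ snapshots its current position."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # spelled next() in the Python 2 of this thread
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def __copy__(self):
        # Independent copy of the current state: advancing one copy
        # does not affect the other.
        return CountdownIterator(self.current)


class Countdown:
    """Hypothetical container whose __iter__ is not destructive."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call builds a new iterator; the container is unchanged.
        return CountdownIterator(self.start)


it = iter(Countdown(5))
first = next(it)         # consumes 5
saved = copy.copy(it)    # copy.copy() dispatches to __copy__
rest = list(it)
replay = list(saved)
print(first, rest, replay)  # 5 [4, 3, 2, 1] [4, 3, 2, 1]
```

Because copy.copy() looks up __copy__ on the class, code that wants to save its place in an iteration can simply try copy.copy() on the iterator, without a new special method on the container.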

> > I think a proliferation of iterator-fetching methods would be a
> > messy and unpleasant prospect.  After __iter__, __multiter__,
> > and __ambiter__, what next?  __mutableiter__?
> > __depthfirstiter__?  __breadthfirstiter__?
>
> A data structure that supports several different kinds of iteration
> has to provide that support somehow.

Agreed.  I was unclear: what makes me uncomfortable is the pollution
of the double-underscore namespace.  When you do have a container-like
object that supports various kinds of iteration, naturally you are
going to need some methods for getting iterators.  I just think it's
not appropriate to establish special names for them.

To me, the presence of double-underscores around a method name means
that the method is called automatically.  My expectation is that when
i write a method with a "normal" name, the name itself will appear
after a dot wherever that method is used; and that when there's a
method with a "__special__" name, the method is called implicitly.
The implicit call can occur via an operator (e.g. __add__), or to
implement a protocol defined in the language (e.g. __init__), etc.
If you see the string ".__" it means that something unusual is going on.
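A small illustration of this naming convention. The Money class is hypothetical. Its special methods are invoked by the language (construction calls __init__, the + operator calls __add__), while its ordinary method is called by name after a dot.

```python
class Money:
    """Hypothetical class used only to show implicit special-method calls."""

    def __init__(self, cents):
        # Invoked implicitly when the class is called: Money(150).
        self.cents = cents

    def __add__(self, other):
        # Invoked implicitly by the + operator.
        return Money(self.cents + other.cents)

    def describe(self):
        # An ordinary method: its name appears after a dot at the call site.
        return f"{self.cents} cents"


total = Money(150) + Money(25)  # calls Money.__add__; no ".__" in sight
print(total.describe())         # 175 cents
```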

If you follow this convention, then "__iter__" deserves a special name,
because it is the specially blessed iterator-getter used by "for".
There may be other iterator-getters, but they must be called explicitly,
so they shouldn't get underscores.
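Under this convention, a tree that supports several traversal orders might look like the following sketch. The Tree class and its depth_first() and breadth_first() methods are hypothetical names chosen for illustration. Only __iter__, the getter that "for" uses implicitly, gets a special name; the alternatives are ordinary methods that callers must request explicitly.

```python
from collections import deque


class Tree:
    """Hypothetical tree node offering several ways to iterate over it."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def __iter__(self):
        # The one specially named iterator-getter, called implicitly by "for".
        return self.depth_first()

    def depth_first(self):
        # Ordinary name: callers must ask for this traversal explicitly.
        stack = [self]
        while stack:
            node = stack.pop()
            yield node.value
            # Push children in reverse so the leftmost child is visited first.
            stack.extend(reversed(node.children))

    def breadth_first(self):
        # Ordinary name as well; visits the tree level by level.
        queue = deque([self])
        while queue:
            node = queue.popleft()
            yield node.value
            queue.extend(node.children)


tree = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
print([v for v in tree])           # [1, 2, 4, 3]
print(list(tree.breadth_first()))  # [1, 2, 3, 4]
```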

                *               *               *

An aside on "next" vs. "__next__":

Note that this convention would also suggest that "next" should be
called "__next__", since "for" calls "next" implicitly.  I forget
why we ended up going with "next" instead of "__next__".  I think
"__next__" would have been better, especially in light of this:

Tim Peters wrote:
> Requiring *some* method with a reserved name is an aid to
> introspection, lest it become impossible to distinguish, say,
> an iterator from an instance of a doubly-linked list node class
> that just happens to supply methods named .prev() and .next()
> for an unrelated purpose.

This is exactly why the iterator protocol should consist of one
method named "__next__" rather than two methods named "__iter__"
(which has nothing to do with the act of iterating!) and "next"
(which is the one we really care about, but can collide with
existing method names).
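The collision Tim describes can be sketched in Python 3, whose iterator protocol spells the method __next__. ListNode and UpTo are hypothetical classes, and looks_like_iterator_by_plain_name() is a hypothetical helper that mimics checking only for a plain "next" method. That check mistakes the linked-list node for an iterator, while collections.abc.Iterator, which looks for the reserved names __iter__ and __next__, tells the two apart.

```python
from collections.abc import Iterator


class ListNode:
    """Hypothetical doubly-linked list node; .next() has an unrelated purpose."""

    def __init__(self, value):
        self.value = value
        self._prev = None
        self._next = None

    def prev(self):
        return self._prev

    def next(self):
        return self._next


class UpTo:
    """Hypothetical iterator that uses the reserved name __next__."""

    def __init__(self, limit):
        self.n = 0
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.limit:
            raise StopIteration
        self.n += 1
        return self.n


def looks_like_iterator_by_plain_name(obj):
    # Hypothetical heuristic: only a method called "next" is checked.
    return callable(getattr(obj, "next", None))


node, counter = ListNode("a"), UpTo(3)
print(looks_like_iterator_by_plain_name(node))  # True: a false positive
print(isinstance(node, Iterator))               # False
print(isinstance(counter, Iterator))            # True
```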

As far as i know, "next" is the only implicitly-called method of
an internal protocol that has no underscores.  It's a little late
to fix the name of "next" in Python 2, though it might be worth
considering for Python 3.


-- ?!ng