[Python-Dev] Single- vs. Multi-pass iterability

Andrew Koenig ark@research.att.com
Wed, 17 Jul 2002 11:27:31 -0400 (EDT)


Ping> On Mon, 15 Jul 2002, Andrew Koenig wrote:

>> However the purpose my suggestion of __multiter__ was not to use it to
>> test for multiple iteration, but to enable a container to be able to
>> yield either a single or a multiple iterator on request.

Ping> I see what you want, though i have a hard time imagining a situation
Ping> where it's really necessary to have both (as opposed to just the
Ping> multiple iterator, which is strictly more capable).  I can certainly
Ping> see how you might want to be able to ask for a breadth-first or
Ping> depth-first iterator on a tree, though.

How about a class that represents a file?  If you ask it for a single
iterator, that's easy.  If you ask it for a multiple iterator, it checks
whether the file is really an interactive device such as a pipe or a keyboard.
If so, it uses a buffering mechanism to simulate multiple iteration;
otherwise, it lets the multiple iterators access the file directly.

Then when you ask to iterate over the file, you automatically get the
least cumbersome mechanism needed to support the kind of iteration
that you requested.

>> > Or, what if there is no container to begin with, but the iterator is still
>> > copyable?  You can't flag that by putting __multiter__ on anything; again
>> > it makes more sense to just provide __copy__ on the iterator.

>> You could flag it by putting __multiter__ on the iterator, just as iterators
>> presently have __iter__.

Ping> Ugh.  I don't like this, for the reasons i outlined in another message:
Ping> an iterator is not the same as a container.  Iterators always mutate;
Ping> containers usually do not (at least not as a result of looking at the
Ping> elements).

The scenario is this:

    def search(thing):
	iter = thing.__multiter__()
	// now either iter is an iterator that supports __copy__
	// or we will raise an exception (and raise it here, rather
	// than waiting for the first time we try to copy iter). 

>> Not quite.  We also need an agreement that calling __iter__ on a container
>> is not a destructive operation unless you call next() on the iterator that
>> you get back.

Ping> What i'd like is an agreement that calling __iter__ on a container is
Ping> not a destructive operation at all.  If it's destructive, then what you
Ping> started with is not really a container, and we should encourage people
Ping> to call attention to this irregularity in their documentation.

Is a file a container or not?  Isn't making an iterator from a file and
calling next() on it a destructive operation?

>> > I think a proliferation of iterator-fetching methods would be a
>> > messy and unpleasant prospect.  After __iter__, __multiter__,
>> > and __ambiter__, what next?  __mutableiter__?
>> > __depthfirstiter__?  __breadthfirstiter__?

>> A data structure that supports several different kinds of iteration
>> has to provide that support somehow.

Ping> Agreed.  I was unclear: what makes me uncomfortable is the pollution
Ping> of the double-underscore namespace.  When you do have a container-like
Ping> object that supports various kinds of iteration, naturally you are
Ping> going to need some methods for getting iterators.  I just think it's
Ping> not appropriate to establish special names for them.

Fair enough.  But then why is __iter__ special?

Ping> To me, the presence of double-underscores around a method name means
Ping> that the method is called automatically.  My expectation is that when
Ping> i write a method with a "normal" name, the name itself will appear
Ping> after a dot wherever that method is used; and that when there's a
Ping> method with a "__special__" name, the method is called implicitly.
Ping> The implicit call can occur via an operator (e.g. __add__), or to
Ping> implement a protocol defined in the language (e.g. __init__), etc.
Ping> If you see the string ".__" it means that something unusual is going on.

Ping> If you follow this convention, then "__iter__" deserves a special name,
Ping> because it is the specially blessed iterator-getter used by "for".
Ping> There may be other iterator-getters, but they must be called explicitly,
Ping> so they shouldn't get underscores.

Ah, is it only "for" that makes __iter__ special, and not iter() ?

Ping> An aside on "next" vs. "__next__":

Ping> This is exactly why the iterator protocol should consist of one
Ping> method named "__next__" rather than two methods named "__iter__"
Ping> (which has nothing to do with the act of iterating!) and "next"
Ping> (which is the one we really care about, but can collide with
Ping> existing method names).

Ping> As far as i know, "next" is the only implicitly-called method of
Ping> an internal protocol that has no underscores.  It's a little late
Ping> to fix the name of "next" in Python 2, though it might be worth
Ping> considering for Python 3.

One way to clarify a discussion of a protocol is to append an "s" and
think of a plurality of protocols, so as to see which properites are
truly intrinsic and which can vary between protocols.  That's part of
what I'm trying to do in this discussion.

(and I don't presently have a strong opinion about what the right answer
is.  I don't even know for sure what the question is.)