[Python-Dev] Single- vs. Multi-pass iterability

Guido van Rossum guido@python.org
Fri, 19 Jul 2002 16:29:29 -0400


> It's just not the way i expect for-loops to work.  Perhaps we would
> need to survey people for objective data, but i feel that most people
> would be surprised if
> 
>     for x in y: print x
>     for x in y: print x
> 
> did not print the same thing twice, or if
> 
>     if x in y: print 'got it'
>     if x in y: print 'got it'
> 
> did not do the same thing twice.  I realize this is my own opinion,
> but it's a fairly strong impression i have.

I think it's a naive persuasion that doesn't hold under scrutiny.

For a long time people have faked iterators by providing
pseudo-sequences that did unspeakable things.

In general, I'm pretty sure that if I asked an uninitiated user what
"for line in file" would do, if it did anything, they would understand
that if you tried that a second time you'd hit EOF right away.
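
A minimal sketch of that point, written in present-day Python 3 syntax
rather than the Python 2.2 of this thread.  io.StringIO stands in for a
real file here, and the file contents are made up for illustration:

```python
import io

# io.StringIO is a stand-in for a file object returned by open().
f = io.StringIO("first\nsecond\n")

# The first loop reads every line and leaves the position at EOF.
first_pass = [line for line in f]

# The second loop starts at EOF, so it finds nothing left to read.
second_pass = [line for line in f]

print(first_pass)
print(second_pass)
```

Calling f.seek(0) between the two loops would rewind the file and make
the second pass repeat the first.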

> Even if it's okay for for-loops to destroy their arguments, i still
> think it sets up a bad situation: we may end up with functions
> manipulating sequence-like things all over, but it becomes unclear
> whether they destroy their arguments or not.  It becomes possible
> to write a function which sometimes destroys its argument and sometimes
> doesn't.  Bugs get deeper and harder to find.

This sounds awfully similar to the old argument "functions (as opposed
to procedures) should never have side effects".  ABC implemented that
literally (the environment was saved and restored around function
calls, with an exception for the seed for the built-in random
generator), with the hope that it would provide fewer surprises.  It
did the opposite: it drove people crazy because the language was
trying to be smarter than them.

> I believe this is where the biggest debate lies: whether "for" should be
> non-destructive.  I realize we are currently on the other side of the
> fence, but i foresee enough potential pain that i would like you to
> consider the value of keeping "for" loops non-destructive.

I don't see any real debate.  I only see you tilting at windmills.
Sorry.  For-loops have had the possibility to destroy their arguments
since the day __getitem__ was introduced.
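
As a sketch of how a __getitem__-based pseudo-sequence can be consumed
by a for-loop (Python 3 syntax; the class name LineQueue and its
behavior are hypothetical, invented only to illustrate the pattern):

```python
class LineQueue:
    """Hypothetical pseudo-sequence: ignores the index and hands out
    its next stored item, removing it in the process."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # The for-loop calls this with 0, 1, 2, ... until IndexError.
        if not self._items:
            raise IndexError(index)
        return self._items.pop(0)


q = LineQueue(["a", "b", "c"])

first_pass = [x for x in q]   # drains the queue
second_pass = [x for x in q]  # nothing left: IndexError on the first call

print(first_pass)
print(second_pass)
```

No iterator protocol is involved here; the loop falls back to calling
__getitem__ with successive integers, and the object destroys its own
contents as it goes.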

> > Maybe the for-loop is a red herring?  Calling next() on an
> > iterator may or may not be destructive on the underlying "sequence" --
> > if it is a generator, for example, I would call it destructive.
> 
> Well, for a generator, there is no underlying sequence.
> 
>     while 1: print next(gen)
> 
> makes it clear that there is no sequence, but
> 
>     for x in gen: print x
> 
> seems to give me the impression that there is.

This seems to be a misrepresentation.  The idiom for using any
iterator (not just generators) *without* using a for-loop would have
to be something like:

    while 1:
        try:
            item = it.next() # or it.__next__() or next(it)
        except StopIteration:
            break
        ...do something with item...

(Similar to the traditional idiom for looping over the lines of a
file.)  The for-loop over an iterator was invented so you could write
this as:

    for item in it:
        ...do something with item...

I'm not giving that up so easily!
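
The two idioms can be compared side by side in a small runnable sketch.
This uses Python 3 syntax, where the built-in next(it) is the spelling
that survived; the generator countdown and both helper functions are
illustrative names, not part of any library:

```python
def countdown(n):
    """Illustrative generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


def collect_with_while(it):
    """Drive the iterator by hand, as in the while-loop idiom."""
    items = []
    while True:
        try:
            item = next(it)
        except StopIteration:
            break
        items.append(item)
    return items


def collect_with_for(it):
    """Let the for-loop do the next()/StopIteration handling."""
    items = []
    for item in it:
        items.append(item)
    return items


by_while = collect_with_while(countdown(3))
by_for = collect_with_for(countdown(3))
print(by_while, by_for)
```

Both helpers see the same items; the for-loop version just hides the
try/except bookkeeping.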

> > Perhaps you're trying to assign properties to the iterator abstraction
> > that aren't really there?
> 
> I'm assigning properties to "for" that you aren't.  I think they
> are useful properties, though, and worth considering.

I'm trying to be open-minded, but I just don't see it.  The for loop
is more flexible than you seem to want it to be.  Alas, it's been like
this for years, and I don't think the for-loop needs a face lift.

> I don't think i'm assigning properties to the iterator abstraction;
> i expect iterators to destroy themselves.  But the introduction of
> iterators, in the way they are now, breaks this property of "for"
> loops that i think used to hold almost all the time in Python, and
> that i think holds all the time in almost all other languages.

Again, the widespread faking of iterators using destructive
__getitem__ methods that were designed to be only used in a for-loop
defeats your assertion.

> > Next, I'm not sure how renaming next() to __next__() would affect the
> > situation w.r.t. the destructivity of for-loops.  Or were you talking
> > about some other migration?
> 
> The connection is indirect.  The renaming is related to: (a) making
> __next__() a real, honest-to-goodness protocol independent of __iter__;

next() is a real, honest-to-goodness protocol now, and it is
independent of __iter__() now.
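
A sketch of what that independence looks like in practice, assuming
Python 3, where the method is spelled __next__ and is called through
the built-in next().  The class Ticker is hypothetical:

```python
class Ticker:
    """Hypothetical object that implements only __next__, not __iter__."""

    def __init__(self, limit):
        self.count = 0
        self.limit = limit

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration
        self.count += 1
        return self.count


# next() can drive the object directly; __iter__ is not needed for this.
t = Ticker(2)
values = [next(t), next(t)]

# A for-loop, however, calls iter() on its argument first, so an object
# without __iter__ (or __getitem__) is rejected with TypeError.
try:
    for _ in Ticker(2):
        pass
    for_loop_worked = True
except TypeError:
    for_loop_worked = False

print(values, for_loop_worked)
```

This is why iterators also define __iter__ (returning themselves): it is
what lets them appear directly in a for-loop.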

> and (b) getting rid of __iter__ on iterators.  It's the presence of
> __iter__ on iterators that breaks the non-destructive-for property.

So you prefer the while-loop version above over the for-loop version?
Gotta be kidding.

> I think the renaming of next() to __next__() is a good idea in any
> case.  It is distant enough from the other issues that it can be done
> independently of any decisions about __iter__.

Yeah, it's just a pain that it's been deployed in Python 2.2 since
last December, and by the time 2.3 is out it will probably have been
at least a full year.  Worse, 2.2 has been voted Python-in-a-Tie,
giving that particular idiom a very long lifetime.  I simply don't
think we can break compatibility that easily.  Remember the endless
threads we've had about the pace of change and stability.  We have to
live with warts, alas.  And this is a pretty minor one if you ask me.

(I realize that you're proposing another way out in a separate
message.  I'll reply to that next.  Since you changed the subject, I
can't very well reply to it here.)

--Guido van Rossum (home page: http://www.python.org/~guido/)