[Python-Dev] Termination of two-arg iter()

Sun, 14 Jul 2002 18:19:36 -0400

> I'll note one pragmatic concern.  This idiom is becoming mildly popular:
> 
> for x in someiterator:
>     if is_boundary_marker(x):
>         break
>     else:
>         do_something_with(x)
> 
> followed by (in time, not necessarily in a physically distinct loop):
> 
> for x in someiterator:
>     # and we expect this to pick up where the last loop left off
> 
> If StopIteration isn't a sink state, this falls under the "code
> manipulating arbitrary iterators must therefore not rely on any
> particular behavior in this case" warning in the reworded docs.
> That is, if the first loop terminated via iterator exhaustion, the
> obvious intent is that the second loop never enter its body.  This
> is reliably true if and only if StopIteration is guaranteed to be a
> sink state.  The more I ponder that, the more I'm inclined to
> believe that the PEP made the right decision the first time:
> guaranteeing *something* makes it possible to write a larger class
> of generic code.

But if you fall through the end of the first loop, i.e. you exhaust
the iterator prematurely, your logic should handle that case
differently (for example, by not starting the second loop at all).
An else clause on the for loop is a natural place to do that, since it
runs only when the loop ends without hitting break.
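A minimal sketch of that suggestion, assuming modern Python 3. The helper name `split_at_marker` and the choice of `None` as the boundary marker are hypothetical, for illustration only. Because the for loop's else clause runs only when the loop was not exited via break, the exhausted case is reported to the caller, and next() is never called again on an iterator that has already raised StopIteration.

```python
def is_boundary_marker(x):
    # Hypothetical predicate: treat None as the boundary marker.
    return x is None


def split_at_marker(it):
    """Consume `it` up to the first boundary marker (hypothetical helper).

    Returns (head, rest): rest holds the items after the marker, or is
    None if the iterator was exhausted before any marker was seen.
    """
    head = []
    for x in it:
        if is_boundary_marker(x):
            break
        head.append(x)
    else:
        # No break: the iterator is exhausted, so don't touch it again.
        return head, None
    # Marker found: this second pass picks up where the first left off.
    rest = list(it)
    return head, rest


print(split_at_marker(iter([1, 2, None, 3, 4])))  # ([1, 2], [3, 4])
print(split_at_marker(iter([1, 2])))              # ([1, 2], None)
```

With this structure the second loop's correctness no longer depends on whether the iterator treats StopIteration as a sink state; the cost is that the caller has to check for the None case.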

I haven't used this idiom often enough to know whether that places an
undue burden on the programmer.  I think the reported cases fall
mostly in the category "I didn't know it could do that and it took me
a long time to track it down."

I also note that even if the PEP specifies that StopIteration is a
sink state and we fix all built-in iterators to make it so, it's easy
for an iterator implementation to do the wrong thing (especially since
an extra state bit is often necessary to implement the sink-state
property).
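To illustrate the extra state bit, here is a sketch in modern Python 3 (where the method is `__next__` rather than the 2.2-era `next`). The classes `ListReader` and `SinkListReader` are hypothetical: both walk a list by index, but only the second remembers that it has already raised StopIteration. If the underlying list grows after exhaustion, the first one quietly resumes.

```python
class ListReader:
    """Hypothetical index-based iterator over a list; NOT a sink state."""

    def __init__(self, items):
        self.items = items
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= len(self.items):
            raise StopIteration
        x = self.items[self.i]
        self.i += 1
        return x


class SinkListReader(ListReader):
    """Same iterator plus one extra state bit: StopIteration is permanent."""

    def __init__(self, items):
        super().__init__(items)
        self.exhausted = False

    def __next__(self):
        if self.exhausted:
            raise StopIteration
        try:
            return super().__next__()
        except StopIteration:
            self.exhausted = True  # remember that the end was reached
            raise


data = [1, 2]
plain, sink = ListReader(data), SinkListReader(data)
print(list(plain), list(sink))  # [1, 2] [1, 2]
data.append(3)                  # more data shows up after exhaustion
print(list(plain), list(sink))  # [3] []
```

The second print shows the hazard from the quoted idiom: a later `for x in plain:` would enter its body even though an earlier loop had already run the iterator to exhaustion.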

The question is, should we place the burden on iterator users to avoid
calling next() after the first StopIteration, or should we place the
burden on iterator implementations?  Since by far the most common
iterator use case is still a single for loop, which already does the
right thing, it's not at all clear to me which is worse.

--Guido van Rossum (home page: http://www.python.org/~guido/)