[Python-Dev] Iteration - my summary

Oren Tirosh oren-py-d@hishome.net
Fri, 26 Jul 2002 11:15:07 +0300

There has been some lively discussion about the iteration protocols
lately. My impression of the opinions on the list so far is this:

It could have been semantically cleaner. There is a blurred boundary
between the iterable-container and iterator protocols. Perhaps next should
have been called __next__. Perhaps iterators should not have been required
to implement an __iter__ method returning self. With the benefit of
hindsight the protocols could have been designed better.

But there is nothing fundamentally broken about iteration. Nothing that
justifies any serious change that would break backward compatibility and 
require a transition plan.

A remaining sore spot is re-iterability. Iterators being their own
iterators is ok by itself. StopIteration being a sink state is ok by
itself. Combined, however, they result in hard-to-trace silent errors,
because an exhausted iterator is indistinguishable from an empty
container. This happens in real code, not just in contrived examples. It
is clear to me that this issue needs to be addressed in some way, but
without a complete redesign of the iteration protocols. My proposal of
raising an exception on calling .next() after StopIteration has been
rejected by Guido. Here's another approach:

Proposal: new built-in function reiter()

def reiter(obj):
    """reiter(obj) -> iterator

    Get an iterator from an object. If the object is already an iterator a
    TypeError exception will be raised. For all Python built-in types it is
    guaranteed that if this function succeeds the next call to reiter() will
    return a new iterator that produces the same items unless the object is
    modified. Non-builtin iterable objects which are not iterators SHOULD
    support multiple iteration returning the same items."""

    it = iter(obj)
    if it is obj:
        raise TypeError('Object is not re-iterable')
    return it
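
For built-in types the proposed behavior can be checked directly. A small
sketch (reiter() is reproduced so the snippet stands alone):

```python
def reiter(obj):
    # reiter() as proposed above
    it = iter(obj)
    if it is obj:
        raise TypeError('Object is not re-iterable')
    return it

nums = [1, 2, 3]
print(list(reiter(nums)))      # a list is re-iterable: [1, 2, 3]

try:
    # a list *iterator* is its own iterator, so reiter() rejects it
    reiter(iter(nums))
except TypeError as e:
    print('TypeError:', e)
```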


def cartprod(a, b):
    """Generate the Cartesian product of two sources."""
    for x in a:
        for y in reiter(b):
            yield x, y

This function raises an exception if object b is a generator or some
other non-re-iterable object. List comprehensions should use the C API
equivalent of reiter() for all sources other than the first.
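
To make the difference concrete, here is a sketch contrasting cartprod()
with a naive version that omits reiter(); both definitions are reproduced
so the snippet stands alone:

```python
def reiter(obj):
    # reiter() as proposed above
    it = iter(obj)
    if it is obj:
        raise TypeError('Object is not re-iterable')
    return it

def cartprod(a, b):
    """Cartesian product using reiter() for the inner source."""
    for x in a:
        for y in reiter(b):
            yield x, y

def naive_cartprod(a, b):
    # Without reiter(): b is silently exhausted after the first pass.
    for x in a:
        for y in b:
            yield x, y

gen = (n * 10 for n in range(2))        # a generator: not re-iterable
# all the 'b' pairs are silently missing: [('a', 0), ('a', 10)]
print(list(naive_cartprod('ab', gen)))

gen = (n * 10 for n in range(2))
try:
    list(cartprod('ab', gen))           # reiter() makes the mistake loud
except TypeError as e:
    print('TypeError:', e)
```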

This solution is less than perfect. It requires explicit attention from the
programmer and is less comprehensive than the other solutions proposed, but
I think it's better than nothing.

A related issue is iteration of files. It is an exception to the guarantee
made in the docstring above. My impression is that people generally agree
that file objects are more iterator-like than container-like because they
are stateful cursors. However, making files into iterators is not as simple
as adding a next() method that calls readline() and raises StopIteration on
EOF. That implementation would lose the performance benefit of the readahead
buffering done in the xreadlines object.
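
For reference, the "simple" implementation described above might look like
the following sketch (names are illustrative; the method is spelled
__next__ here so the snippet runs on a modern interpreter, where the 2002
protocol used plain next()):

```python
import io

class LineIterator:
    """Wrap a file-like object in an iterator whose next() just calls
    readline() -- correct, but with none of xreadlines' readahead
    buffering."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __iter__(self):
        # iterators are their own iterators
        return self

    def __next__(self):
        line = self.fileobj.readline()
        if not line:
            raise StopIteration     # readline() returns '' at EOF
        return line

f = io.StringIO('one\ntwo\n')
print(list(LineIterator(f)))        # ['one\n', 'two\n']
```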

The way I see file object iteration is that the file object and xreadlines 
object abuse the iterable-container<->iterator relationship to produce a 
cursor-without-readahead-buffer<->cursor-with-readahead-buffer relationship.
I don't like objects pretending to be something they're not.

I can finish my xreadlines caching patch that makes a file into an iterator 
with an embedded xreadlines object. Perhaps it's not the most elegant 
solution but I don't see any real problems with it.  

I am also thinking about implementing line buffering inside the file object 
that can finally get rid of the whole fgets/getc_unlocked multiplatform mess
and make xreadlines unnecessary. The problem here is that readahead is not 
exactly a transparent operation. More on this later.