Early halt for iterating a_list and iter(a_list)

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Aug 23 03:36:51 CEST 2008


On Fri, 22 Aug 2008 07:23:18 -0700, Lie wrote:

[...]
>> iterators are once-only objects.  there's nothing left in "c" when you
>> enter the inner loop the second time, so nothing is printed.
>>
>>
> Ah, now I see. You have to "restart" the iterator if you want to use it
> the second time (is it possible to do that?).

In general, no, iterators can't be restarted. Think of it as squeezing 
toothpaste out of a tube. You can't generally reverse the process.
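
To make "once-only" concrete, here is a minimal sketch. The names (a_list, first_pass and so on) are made up for illustration and don't come from your code:

```python
a_list = [1, 2, 3]
it = iter(a_list)

first_pass = list(it)    # consumes everything the iterator has
second_pass = list(it)   # the same iterator again: nothing is left

# The list itself is not used up. Each new loop over it asks for a
# fresh iterator behind the scenes.
again = list(a_list)
```

The second pass over the iterator comes back empty, while the list can be walked as often as you like.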


[...]
> I see, but if/when a function expects a sequence but then fed with an
> iterator, it would be against duck-typing to check whether something is
> a sequence or an iterator, but iterator is good for one iteration only
> while sequence is good for multiple usage. So is there a clean way to
> handle this? (i.e. a design pattern that allows sequence and iterator to
> be treated with the same code)

The only clean way to treat iterators and sequences identically is to 
limit yourself to behaviour that both support. That pretty much means a 
simple for loop:

for item in iterator_or_sequence:
    do_something(item)

Fortunately that's an incredibly useful pattern.

I often find myself converting sequences to iterators, so I can handle 
both types identically:

def treat_first_item_specially(iterator_or_sequence):
    it = iter(iterator_or_sequence)
    try:
        first_item(it.next())
    except StopIteration:
        pass
    else:
        for item in it:
            other_items(item)




> If there is no such design pattern for that problem, should one be
> available? I'm thinking of one: "all iterables would have
> iterable.restart() method, which is defined as 'restarting the iterator
> for iterator' or 'do nothing for sequences'."

But not all iterators can be restarted. Here's a contrived example:

import os

def once_only_iterator(directory):
    """Yield the name of each file in directory, deleting it afterwards."""
    for filename in os.listdir(directory):
        yield filename
        os.remove(os.path.join(directory, filename))


You can't restart that one, at least not without a *lot* of effort.

In general, the only ways to restart an arbitrary iterator are:

(1) make a copy of everything the iterator returns, then iterate over the 
copy; or

(2) exploit idiosyncratic knowledge about the specific iterator in 
question. 

That in turn may mean: find the non-iterator data that your iterator 
uses, and use it again.

e.g.

data = {'a': 1, 'b': 2, 'c': 4, 'd': 8}
def make_iterator(data):
    items = sorted(data.items())
    for item in items:
        yield item

it = make_iterator(data)
for i in it:
    print i

# Restart the iterator.
it = make_iterator(data)

That's not exactly what you were hoping for, but in the generic case of 
arbitrary iterators, that's the best you're going to get.

Another example of exploiting specific knowledge about the iterator is 
that, starting from Python 2.5, generators become co-routines that can 
accept information as well as yield it. I suggest you read this:

http://docs.python.org/whatsnew/pep-342.html

but note carefully that you can't just call send() on any arbitrary 
iterator and expect it to do something sensible.
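
As a rough sketch of what that looks like: running_total below is a hypothetical generator written for this example, and next(acc) is the builtin spelling of acc.next() from newer Python versions.

```python
def running_total():
    """Generator-based coroutine: receives numbers, yields the running sum."""
    total = 0
    while True:
        value = yield total   # hand back the total, wait for the next value
        total += value

acc = running_total()
next(acc)              # prime the coroutine: run it up to the first yield
r1 = acc.send(10)      # total is now 10
r2 = acc.send(5)       # total is now 15

# An ordinary list iterator has no send() method at all.
plain = iter([1, 2, 3])
has_send = hasattr(plain, "send")
```

send() only does something useful here because the generator was written to expect it. Nothing about this carries over to iterators in general.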

Lastly, you can write your own iterator, and give it its own restart() 
method. I recommend the exercise. Once you see how much specific 
knowledge of the iterator is required, you may understand why there can't 
possibly be a generic restart() method that works on arbitrary iterators.
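
One possible shape for that exercise, as a sketch only. RestartableCountdown is a made-up class, and restart() is not part of any standard iterator protocol:

```python
class RestartableCountdown(object):
    """Hypothetical iterator counting down from start to 1."""

    def __init__(self, start):
        self.start = start      # remembered so restart() can rewind
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__             # older Pythons look for next() instead

    def restart(self):
        """Rewind to the beginning. Possible only because the whole
        state of this iterator is a single integer."""
        self.current = self.start
```

Used like this:

    it = RestartableCountdown(3)
    list(it)        # [3, 2, 1]
    list(it)        # [] because it is exhausted
    it.restart()
    list(it)        # [3, 2, 1] again

The restart works because the class knows exactly what its own state is. An iterator that reads a socket or deletes files has no such thing to rewind.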


[...]
> Wait a minute... I've got an idea, we could use itertools.tee to copy
> the iterator and iterating on the copy, like this right?:
> 
> for a_ in a:
>     b, b_copy = itertools.tee(b)
>     for b_ in b_copy:
>         c, c_copy = itertools.tee(c)
>         for c_ in c_copy:
>             print a_, b_, c_
> 
> That works with both "requirement": able to handle sequence and iterator
> with the same code and the code for common cases where iterators are
> used once only wouldn't need to be changed. Personally though, I don't
> think it's a clean solution, looks a bit of hackery.

itertools.tee() works by keeping a copy of the iterator's return values 
until every one of the tee'd iterators has consumed them. If your iterator 
is so huge you can't make a copy of its data, then tee() will run out of 
memory too.
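
A small sketch of that behaviour, using a hypothetical squares() generator as the once-only source:

```python
import itertools

def squares(n):
    """Hypothetical once-only source: yields 0, 1, 4, ... up to (n-1)**2."""
    for i in range(n):
        yield i * i

source = squares(5)
first, second = itertools.tee(source)

# Running one copy to the end forces tee() to hold on to every value,
# because the other copy has not seen any of them yet.
a = list(first)
b = list(second)
```

Both copies produce the same values, but only because tee() stored all of them in between. When one copy runs to completion before the other starts, as in your nested loops, that is no cheaper than calling list() on the iterator once and looping over the list.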



-- 
Steven


