grokking generators and iterators

Alex Martelli aleax at aleax.it
Sat May 11 05:21:09 EDT 2002


Andrew Dalke wrote:

> Bob Horvath:
>>How would you suggest writing a generator that spits out 5 line chunks
>>of a file at a time?
>>
>>How would you do it without generators?
> 
> David Eppstein answered the first.
> 
> As to the second, here's an (untested) example
        ...
> You can see, it's rather more complicated than using yield.

It's a _bit_ more complicated because you have to construct an instance
and stash away the state, but it need not be anywhere near as complicated
as in your example, I think.


> class FiveAtATime:
>   def __init__(self, file):
>     self.file = file
>   def next(self):
>     # To iterate manually, and allow iter
>     s = self.file.readline()
>     if not s:
>       return None
>     data = [s]
>     for i in range(4):
>       data.append(self.file.readline())
>     return data

Method next should raise StopIteration when done,
not just start returning an unending stream of None.

That's easiest to obtain by using .next() calls on
an iterator for the file (which also lets the class
wrap any iterator or iterable, not just a file) and
propagating the StopIteration to the caller.  If one
must return the 'trailing incomplete quintuplet' (the
easier solutions snip it off) this can still be
arranged, of course.
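Just for contrast, a minimal 'snipping' version might be sketched
as follows (untested, purely illustrative):

def bunch_snipping(iterable, N=5):
    baseiter = iter(iterable)
    while 1:
        # a StopIteration from baseiter.next() simply propagates,
        # ending the generator -- any trailing incomplete bunch
        # is silently dropped
        yield [baseiter.next() for i in range(N)]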

One interesting way of arranging it might be through
another general-purpose adapter (untested):

def iterpad(iterorseq, N=5, filler=None):
    # wrap any iterator or sequence so that it yields a number of
    # items that is an exact multiple of N, padding with 'filler'
    baseiter = iter(iterorseq)
    while 1:
        yield baseiter.next()
        for i in range(N-1):
            try: next = baseiter.next()
            except StopIteration: next = filler
            yield next

This wraps an iterator so as to guarantee the wrapped
version will yield a number of times that's a multiple
of N -- padding, if need be, with copies of 'filler'.
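Just to make the padding concrete, I'd expect results on the
order of (untested):

>>> list(iterpad([1, 2, 3], N=5, filler=0))
[1, 2, 3, 0, 0]
>>> list(iterpad('ab', N=5))
['a', 'b', None, None, None]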

So, the buncher-by-5 would simply use iterpad(file) rather
than iter(file) to get the iterator it's bunching, with no need
to worry about what happens when bunching sequences whose number
of items is not a multiple of 5.

The solution might then be on the lines of (untested):

class N_at_a_time:
    def __init__(self, file, N=5, pad=True):
        if pad: self.iter = iterpad(file, N)
        else: self.iter = iter(file)
        self.R = range(N)
    def next(self):
        # StopIteration from self.iter.next() propagates to the caller
        return [self.iter.next() for i in self.R]
    def __iter__(self): return self

as the equivalent of the generator:

def N_at_a_time(file, N=5, pad=True):
    if pad: biter = iterpad(file, N)
    else: biter = iter(file)
    R = range(N)
    while 1:
        yield [biter.next() for i in R]

So, yes, the generator IS simpler, but the difference
isn't really all that huge.
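Either way, client code looks the same, e.g. (untested, with a
made-up filename):

for bunch in N_at_a_time(open('somefile.txt')):
    for line in bunch:
        print line,
    print

(with pad=True, of course, the last bunch may include some None
fillers).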



>   def __getitem__(self, i):
>     # To allow use in for loops

No need -- "being an iterator" in itself allows use in for
loops in Python 2.2.  __getitem__ was only needed for the
purpose in 2.1 and earlier, where iterators didn't exist.
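For the record, in 2.1 the for statement just kept calling
__getitem__ with 0, 1, 2, ... until IndexError was raised --
roughly (untested, illustrative only):

class Squares:
    # old-style 'for' protocol: called as self[0], self[1], ...
    # until IndexError terminates the loop
    def __getitem__(self, i):
        if i >= 5: raise IndexError
        return i*i

for x in Squares(): print x,    # prints 0 1 4 9 16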

>   def __iter__(self):
>     # For new-style iters
>     return iter(self, None)

The two-argument form of iter requires a callable as its
first argument, and proceeds much like:

def iter2a(callable, sentinel):
    while 1:
        x = callable()
        if x==sentinel: break
        yield x

(note that "falling off the end", or a 'return', raises
StopIteration).        
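A typical appropriate use of that form would be something like
(untested; 'somefile' stands for whatever open file object you
have around):

# keep calling somefile.readline until it returns ''
for line in iter(somefile.readline, ''):
    print line,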

So, this form of iter is not appropriate here.  Generally
a class whose instances are meant to BE iterators, as here,
would just 'return self' in method __iter__.


Alex



