[Python-Dev] "groupby" iterator

Guido van Rossum guido at python.org
Thu Nov 27 16:49:40 EST 2003


> Here's a reworking which returns iterators.  I had to decide what to do if 
> the user tries to access things out of order; I raise an exception.  
> Anything else would complicate the code quite a lot I think.
> 
> def groupby(key, iterable):
>     it = iter(iterable)
>     value = it.next() # If there are no items, this takes an early exit
>     oldkey = [key(value)]
>     cache = [value]
>     lock = []
>     def grouper():
>         yield cache.pop()
>         for value in it:
>             newkey = key(value)
>             if newkey == oldkey[0]:
>                 yield value
>             else:
>                 oldkey[0] = newkey
>                 cache.append(value)
>                 break
>         del lock[0]
>     while 1:
>         if lock:
>             raise LookupError, "groups accessed out of order"
>         if not cache:
>             break
>         lock.append(1)
>         yield grouper()

Thanks!  Here's a class version of the same, which strikes me as
slightly easier to understand (though probably slower due to all the
instance variable access).  It may serve as an easier model for a C
implementation.  I decided not to deal explicitly with out-of-order
access; if the caller doesn't play by the rules, some of their groups
will be split and jumbled, but each split group will still have
matching keys.

class GroupBy(object):
    def __init__(self, key, iterable):
        self.key = key
        self.it = iter(iterable)
        self.todo = []
    def __iter__(self):
        return self
    def next(self):
        if self.todo:
            value, oldkey = self.todo.pop()
        else:
            value = self.it.next() # Exit if this raises StopIteration
            oldkey = self.key(value)
        return self._grouper(value, oldkey)
    def _grouper(self, value, oldkey):
        yield value
        for value in self.it:
            newkey = self.key(value)
            if newkey != oldkey:
                self.todo.append((value, newkey))
                break
            yield value

This is an example of what's so cool about iterators and generators:
You can code a particular idiom or mini-pattern (in this case grouping
list items) once and apply it to lots of situations.  That's of course
what all subroutines do, but iterators and generators open up lots of
places where previously it wasn't convenient to use a subroutine
(you'd have to use lots of lambdas -- or you'd have to have a language
supporting anonymous code blocks, which provide a lot of the same
power in a different way).

--Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list