
Here's a reworking which returns iterators. I had to decide what to do if the user tries to access things out of order; I raise an exception. Anything else would complicate the code quite a lot I think.
def groupby(key, iterable): it = iter(iterable) value = it.next() # If there are no items, this takes an early exit oldkey = [key(value)] cache = [value] lock = [] def grouper(): yield cache.pop() for value in it: newkey = key(value) if newkey == oldkey[0]: yield value else: oldkey[0] = newkey cache.append(value) break del lock[0] while 1: if lock: raise LookupError, "groups accessed out of order" if not cache: break lock.append(1) yield grouper()
Thanks! Here's a class version of the same, which strikes me as slightly easier to understand (though probably slower due to all the instance variable access). It may serve as an easier model for a C implementation. I decided not to deal explicitly with out-of-order access; if the caller doesn't play by the rules, some of their groups will be split and jumbled, but each split group will still have matching keys. class GroupBy(object): def __init__(self, key, iterable): self.key = key self.it = iter(iterable) self.todo = [] def __iter__(self): return self def next(self): if self.todo: value, oldkey = self.todo.pop() else: value = self.it.next() # Exit if this raises StopIteration oldkey = self.key(value) return self._grouper(value, oldkey) def _grouper(self, value, oldkey): yield value for value in self.it: newkey = self.key(value) if newkey != oldkey: self.todo.append((value, newkey)) break yield value This is an example of what's so cool about iterators and generators: You can code a particular idiom or mini-pattern (in this case grouping list items) once and apply it to lots of situations. That's of course what all subroutines do, but iterators and generators open up lots of places where previously it wasn't convenient to use a subroutine (you'd have to use lots of lambdas -- or you'd have to have a language supporting anonymous code blocks, which provide a lot of the same power in a different way). --Guido van Rossum (home page: http://www.python.org/~guido/)