# [Python-Dev] "groupby" iterator

Guido van Rossum guido at python.org
Thu Nov 27 12:30:33 EST 2003

In the shower (really!) I was thinking about the old problem of going
through a list of items that are supposed to be grouped by some key,
and doing something extra at the end of each group.  I usually end up
doing something ugly like this:

    oldkey = None
    for item in sequence:
        newkey = item.key # this could be any function of item
        if newkey != oldkey and oldkey is not None:
            ...do group processing...
        oldkey = newkey
        ...do item processing...
    ...do group processing... # for final group

This is ugly because the group processing code has to be written twice
(or turned into a mini-subroutine); it also doesn't handle empty
sequences correctly.  Solutions based on using an explicit index and
peeking ahead are similarly cumbersome and hard to get right for all
end cases.
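
To make the empty-sequence problem concrete, here is a runnable sketch of the loop above in present-day Python 3. The function name `summarize_sentinel`, the sample words, and the choice of "group processing" (recording each key with its group size) are illustrative assumptions, not part of the original code.

```python
def summarize_sentinel(items, key):
    """Hand-rolled grouping loop using None as the 'no group yet' marker."""
    summaries = []
    current = []
    oldkey = None
    for item in items:
        newkey = key(item)
        if newkey != oldkey and oldkey is not None:
            summaries.append((oldkey, len(current)))  # group processing
            current = []
        oldkey = newkey
        current.append(item)  # item processing
    summaries.append((oldkey, len(current)))  # group processing, repeated for the final group
    return summaries


print(summarize_sentinel(["apple", "avocado", "banana"], key=lambda s: s[0]))
# [('a', 2), ('b', 1)]
print(summarize_sentinel([], key=lambda s: s[0]))
# [(None, 0)]  <- a spurious group is reported for empty input
```

The trailing group-processing step runs unconditionally, which is why an empty input still produces one (bogus) group.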

So I realized this is easy to do with a generator, assuming we can
handle keeping a list of all items in a group.  Here's the generator:

    def groupby(key, iterable):
        it = iter(iterable)
        value = it.next() # If there are no items, this takes an early exit
        oldkey = key(value)
        group = [value]
        for value in it:
            newkey = key(value)
            if newkey != oldkey:
                yield group
                group = []
                oldkey = newkey
            group.append(value)
        yield group

Here's the usage ("item.key" is just an example):

    for group in groupby(lambda item: item.key, sequence):
        for item in group:
            ...item processing...
        ...group processing...
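
The generator above relies on `it.next()` and on a stray `StopIteration` quietly ending the generator, both of which held in the Python of the time. In Python 3.7 and later, a `StopIteration` escaping a generator body is turned into a `RuntimeError` (PEP 479), so a port has to catch it explicitly. A minimal sketch of the same logic, with the name `groupby_lists` and the sample data made up for illustration:

```python
def groupby_lists(key, iterable):
    """Yield lists of consecutive items that share the same key(item)."""
    it = iter(iterable)
    try:
        value = next(it)
    except StopIteration:
        return  # empty input: yield no groups at all
    oldkey = key(value)
    group = [value]
    for value in it:
        newkey = key(value)
        if newkey != oldkey:
            yield group
            group = []
            oldkey = newkey
        group.append(value)
    yield group  # the final group is never empty here


# Hypothetical usage: group words by first letter.
for group in groupby_lists(lambda s: s[0], ["apple", "avocado", "banana"]):
    for item in group:
        print("item:", item)
    print("group of", len(group))
```

The group-processing code appears once in the caller, and an empty input simply produces no iterations.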

The only caveat is that if a group is very large, this accumulates all
its items in a large list.  I expect the generator can be reworked to
return an iterator instead, but getting the details worked out seems
too much work for a sunny Thanksgiving morning. :-)
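
For comparison, the standard library's `itertools.groupby` (available since Python 2.4) takes the iterator-returning approach hinted at here. Its signature is `groupby(iterable, key=None)`, so the argument order is reversed relative to the generator in this message, and it yields `(key, group)` pairs rather than bare groups. A small sketch in Python 3, with made-up sample data:

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # made-up, already sorted

# Each `group` is a lazy iterator over the shared input, so no per-group list
# is built. It must be consumed before advancing to the next group.
sizes = {}
for first_letter, group in groupby(words, key=lambda s: s[0]):
    sizes[first_letter] = sum(1 for _ in group)

print(sizes)  # {'a': 2, 'b': 2, 'c': 1}
```

The trade-off is that a group iterator becomes unusable once the outer loop moves on, so a caller who needs the items later still has to copy them into a list.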

Example:

    # Print lines of /etc/passwd, sorted, grouped by first letter
    lines = open("/etc/passwd").readlines()
    lines.sort()
    for group in groupby(lambda s: s[0], lines):
        print "-"*10
        for line in group: print line,
    print "-"*10

Maybe Raymond can add this to the itertools module?

Or is there a more elegant approach than my original code that I've
missed all these years?