[Python-Dev] "groupby" iterator
Guido van Rossum
guido at python.org
Thu Nov 27 12:30:33 EST 2003
In the shower (really!) I was thinking about the old problem of going
through a list of items that are supposed to be grouped by some key,
and doing something extra at the end of each group. I usually end up
doing something ugly like this:
oldkey = None
for item in sequence:
newkey = item.key # this could be any function of item
if newkey != oldkey and oldkey is not None:
...do group processing...
oldkey = newkey
...do item processing...
...do group processing... # for final group
This is ugly because the group processing code has to be written twice
(or turned into a mini-subroutine); it also doesn't handle empty
sequences correctly. Solutions based on using an explicit index and
peeking ahead are similarly cumbersome and hard to get right for all
end cases.
So I realized this is easy to do with a generator, assuming we can
handle keeping a list of all items in a group. Here's the generator:
def groupby(key, iterable):
it = iter(iterable)
value = it.next() # If there are no items, this takes an early exit
oldkey = key(value)
group = [value]
for value in it:
newkey = key(value)
if newkey != oldkey:
yield group
group = []
oldkey = newkey
group.append(value)
yield group
Here's the usage ("item.key" is just an example):
for group in groupby(lambda item: item.key, sequence):
for item in group:
...item processing...
...group processing...
The only caveat is that if a group is very large, this accumulates all
its items in a large list. I expect the generator can be reworked to
return an iterator instead, but getting the details worked out seems
too much work for a summy Thanskgiving morning. :-)
Example:
# Print lines of /etc/passwd, sorted, grouped by first letter
lines = open("/etc/passwd").readlines()
lines.sort()
for group in groupby(lambda s: s[0], lines):
print "-"*10
for line in group: print line,
print "-"*10
Maybe Raymond can add this to the itertools module?
Or is there a more elegant approach than my original code that I've
missed all these years?
--Guido van Rossum (home page: http://www.python.org/~guido/)
More information about the Python-Dev
mailing list