In the shower (really!) I was thinking about the old problem of going through a list of items that are supposed to be grouped by some key, and doing something extra at the end of each group. I usually end up doing something ugly like this:

    oldkey = None
    for item in sequence:
        newkey = item.key # this could be any function of item
        if newkey != oldkey and oldkey is not None:
            ...do group processing...
        oldkey = newkey
        ...do item processing...
    ...do group processing... # for final group

This is ugly because the group processing code has to be written twice (or turned into a mini-subroutine); it also doesn't handle empty sequences correctly. Solutions based on using an explicit index and peeking ahead are similarly cumbersome and hard to get right for all end cases.

So I realized this is easy to do with a generator, assuming we can handle keeping a list of all items in a group. Here's the generator:

    def groupby(key, iterable):
        it = iter(iterable)
        value = it.next() # If there are no items, this takes an early exit
        oldkey = key(value)
        group = [value]
        for value in it:
            newkey = key(value)
            if newkey != oldkey:
                yield group
                group = []
                oldkey = newkey
            group.append(value)
        yield group

Here's the usage ("item.key" is just an example):

    for group in groupby(lambda item: item.key, sequence):
        for item in group:
            ...item processing...
        ...group processing...

The only caveat is that if a group is very large, this accumulates all its items in a large list. I expect the generator can be reworked to return an iterator instead, but getting the details worked out seems too much work for a sunny Thanksgiving morning. :-)

Example:

    # Print lines of /etc/passwd, sorted, grouped by first letter
    lines = open("/etc/passwd").readlines()
    lines.sort()
    for group in groupby(lambda s: s[0], lines):
        print "-"*10
        for line in group:
            print line,
    print "-"*10

Maybe Raymond can add this to the itertools module? Or is there a more elegant approach than my original code that I've missed all these years?
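The generator in this message is Python 2 code (`it.next()`, `print` statements). Under Python 3.7 and later (PEP 479), a `StopIteration` that escapes a generator body is converted into `RuntimeError`, so the "early exit" on empty input needs an explicit check there. Below is a minimal Python 3 sketch of the same list-accumulating generator, under the hypothetical name `groupby_lists` to keep it distinct from the standard library's `itertools.groupby`, which the last lines use for comparison. Note that `itertools.groupby` takes the iterable first, with `key` as an optional second argument, and yields `(key, sub-iterator)` pairs rather than lists.

```python
from itertools import groupby


def groupby_lists(key, iterable):
    """Yield lists of consecutive items that share the same key(item).

    Hypothetical Python 3 port of the list-accumulating generator:
    each group is collected into a list before it is yielded.
    """
    it = iter(iterable)
    try:
        value = next(it)
    except StopIteration:
        return  # empty input: yield no groups (avoids PEP 479 RuntimeError)
    oldkey = key(value)
    group = [value]
    for value in it:
        newkey = key(value)
        if newkey != oldkey:
            yield group  # key changed: the previous group is complete
            group = []
            oldkey = newkey
        group.append(value)
    yield group  # the final group


words = sorted(["apple", "bob", "avocado", "banana", "cherry"])

# List-accumulating version: each group is a list.
print(list(groupby_lists(lambda s: s[0], words)))
# [['apple', 'avocado'], ['banana', 'bob'], ['cherry']]

# Standard library version: (key, iterator) pairs. Each group iterator is
# materialized here before the outer iteration advances to the next group.
print([(k, list(g)) for k, g in groupby(words, key=lambda s: s[0])])
# [('a', ['apple', 'avocado']), ('b', ['banana', 'bob']), ('c', ['cherry'])]
```

The lazy sub-iterators returned by `itertools.groupby` share the underlying input, so a group's iterator is no longer usable once the outer loop moves on. That is the trade-off for not holding an entire large group in a list.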
--Guido van Rossum (home page: http://www.python.org/~guido/)
Participants (4): Guido van Rossum, Neil Schemenauer, Raymond Hettinger, Tim Peters