itertools.groupby
Paul Rubin
http
Mon May 28 02:34:55 EDT 2007
Raymond Hettinger <python at rcn.com> writes:
> On May 27, 8:28 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> > I use the module all the time now and it is great.
> Thanks for the accolades and the great example.
Thank YOU for the great module ;). Feel free to use the example in the
docs if you want. The question someone coincidentally posted about
finding sequences of capitalized words also made a nice example.
Here's yet another example that came up in something I was working on:
you are indexing a book and you want to print a list of page numbers
for pages that refer to George Washington. If Washington occurs on
several consecutive pages you want to print those numbers as a
hyphenated range, e.g.
Washington, George: 5, 19, 37-45, 82-91, 103
This is easy with groupby (this version not tested but it's pretty close
to what I wrote in the real program). Again it works by Bates numbering,
but a little more subtly (enumerate generates the Bates numbers):
    snd = operator.itemgetter(1)  # as before

    def page_ranges():
        pages = sorted(filter(contains_washington, all_page_numbers))
        for d,g in groupby(enumerate(pages), lambda (i,p): i-p):
            h = map(snd, g)
            if len(h) > 1:
                yield '%d-%d'% (h[0], h[-1])
            else:
                yield '%d'% h[0]

    print ', '.join(page_ranges())
See what has happened: for a sequence of p's that are consecutive, i-p
stays constant, and groupby splits out the clusters where this occurs.
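
For anyone on Python 3, where tuple-unpacking lambdas are gone and map returns an iterator, here is a rough self-contained sketch of the same idea. It is not the code from the real program: the pages argument and the sample page list are made up for illustration, and the set() call is an added assumption to guard against duplicate page numbers, which would otherwise break the i-p trick.

```python
from itertools import groupby

def page_ranges(pages):
    """Yield 'a-b' for runs of consecutive pages and 'a' for isolated pages."""
    # Assumption: duplicates are dropped first, since a repeated page
    # number would change i-p in the middle of a run.
    pages = sorted(set(pages))
    # enumerate supplies the running index i; within a run of
    # consecutive page numbers p, the difference i - p stays constant.
    for _, group in groupby(enumerate(pages), key=lambda ip: ip[0] - ip[1]):
        run = [p for _, p in group]
        if len(run) > 1:
            yield '%d-%d' % (run[0], run[-1])
        else:
            yield '%d' % run[0]

# Hypothetical sample data matching the index entry above.
pages = [5, 19] + list(range(37, 46)) + list(range(82, 92)) + [103]
result = ', '.join(page_ranges(pages))
print('Washington, George: ' + result)
```

With that sample list the keys come out as -5, -18, -35 (nine times), -71 (ten times) and -82, so groupby produces five clusters.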
> FWIW, I checked in a minor update to the docs: ...
The uniq example certainly should be helpful for Unix users.