[Tutor] MapReduce
Kent Johnson
kent37 at tds.net
Tue Feb 6 00:58:44 CET 2007
Steve Nelson wrote:
> On 2/5/07, Steve Nelson <sanelson at gmail.com> wrote:
>> What I want to do is now "group" these urls so that repeated urls have
>> as their "partner" a list of indexes. To take a test example of the
>> method I have in mind:
>>
>> def testGrouper(self):
>>     """Group occurrences of a record together"""
>>     test_list = [('fred', 1), ('jim', 2), ('bill', 3), ('jim', 4)]
>>     grouped_list = [('fred', 1), ('jim', [2, 4]), ('bill', 3)]
>>     self.assertEqual(myGroup(test_list), grouped_list)
>
> <snip>
>
>> I would like a clearer, more attractive way of
>> making the test pass. If this can be done in functional style, even
>> better.
>
> I now have:
>
> def myGroup(stuff):
>     return [(key, map(lambda item: item[1], list(group))) for key, group
>             in groupby(sorted(stuff), lambda item: item[0])]
>
> Not sure I fully understand how groupby objects work, nor what a
> sub-iterator is, though. But I more or less understand it.
Sub-iterator is just a way to refer to a nested iterator: groupby()
yields (key, group) tuples, and the group member of each tuple is itself
an iterator. Since the object returned by groupby() is also an iterator,
the nested one is called a sub-iterator.
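A small sketch of what that looks like in practice may help. The sample data and the names pairs, outer, first_key, first_group and rest are made up for illustration; the input is already sorted by name, which groupby() relies on to put equal keys together.

```python
from itertools import groupby

# Illustrative data, already sorted by name
pairs = [('bill', 3), ('fred', 1), ('jim', 2), ('jim', 4)]

# The outer iterator: each step yields a (key, sub-iterator) tuple
outer = groupby(pairs, lambda item: item[0])

# Take one step by hand and consume the sub-iterator before moving on
first_key, first_sub = next(outer)
first_group = list(first_sub)

# Walk the remaining groups, turning each sub-iterator into a list
rest = [(key, list(group)) for key, group in outer]
```

Each sub-iterator shares the underlying data with the outer iterator, so it should be consumed (for example with list()) before advancing to the next key.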
>
> I understand I could use itemgetter() instead of the lambda...
>
> Can anyone clarify?
I have written an explanation of itemgetter and groupby here:
http://personalpages.tds.net/~kent37/blog/arch_m1_2005_12.html#e69
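For reference, here is a rough sketch of the groupby() version with itemgetter() in place of the lambdas. The function name my_group_itemgetter and the helper names get_name and get_index are hypothetical, chosen only for this example; itemgetter(0) behaves like lambda item: item[0].

```python
from itertools import groupby
from operator import itemgetter

get_name = itemgetter(0)   # same as lambda item: item[0]
get_index = itemgetter(1)  # same as lambda item: item[1]

def my_group_itemgetter(stuff):
    """Group (name, index) pairs by name; the input is sorted first."""
    return [(key, [get_index(item) for item in group])
            for key, group in groupby(sorted(stuff), get_name)]

test_list = [('fred', 1), ('jim', 2), ('bill', 3), ('jim', 4)]
result = my_group_itemgetter(test_list)
```

Note that the result comes back sorted by name and every value is a list, including the single ones, so it is not identical to grouped_list in the test quoted above.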
You can also do this operation easily with dicts (not tested!):
def myGroup(stuff):
    groups = {}
    for url, index in stuff:
        groups.setdefault(url, []).append(index)
    return sorted(groups.items())
Or a bit less opaque in Python 2.5, avoiding setdefault():
from collections import defaultdict

def myGroup(stuff):
    groups = defaultdict(list)
    for url, index in stuff:
        groups[url].append(index)
    return sorted(groups.items())
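On the test data, both dict versions above return [('bill', [3]), ('fred', [1]), ('jim', [2, 4])], which is sorted and wraps every value in a list. If the test has to pass exactly as written, one possible variant (a sketch only; the name my_group_in_order and the order list are my own additions) keeps first-seen order and leaves single values bare:

```python
from collections import defaultdict

def my_group_in_order(stuff):
    """Hypothetical variant: first-seen key order, single values left bare."""
    groups = defaultdict(list)
    order = []  # keys in order of first appearance
    for url, index in stuff:
        if url not in groups:  # membership test does not create the key
            order.append(url)
        groups[url].append(index)
    # Unwrap one-element lists so ('fred', [1]) becomes ('fred', 1)
    return [(url, groups[url] if len(groups[url]) > 1 else groups[url][0])
            for url in order]

test_list = [('fred', 1), ('jim', 2), ('bill', 3), ('jim', 4)]
grouped = my_group_in_order(test_list)
```

Mixing bare values and lists makes the result harder to consume, so the all-lists form is probably the better design if the test can be changed instead.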
Kent