itertools.groupby usage to get structured data

Paul Rubin no.email at nospam.invalid
Sat Feb 5 05:58:53 CET 2011


Slafs <slafs.e at gmail.com> writes:
> What I want to have is:
> a "big" nested dictionary with 'g1' values as 1st level keys and a
> dictionary of aggregates and "subgroups" in it....
>
> I was looking for a solution that would let me do that kind of
> grouping with variable lists of 2) and 3) i.e. having also 'g3' as
> grouping element so the 'g2' dicts could also have their own
> "subgroup" and be even more nested then.
> I was trying something with itertools.groupby and updating nested
> dicts, but as I was writing the code it started to feel too verbose to
> me :/
>
> Do you have any hints, maybe? Because I'm kind of stuck :/

I'm not sure I understood the problem, and it would help if you gave
sample data with the deeper nesting that you describe.  But the
following messy code matches the sample that you did give:

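    from pprint import pprint
    from itertools import groupby

    # Sample rows, the grouping keys, and the keys whose values get summed.
    x1 = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
           { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
           { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0 },
         ]
    x2 = ['g1', 'g2']
    x3 = ['s_v1', 's_v2']

    def agg(xdata, group_keys, agg_keys):
        # No grouping keys left: nothing more to nest.
        if not group_keys:
            return {}
        k0, ks = group_keys[0], group_keys[1:]
        r = {}
        def gk(d): return d[k0]
        # groupby only merges adjacent items, so sort by the same key first.
        for k, g in groupby(sorted(xdata, key=gk), gk):
            gs = list(g)   # materialize: the group is iterated more than once
            # Sum each aggregate key over the rows in this group.
            aggs = dict((ak, sum(d[ak] for d in gs)) for ak in agg_keys)
            r[k] = aggs
            if ks:
                # Nest the next level under the name of the next grouping key.
                r[k][ks[0]] = agg(gs, ks, agg_keys)
        return r

    pprint(agg(x1, x2, x3))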
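
Here is a sketch of how the same recursion would behave with a third
grouping key.  The 'g3' column, the sample values, and the name
agg_nested are all made up for illustration; they are assumptions, not
data from the original question:

```python
from itertools import groupby
from pprint import pprint

# Hypothetical rows: the 'g3' column and every value here are invented.
rows = [
    {'g1': 1, 'g2': 8, 'g3': 'a', 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 8, 'g3': 'b', 's_v1': 1.0, 's_v2': 0.5},
    {'g1': 1, 'g2': 9, 'g3': 'a', 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 'g3': 'a', 's_v1': 6.0, 's_v2': 8.0},
]

def agg_nested(rows, group_keys, agg_keys):
    """Group rows by group_keys in order, summing agg_keys at every level."""
    if not group_keys:
        return {}
    first, rest = group_keys[0], group_keys[1:]

    def key_of(row):
        return row[first]

    result = {}
    # groupby only merges adjacent items, so sort on the same key first.
    for value, group in groupby(sorted(rows, key=key_of), key_of):
        members = list(group)
        entry = {ak: sum(r[ak] for r in members) for ak in agg_keys}
        if rest:
            # The subgroups live under the name of the next grouping key.
            entry[rest[0]] = agg_nested(members, rest, agg_keys)
        result[value] = entry
    return result

nested = agg_nested(rows, ['g1', 'g2', 'g3'], ['s_v1', 's_v2'])
pprint(nested)
```

With these rows, nested[1]['g2'][8]['g3'] holds the per-'g3' sums for
the rows where g1 == 1 and g2 == 8.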


