itertools.groupby usage to get structured data
Paul Rubin
no.email at nospam.invalid
Fri Feb 4 23:58:53 EST 2011
Slafs <slafs.e at gmail.com> writes:
> What I want to have is:
> a "big" nested dictionary with 'g1' values as 1st level keys and a
> dictionary of aggregates and "subgroups" in it....
>
> I was looking for a solution that would let me do that kind of
> grouping with variable lists of 2) and 3) i.e. having also 'g3' as
> grouping element so the 'g2' dicts could also have their own
> "subgroup" and be even more nested then.
> I was trying something with itertools.groupby and updating nested
> dicts, but as I was writing the code it started to feel too verbose to
> me :/
>
> Do you have any hints, maybe? Because I'm kind of stuck :/
I'm not sure I understood the problem and it would help if you gave
sample data with the deeper nesting that you describe. But the
following messy code matches the sample that you did give:

from pprint import pprint
from itertools import groupby

x1 = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
       { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
       { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}
       ]
x2 = ['g1', 'g2']
x3 = ['s_v1', 's_v2']

def agg(xdata, group_keys, agg_keys):
    if not group_keys:
        return {}
    k0, ks = group_keys[0], group_keys[1:]
    r = {}
    def gk(d): return d[k0]
    for k, g in groupby(sorted(xdata, key=gk), gk):
        gs = list(g)
        aggs = dict((ak,sum(d[ak] for d in gs)) for ak in agg_keys)
        r[k] = aggs
        if ks:
            r[k][ks[0]] = agg(gs,group_keys[1:], agg_keys)
    return r

pprint(agg(x1, x2, x3))

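Since the function recurses on the remaining group keys, it should also handle the deeper nesting you mention. Here is a sketch of that with a third grouping key. The 'g3' field, its values, and the extra row are made up for illustration (they are not from your sample), and I renamed a few variables for readability:

```python
from itertools import groupby
from pprint import pprint

def agg(rows, group_keys, agg_keys):
    """Nest rows by each key in group_keys, summing agg_keys at every level."""
    if not group_keys:
        return {}
    k0, rest = group_keys[0], group_keys[1:]

    def key_fn(d):
        return d[k0]

    result = {}
    # groupby only merges adjacent items, so sort by the same key first
    for k, g in groupby(sorted(rows, key=key_fn), key_fn):
        members = list(g)
        # sums over every row in this group
        result[k] = {ak: sum(d[ak] for d in members) for ak in agg_keys}
        if rest:
            # subgroups are stored under the name of the next grouping key
            result[k][rest[0]] = agg(members, rest, agg_keys)
    return result

# Hypothetical rows with an extra 'g3' field (assumed, not from the original sample)
rows = [
    {'g1': 1, 'g2': 8, 'g3': 'a', 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 8, 'g3': 'b', 's_v1': 1.0, 's_v2': 0.5},
    {'g1': 1, 'g2': 9, 'g3': 'a', 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 'g3': 'a', 's_v1': 6.0, 's_v2': 8.0},
]

nested = agg(rows, ['g1', 'g2', 'g3'], ['s_v1', 's_v2'])
pprint(nested)
```

Each level holds its own sums plus one entry named after the next grouping key, so nested[1]['g2'][8]['g3']['b'] reaches the innermost group. One caveat with this layout: a grouping key with the same name as an aggregate key would overwrite that aggregate in the result.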