itertools.groupby usage to get structured data
Peter Otten
__peter__ at web.de
Sat Feb 5 13:12:36 CET 2011
Slafs wrote:
> Hi there!
>
> I'm having trouble wrapping my brain around this kind of problem:
>
> What I have:
> 1) a list of dicts
> 2) a list of keys that I would like to use for grouping the
> elements of 1)
> 3) a list of keys that I would like to aggregate over the elements
> of 1) with some function, e.g. sum
>
> For instance I've got:
> 1) [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
> { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
> {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}, ... ]
> 2) ['g1', 'g2']
> 3) ['s_v1', 's_v2']
>
> To be precise, 1) is the result of a values_list method call on a
> Django QuerySet; 2) holds the arguments for that method; 3) holds the
> annotation keys. So 1) is the result of:
> qs.values_list('g1', 'g2').annotate(s_v1=Sum('v1'), s_v2=Sum('v2'))
>
> What I want to have is:
> a "big" nested dictionary with 'g1' values as 1st level keys and a
> dictionary of aggregates and "subgroups" in it.
>
> In my example it would be something like this:
> {
>     1 : {
>         's_v1' : 7.0,
>         's_v2' : 6.5,
>         'g2' : {
>             8 : {
>                 's_v1' : 5.0,
>                 's_v2' : 3.5 },
>             9 : {
>                 's_v1' : 2.0,
>                 's_v2' : 3.0 }
>         }
>     },
>     2 : {
>         's_v1' : 6.0,
>         's_v2' : 8.0,
>         'g2' : {
>             8 : {
>                 's_v1' : 6.0,
>                 's_v2' : 8.0}
>         }
>     },
>     ...
> }
>
> # notice the summed values of s_v1 and s_v2 when g1 == 1
>
> I was looking for a solution that would let me do that kind of
> grouping with variable lists for 2) and 3), i.e. also having 'g3' as
> a grouping element, so the 'g2' dicts could have their own
> "subgroups" and be nested even deeper.
> I was trying something with itertools.groupby and updating nested
> dicts, but as I was writing the code it started to feel too verbose to
> me :/
>
> Do you have any hints, maybe? Because I'm kind of stuck :/
>
> Regards
>
> Sławek
Not super-efficient, but simple:
$ cat sumover.py
data = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
         { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
         { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0 } ]
sum_over = ["s_v1", "s_v2"]
group_by = ["g1", "g2"]

wanted = {
    1 : {
        's_v1' : 7.0,
        's_v2' : 6.5,
        'g2' : {
            8 : {
                's_v1' : 5.0,
                's_v2' : 3.5 },
            9 : {
                's_v1' : 2.0,
                's_v2' : 3.0 }
        }
    },
    2 : {
        's_v1' : 6.0,
        's_v2' : 8.0,
        'g2' : {
            8 : {
                's_v1' : 6.0,
                's_v2' : 8.0}
        }
    },
}
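def calc(data, group_by, sum_over):
    tree = {}
    # The trailing None is a sentinel: it adds one last pass that sums
    # into the innermost dict without descending any further.
    group_by = group_by + [None]
    for item in data:
        d = tree
        for g in group_by:
            # Add this item's values to the totals of the current level.
            for so in sum_over:
                d[so] = d.get(so, 0.0) + item[so]
            if g is not None:
                # Descend into (or create) the subgroup for item[g].
                d = d.setdefault(g, {}).setdefault(item[g], {})
    return tree

# The root of the tree holds the grand totals plus the first grouping key;
# the wanted structure is the dict stored under that key.
got = calc(data, group_by, sum_over)[group_by[0]]
assert got == wanted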
$ python sumover.py
$
Untested.
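
Since the subject line asks about itertools.groupby: a recursive sketch along those lines might look like the following. The name group_tree is made up for illustration, and the sketch assumes that the values of each grouping key can be sorted against each other, because groupby only merges adjacent rows. It is meant as one possible variant, not as a replacement for the setdefault approach above.

```python
from itertools import groupby
from operator import itemgetter


def group_tree(rows, group_by, sum_over):
    """Nest rows by the keys in group_by, summing sum_over at every level.

    Hypothetical helper for illustration; returns a dict keyed by the
    values of the first grouping key.
    """
    if not group_by:
        return {}
    key, rest = group_by[0], group_by[1:]
    get_key = itemgetter(key)
    result = {}
    # groupby only groups adjacent rows, so sort by the grouping key first.
    # Assumption: the key values are mutually comparable.
    for value, members in groupby(sorted(rows, key=get_key), key=get_key):
        members = list(members)  # the group iterator can be consumed only once
        node = {so: sum(row[so] for row in members) for so in sum_over}
        if rest:
            # Store the subgroups under the name of the next grouping key.
            node[rest[0]] = group_tree(members, rest, sum_over)
        result[value] = node
    return result


data = [
    {'g1': 1, 'g2': 8, 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 9, 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 's_v1': 6.0, 's_v2': 8.0},
]
tree = group_tree(data, ['g1', 'g2'], ['s_v1', 's_v2'])
```

Adding 'g3' to the group_by list nests one level deeper without any other change, provided every row has a 'g3' entry. The sort at each level costs more than the single pass in calc(), so this version is mostly of interest if you want to stay with groupby.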