itertools.groupby usage to get structured data
pruebauno at latinmail.com
Mon Feb 7 11:52:44 EST 2011
On Feb 5, 7:12 am, Peter Otten <__pete... at web.de> wrote:
> Slafs wrote:
> > Hi there!
>
> > I'm having trouble wrapping my brain around this kind of problem:
>
> > What I have :
> > 1) list of dicts
> > 2) list of keys that I would like to be my grouping arguments for
> > elements from 1)
> > 3) list of keys on which I would like to do an "aggregation" of the
> > elements of 1) with some function, e.g. sum
>
> > For instance, I have:
> > 1) [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
> > { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
> > {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}, ... ]
> > 2) ['g1', 'g2']
> > 3) ['s_v1', 's_v2']
>
> > To be precise, 1) is the result of a values_list method on a QuerySet
> > in Django; 2) are the arguments for that method; 3) are the
> > annotation keys. So 1) is the result of:
> > qs.values_list('g1', 'g2').annotate(s_v1=Sum('v1'), s_v2=Sum('v2'))
>
> > What I want to have is:
> > a "big" nested dictionary with 'g1' values as 1st level keys and a
> > dictionary of aggregates and "subgroups" in it.
>
> > In my example it would be something like this:
> > {
> >     1 : {
> >         's_v1' : 7.0,
> >         's_v2' : 6.5,
> >         'g2' : {
> >             8 : {
> >                 's_v1' : 5.0,
> >                 's_v2' : 3.5 },
> >             9 : {
> >                 's_v1' : 2.0,
> >                 's_v2' : 3.0 }
> >         }
> >     },
> >     2 : {
> >         's_v1' : 6.0,
> >         's_v2' : 8.0,
> >         'g2' : {
> >             8 : {
> >                 's_v1' : 6.0,
> >                 's_v2' : 8.0 }
> >         }
> >     },
> >     ...
> > }
>
> > # notice the summed values of s_v1 and s_v2 when g1 == 1
>
> > I was looking for a solution that would let me do that kind of
> > grouping with variable lists of 2) and 3), e.g. also having 'g3' as a
> > grouping element so the 'g2' dicts could have their own
> > "subgroup" and be even more nested.
> > I was trying something with itertools.groupby and updating nested
> > dicts, but as I was writing the code it started to feel too verbose to
> > me :/
>
> > Do you have any hints? I'm kind of stuck :/
>
> > Regards
>
> > Sławek
>
> Not super-efficient, but simple:
>
> $ cat sumover.py
> data = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
>          { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
>          { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0 } ]
> sum_over = ["s_v1", "s_v2"]
> group_by = ["g1", "g2"]
>
> wanted = {
>     1 : {
>         's_v1' : 7.0,
>         's_v2' : 6.5,
>         'g2' : {
>             8 : {
>                 's_v1' : 5.0,
>                 's_v2' : 3.5 },
>             9 : {
>                 's_v1' : 2.0,
>                 's_v2' : 3.0 }
>         }
>     },
>     2 : {
>         's_v1' : 6.0,
>         's_v2' : 8.0,
>         'g2' : {
>             8 : {
>                 's_v1' : 6.0,
>                 's_v2' : 8.0 }
>         }
>     },
> }
>
> def calc(data, group_by, sum_over):
>     tree = {}
>     group_by = group_by + [None]
>     for item in data:
>         d = tree
>         for g in group_by:
>             for so in sum_over:
>                 d[so] = d.get(so, 0.0) + item[so]
>             if g:
>                 d = d.setdefault(g, {}).setdefault(item[g], {})
>     return tree
>
> got = calc(data, group_by, sum_over)[group_by[0]]
> assert got == wanted
> $ python sumover.py
> $
>
> Untested.
Very clever. I didn't understand how it worked until I rewrote it like
this:
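
def calc(data, group_by, sum_over):
    tree = {}
    # The leading None makes the first round add the sums at the root
    group_by = [None] + group_by
    for item in data:
        d = tree
        for g in group_by:
            if g:
                # Descend into d[g][item[g]], creating dicts as needed
                d = d.setdefault(g, {}).setdefault(item[g], {})
            # Add this item's values to the running sums at this level
            for so in sum_over:
                d[so] = d.get(so, 0.0) + item[so]
    return tree
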
Processing "None" in the last round of the loop was throwing me off.
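
Since the subject line asks about itertools.groupby, here is a rough sketch of how the same tree could be built with it. It is not from either post above, and the name calc_groupby is made up. It assumes group_by is non-empty, every row has all the group_by keys, and the values under each key can be sorted against each other, because groupby only merges adjacent rows.

```python
from itertools import groupby
from operator import itemgetter


def calc_groupby(rows, group_by, sum_over):
    """Hypothetical groupby-based variant of calc().

    Returns {value_of_first_key: {sums..., next_key: {...}}}, i.e. the
    same shape as calc(...)[group_by[0]].  Assumes group_by is non-empty
    and that each key's values are mutually sortable.
    """
    key, rest = group_by[0], group_by[1:]
    get_key = itemgetter(key)
    level = {}
    # groupby() only merges *adjacent* rows, so sort by the key first
    for value, group in groupby(sorted(rows, key=get_key), key=get_key):
        group = list(group)  # used twice: for the sums and for the recursion
        node = {so: sum(row[so] for row in group) for so in sum_over}
        if rest:
            # Recurse into this subgroup with the remaining grouping keys
            node[rest[0]] = calc_groupby(group, rest, sum_over)
        level[value] = node
    return level


data = [
    {'g1': 1, 'g2': 8, 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 9, 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 's_v1': 6.0, 's_v2': 8.0},
]
print(calc_groupby(data, ['g1', 'g2'], ['s_v1', 's_v2']))
```

Adding 'g3' to the group_by list nests one level deeper, which is what the original question asked for. The trade-off versus calc() is a sort at every level instead of a single pass over the data.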