itertools.groupby usage to get structured data
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Feb 4 21:27:43 EST 2011
On Fri, 04 Feb 2011 15:14:24 -0800, Slafs wrote:
> Hi there!
>
> I'm having trouble to wrap my brain around this kind of problem:
Perhaps you should consider backing up and starting from somewhere else
with different input data, or changing the requirements. Just a thought.
> What I have :
> 1) list of dicts
> 2) list of keys that i would like to be my grouping arguments of
> elements from 1)
> 3) list of keys that i would like do "aggregation" on the elements
> of 1) with some function e.g. sum
You start with data:
dicts = [ {'g1': 1, 'g2': 8, 's_v1': 5.0, 's_v2': 3.5},
          {'g1': 1, 'g2': 9, 's_v1': 2.0, 's_v2': 3.0},
          {'g1': 2, 'g2': 8, 's_v1': 6.0, 's_v2': 8.0} ]
It sometimes helps me to think about data structures by drawing them out.
In this case, you have what is effectively a two-dimensional table:
g1   g2   s_v1   s_v2
===  ===  =====  =====
1    8    5.0    3.5
1    9    2.0    3.0
2    8    6.0    8.0
Nice and simple. But the result you want is a bit more complex -- it's a
dict of dicts of dicts:
{1: {'s_v1': 7.0, 's_v2': 6.5,
     'g2': {8: {'s_v1': 5.0, 's_v2': 3.5},
            9: {'s_v1': 2.0, 's_v2': 3.0}
           }},
 2: {'s_v1': 6.0, 's_v2': 8.0,
     'g2': {8: {'s_v1': 6.0, 's_v2': 8.0}
           }}}
(I quote from the Zen of Python: "Flat is better than nested." Hmmm.)
That structure is equivalent to a *four*-dimensional table, which is a bit
hard to write out :)
Here's a two-dimensional projection of a single slice with key = 1:
s_v1   s_v2   g2
=====  =====  =====
7.0    6.5      | s_v1  s_v2
              ---------------
              8 | 5.0   3.5
              9 | 2.0   3.0
Does this help you to either (1) redesign your data structures, or (2)
work out how to go from there?
[...]
> I was looking for a solution that would let me do that kind of grouping
> with variable lists of 2) and 3) i.e. having also 'g3' as grouping
> element so the 'g2' dicts could also have their own "subgroup" and be
> even more nested then. I was trying something with itertools.groupby and
> updating nested dicts, but as i was writing the code it started to feel
> too verbose to me :/
I don't think groupby is the tool you want. It groups *consecutive* items
in sequences:
>>> from itertools import groupby
>>> for key, it in groupby([1,1,1,2,3,4,3,3,3,5,1]):
...     print(key, list(it))
...
1 [1, 1, 1]
2 [2]
3 [3]
4 [4]
3 [3, 3, 3]
5 [5]
1 [1]
Except for the name, I don't see any connection between this and what you
want to do.
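To see the "consecutive" behaviour on records shaped like yours, here is a small sketch. The three-record list is made up for illustration (it is not your data), and sorting first is shown only to make the point that groupby depends on input order; I'm not suggesting it as the solution.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records, deliberately NOT sorted by 'g1'.
records = [{'g1': 1, 'g2': 8}, {'g1': 2, 'g2': 8}, {'g1': 1, 'g2': 9}]
by_g1 = itemgetter('g1')

# Unsorted input: the two g1 == 1 records are not adjacent,
# so they end up in two separate groups.
unsorted_keys = [key for key, _ in groupby(records, key=by_g1)]

# Sorted input: equal keys become adjacent, so each key appears once.
sorted_keys = [key for key, _ in groupby(sorted(records, key=by_g1), key=by_g1)]

print(unsorted_keys)
print(sorted_keys)
```

Here unsorted_keys comes out as [1, 2, 1] and sorted_keys as [1, 2].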
The approach I would take is a top-down approach:
dicts = [ ... ]  # list of dicts, as above.
result = {}
for d in dicts:
    # process each dict in isolation
    temp = process(d)
    merge(result, temp)
merge() should hopefully be straightforward, and process() only needs to
look at one dict at a time.
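For what it's worth, here is one way process() and merge() might be filled in. This is only a sketch under my own assumptions: the extra group_keys and sum_keys parameters are my invention (the outline above passes only the dict), the aggregation is hard-wired to addition, and it assumes no grouping *value* ever collides with the name of a summed key.

```python
def process(d, group_keys, sum_keys):
    """Turn one flat record into a nested dict describing only that record."""
    totals = {k: d[k] for k in sum_keys}
    node = dict(totals)
    # Build from the innermost grouping level outwards. The first grouping
    # key is handled last because its values are the top-level dict keys.
    for name in reversed(group_keys[1:]):
        inner = {d[name]: node}
        node = dict(totals)
        node[name] = inner
    return {d[group_keys[0]]: node}


def merge(result, temp, sum_keys):
    """Fold the nested dict `temp` into `result`, in place."""
    for key, value in temp.items():
        if key not in result:
            result[key] = value                  # first time we see this branch
        elif key in sum_keys:
            result[key] += value                 # aggregate (sum assumed)
        else:
            merge(result[key], value, sum_keys)  # descend one level


dicts = [{'g1': 1, 'g2': 8, 's_v1': 5.0, 's_v2': 3.5},
         {'g1': 1, 'g2': 9, 's_v1': 2.0, 's_v2': 3.0},
         {'g1': 2, 'g2': 8, 's_v1': 6.0, 's_v2': 8.0}]
group_keys = ['g1', 'g2']        # assumed parameter names
sum_keys = ['s_v1', 's_v2']

result = {}
for d in dicts:
    merge(result, process(d, group_keys, sum_keys), sum_keys)
print(result)
```

With the three dicts from the start of this post, result should match the dict of dicts of dicts shown earlier. Adding 'g3' to group_keys just adds another level of nesting, since process() loops over however many grouping keys it is given.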
--
Steven