itertools.groupby usage to get structured data
Slafs
slafs.e at gmail.com
Sat Feb 5 05:34:26 EST 2011
On 5 Feb, 05:58, Paul Rubin <no.em... at nospam.invalid> wrote:
> Slafs <slaf... at gmail.com> writes:
> > What I want to have is:
> > a "big" nested dictionary with 'g1' values as 1st level keys and a
> > dictionary of aggregates and "subgroups" in it....
>
> > I was looking for a solution that would let me do that kind of
> > grouping with variable lists of 2) and 3) i.e. having also 'g3' as
> > grouping element so the 'g2' dicts could also have their own
> > "subgroup" and be even more nested then.
> > I was trying something with itertools.groupby and updating nested
> > dicts, but as I was writing the code it started to feel too verbose to
> > me :/
>
> > Do you have any hints, maybe? Because I'm kind of stuck :/
>
> I'm not sure I understood the problem and it would help if you gave
> sample data with the deeper nesting that you describe. But the
> following messy code matches the sample that you did give:
>
> from pprint import pprint
> from itertools import groupby
>
> x1 = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
>        { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
>        { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}
>      ]
> x2 = ['g1', 'g2']
> x3 = ['s_v1', 's_v2']
>
> def agg(xdata, group_keys, agg_keys):
>     if not group_keys:
>         return {}
>     k0, ks = group_keys[0], group_keys[1:]
>     r = {}
>     def gk(d): return d[k0]
>     for k, g in groupby(sorted(xdata, key=gk), gk):
>         gs = list(g)
>         aggs = dict((ak,sum(d[ak] for d in gs)) for ak in agg_keys)
>         r[k] = aggs
>         if ks:
>             r[k][ks[0]] = agg(gs,group_keys[1:], agg_keys)
>     return r
>
> pprint (agg(x1, x2, x3))
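For the deeper nesting I described, here is a minimal sketch of the same recursive approach with 'g3' as a third grouping key. The function is restated so the snippet runs on its own; the 'g3' column and the sample rows are made up for illustration and are not my real data.

```python
from itertools import groupby


def agg(xdata, group_keys, agg_keys):
    """Recursively group rows by each key in group_keys and sum agg_keys."""
    if not group_keys:
        return {}
    k0, ks = group_keys[0], group_keys[1:]

    def gk(d):
        return d[k0]

    r = {}
    # groupby only merges adjacent rows, so sort by the same key first.
    for k, g in groupby(sorted(xdata, key=gk), gk):
        gs = list(g)
        r[k] = {ak: sum(d[ak] for d in gs) for ak in agg_keys}
        if ks:
            # Nest the next level under the name of the next grouping key.
            r[k][ks[0]] = agg(gs, ks, agg_keys)
    return r


# Hypothetical rows with an extra 'g3' column (values invented for illustration).
rows = [
    {'g1': 1, 'g2': 8, 'g3': 'a', 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 8, 'g3': 'b', 's_v1': 1.0, 's_v2': 0.5},
    {'g1': 1, 'g2': 9, 'g3': 'a', 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 'g3': 'a', 's_v1': 6.0, 's_v2': 8.0},
]

nested = agg(rows, ['g1', 'g2', 'g3'], ['s_v1', 's_v2'])
```

Each level holds its own sums plus a dict for the next grouping key, so nested[1]['g2'][8]['g3']['b'] is the aggregate for that single row.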
Thank you both Steven and Paul for your replies.
@Steven:
> Perhaps you should consider backing up and starting from somewhere else
> with different input data, or changing the requirements. Just a thought.
I don't think that's the issue. The data, as you noticed, is well structured
(as a table, for instance) and I don't think I can do better than that.
> I don't think groupby is the tool you want. It groups *consecutive* items
> in sequences:
I was using groupby just like in Paul's code.
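A small sketch of the difference, as I understand it (the list `values` is made up for illustration): groupby on unsorted input splits equal items that are not adjacent, while sorting by the grouping key first, as Paul's code does, gives one group per key.

```python
from itertools import groupby

values = [1, 2, 1, 1, 2]

# Unsorted: equal items that are not adjacent end up in separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(values)]

# Sorted first: each distinct key appears exactly once.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(values))]

print(unsorted_groups)
print(sorted_groups)
```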
@Paul:
OMG. I think this is it! (picking my jaw up off the floor...)
The funny part is that I was kind of close to this solution ;). I was
considering the use of recursion for this.
Thank you so much!