itertools: problem with nested groupby, list()
Peter Otten
__peter__ at web.de
Tue May 4 10:10:55 EDT 2010
Nico Schlömer wrote:
> Hi,
>
> I ran into a bit of an unexpected issue here with itertools, and I
> need to say that I discovered itertools only recently, so maybe my way
> of approaching the problem is "not what I want to do".
>
> Anyway, the problem is the following:
> I have a list of dictionaries, something like
>
> [ { "a": 1, "b": 1, "c": 3 },
>   { "a": 1, "b": 1, "c": 4 },
>   ...
> ]
>
> and I'd like to iterate through all items with, e.g., "a":1. What I do
> is sort and then groupby,
>
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>
> and then just very simply iterate over my_list_grouped,
>
> for my_item in my_list_grouped:
>     # do something with my_item[0], my_item[1]
>
> Now, inside this loop I'd like to again iterate over all items with
> the same 'b'-value -- no problem, just do the above inside the loop:
>
> for my_item in my_list_grouped:
>     # group by keyword "b"
>     my_list2 = list( my_item[1] )
>     my_list2.sort( key=operator.itemgetter('b') )
>     my_list_grouped = itertools.groupby( my_list2,
>                                          operator.itemgetter('b') )
>     for e in my_list_grouped:
>         # do something with e[0], e[1]
>
> That seems to work all right.
>
> Now, the problem occurs when this all is wrapped into an outer loop, such
> as
>
> for k in [ 'first pass', 'second pass' ]:
>     for my_item in my_list_grouped:
>         # bla, the above
>
> To be able to iterate more than once through my_list_grouped, I have
> to convert it into a list first, so outside all loops, I go like
>
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> my_list_grouped = list( my_list_grouped )
>
> This, however, makes it impossible to do the inner sort and
> groupby-operation; you just get the very first element, and that's it.
>
> An example file is attached.
>
> Hints, anyone?
If you want a reusable copy of a groupby(...) result, it is not enough to
convert the groupby object as a whole to a list:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> items = [(1,1), (1,2), (1,3), (2,1), (2,2)]
>>> grouped_items = list(groupby(items, key=itemgetter(0))) # WRONG
>>> for run in 1, 2:
...     print "run", run
...     for k, g in grouped_items:
...         print k, list(g)
...
run 1
1 []
2 [(2, 2)]
run 2
1 []
2 []
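The empty groups appear because every group returned by groupby() is an
iterator over the same underlying data as the groupby object itself. As soon
as the groupby object moves on to the next key, the previous group can no
longer be read. A minimal sketch of that effect, using the same items as
above (the variable names are only for illustration):

```python
from itertools import groupby
from operator import itemgetter

items = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]

grouper = groupby(items, key=itemgetter(0))

# Fetch the first group and read just one element from it.
key1, group1 = next(grouper)
first_of_group1 = next(group1)

# Advancing the outer iterator skips whatever was left in group1.
key2, group2 = next(grouper)
leftover_group1 = list(group1)   # nothing left to read here

# The current group is still readable as long as grouper is not advanced.
group2_items = list(group2)
```

list(groupby(...)) advances past every group before any of them has been
read, which is why the groups come out empty afterwards. What the very last
group still yields in that situation may differ between Python versions, so
it is better not to rely on it.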
Instead, you have to process the groups, too:
>>> grouped_items = [(k, list(g)) for k, g in groupby(items, key=itemgetter(0))]
>>> for run in 1, 2:
...     print "run", run
...     for k, g in grouped_items:
...         print k, list(g)
...
run 1
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
run 2
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
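Applied to the nested grouping from the question, the same idea might look
like the sketch below. group_to_lists() and summarize() are hypothetical
helpers, and the sample dictionaries are made up to match the shape described
in the question; they are not the data from the attached file.

```python
from itertools import groupby
from operator import itemgetter


def group_to_lists(records, key):
    """Sort records by key and return [(key_value, [records, ...]), ...]."""
    ordered = sorted(records, key=key)
    # Turn every group into a list right away so it can be reused later.
    return [(k, list(g)) for k, g in groupby(ordered, key=key)]


# Made-up sample data (assumption), shaped like the list in the question.
my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 1, "b": 2, "c": 5},
    {"a": 2, "b": 1, "c": 6},
]

# Outer grouping by "a", then an inner grouping by "b" within each group.
by_a = group_to_lists(my_list, itemgetter("a"))
nested = [(a, group_to_lists(rows, itemgetter("b"))) for a, rows in by_a]


def summarize(nested_groups):
    """Collect (a, b, [c values]) for every inner group."""
    return [
        (a, b, [row["c"] for row in rows])
        for a, by_b in nested_groups
        for b, rows in by_b
    ]


# Both passes see the same data because everything is stored in lists.
passes = [summarize(nested) for _ in ("first pass", "second pass")]
```

Since nested contains only lists, it can be iterated any number of times, at
the cost of keeping all groups in memory.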
But usually you don't bother and just run groupby() twice:
>>> for run in 1, 2:
...     print "run", run
...     for k, g in groupby(items, key=itemgetter(0)):
...         print k, list(g)
...
run 1
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
run 2
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
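The only caveat then is that list(items) == list(items) must hold, i.e. items
must be a list or another object that yields the same elements every time it
is iterated, not a one-shot iterator such as a generator.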
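A small sketch of what goes wrong otherwise; collect() is a hypothetical
helper for this illustration, not part of itertools:

```python
from itertools import groupby
from operator import itemgetter


def collect(items):
    """Run groupby() over items and store each group as a list."""
    return [(k, list(g)) for k, g in groupby(items, key=itemgetter(0))]


pairs = [(1, 1), (1, 2), (2, 1)]

# A list can be iterated repeatedly, so both runs agree.
list_first = collect(pairs)
list_second = collect(pairs)

# A plain iterator is used up by the first run; the second run sees nothing.
one_shot = iter(pairs)
iter_first = collect(one_shot)
iter_second = collect(one_shot)
```

With the list, both results are identical. With the iterator, the second
result is empty because the first groupby() call already consumed everything.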
Peter