itertools: problem with nested groupby, list()
Jon Clements
joncle at googlemail.com
Tue May 4 07:52:34 EDT 2010
On 4 May, 12:36, Nico Schlömer <nico.schloe... at gmail.com> wrote:
> > Does this example help at all?
>
> Thanks, that clarified things a lot!
>
> To make it easier, let's just look at 'a' and 'b':
>
> > my_list.sort( key=itemgetter('a','b','c') )
> > for a, a_iter in groupby(my_list, itemgetter('a')):
> >     print 'New A', a
> >     for b, b_iter in groupby(a_iter, itemgetter('b')):
> >         print '\t', 'New B', b
> >         for b_data in b_iter:
> >             print '\t'*3, a, b, b_data
> >         print '\t', 'End B', b
> >     print 'End A', a
>
> That works well, and I can wrap the outer loop in another loop without
> problems. What's *not* working, though, is having more than one pass
> on the inner loop, as in
>
> =============================== *snip* ===============================
> my_list.sort( key=itemgetter('a','b','c') )
> for a, a_iter in groupby(my_list, itemgetter('a')):
>     print 'New A', a
>     for pass_ in ['first pass', 'second pass']:
>         for b, b_iter in groupby(a_iter, itemgetter('b')):
>             print '\t', 'New B', b
>             for b_data in b_iter:
>                 print '\t'*3, a, b, b_data
>             print '\t', 'End B', b
>     print 'End A', a
> =============================== *snap* ===============================
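Here is why the second pass comes up empty, as a minimal sketch with hypothetical sample data. `a_iter` is an iterator, not a list, so the first pass's `groupby(a_iter, ...)` runs it to the end and the second pass has nothing left to group.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data shaped like the list of dicts in the thread.
my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 1, "b": 2, "c": 5},
]
my_list.sort(key=itemgetter("a", "b", "c"))

b_keys_per_pass = {}
for a, a_iter in groupby(my_list, itemgetter("a")):
    for pass_ in ["first pass", "second pass"]:
        # Both passes regroup the SAME a_iter, which is a one-shot iterator.
        b_keys_per_pass[pass_] = [b for b, _ in groupby(a_iter, itemgetter("b"))]

# b_keys_per_pass == {"first pass": [1, 2], "second pass": []}
```

Copying the group once (for example `a_list = list(a_iter)`) and calling `groupby(a_list, itemgetter('b'))` in each pass avoids this, at the cost of holding one 'a' group in memory.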
>
> I tried working around this by
>
> =============================== *snip* ===============================
> my_list.sort( key=itemgetter('a','b','c') )
> for a, a_iter in groupby(my_list, itemgetter('a')):
>     print 'New A', a
>     inner_list = list( groupby(a_iter, itemgetter('b')) )
>     for pass_ in ['first pass', 'second pass']:
>         for b, b_iter in inner_list:
>             print '\t', 'New B', b
>             for b_data in b_iter:
>                 print '\t'*3, a, b, b_data
>             print '\t', 'End B', b
>     print 'End A', a
> =============================== *snap* ===============================
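The `list(groupby(...))` workaround fails for a related reason. `groupby` hands out group iterators that all read from one shared underlying iterator. When `list()` advances that iterator to the next key, the previous group's iterator has nothing left to yield. Here is a minimal sketch, using hypothetical rows already sorted by 'b':

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, already sorted on 'b' as groupby requires.
rows = [
    {"b": 1, "c": 3},
    {"b": 1, "c": 4},
    {"b": 2, "c": 5},
]

# list() keeps every key, but materializing the groupby object advances
# past each group, so the stored group iterators are no longer usable.
broken = list(groupby(rows, itemgetter("b")))
broken_keys = [b for b, _ in broken]   # [1, 2]
first_group = list(broken[0][1])       # [] : this group was already skipped

# Copy each group into a list while groupby is still positioned on it;
# the result is a plain list of (key, list) pairs that can be walked twice.
inner_list = [(b, list(b_iter)) for b, b_iter in groupby(rows, itemgetter("b"))]
```

With `inner_list` built this way, the two-pass loop in the quoted code can iterate `for b, b_data_list in inner_list:` as often as it likes.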
>
> which doesn't work either, and I don't understand why. I'll look at
> Uli's comments.
>
> Cheers,
> Nico
>
> On Tue, May 4, 2010 at 1:08 PM, Jon Clements <jon... at googlemail.com> wrote:
> > On 4 May, 11:10, Nico Schlömer <nico.schloe... at gmail.com> wrote:
> >> Hi,
>
> >> I ran into a bit of an unexpected issue here with itertools, and I
> >> should say that I discovered itertools only recently, so maybe my way
> >> of approaching the problem is "not what I want to do".
>
> >> Anyway, the problem is the following:
> >> I have a list of dictionaries, something like
>
> >> [ { "a": 1, "b": 1, "c": 3 },
> >> { "a": 1, "b": 1, "c": 4 },
> >> ...
> >> ]
>
> >> and I'd like to iterate through all items with, e.g., "a":1. What I do
> >> is sort and then groupby,
>
> >> my_list.sort( key=operator.itemgetter('a') )
> >> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>
> >> and then just very simply iterate over my_list_grouped,
>
> >> for my_item in my_list_grouped:
> >>     # do something with my_item[0], my_item[1]
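A note on why the sort has to come first, as a small sketch with hypothetical data. `groupby` only merges *consecutive* items with equal keys, so on unsorted input the same key can start a new group more than once.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: a == 1 appears in two separate runs.
my_list = [{"a": 1, "b": 1}, {"a": 2, "b": 1}, {"a": 1, "b": 2}]

# Without sorting, groupby yields one group per run of equal keys.
unsorted_keys = [a for a, _ in groupby(my_list, itemgetter("a"))]  # [1, 2, 1]

# Sorting on the same key makes each distinct value a single run.
my_list.sort(key=itemgetter("a"))
sorted_keys = [a for a, _ in groupby(my_list, itemgetter("a"))]    # [1, 2]
```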
>
> >> Now, inside this loop I'd like to again iterate over all items with
> >> the same 'b'-value -- no problem, just do the above inside the loop:
>
> >> for my_item in my_list_grouped:
> >>     # group by keyword "b"
> >>     my_list2 = list( my_item[1] )
> >>     my_list2.sort( key=operator.itemgetter('b') )
> >>     my_list2_grouped = itertools.groupby( my_list2,
> >>             operator.itemgetter('b') )
> >>     for e in my_list2_grouped:
> >>         # do something with e[0], e[1]
>
> >> That seems to work all right.
>
> >> Now, the problem occurs when this all is wrapped into an outer loop, such as
>
> >> for k in [ 'first pass', 'second pass' ]:
> >>     for my_item in my_list_grouped:
> >>         # bla, the above
>
> >> To be able to iterate more than once through my_list_grouped, I have
> >> to convert it into a list first, so outside all loops, I go like
>
> >> my_list.sort( key=operator.itemgetter('a') )
> >> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> >> my_list_grouped = list( my_list_grouped )
>
> >> This, however, makes it impossible to do the inner sort and
> >> groupby-operation; you just get the very first element, and that's it.
>
> >> An example file is attached.
>
> >> Hints, anyone?
>
> >> Cheers,
> >> Nico
>
> > Does this example help at all?
>
> > my_list.sort( key=itemgetter('a','b','c') )
> > for a, a_iter in groupby(my_list, itemgetter('a')):
> >     print 'New A', a
> >     for b, b_iter in groupby(a_iter, itemgetter('b')):
> >         print '\t', 'New B', b
> >         for c, c_iter in groupby(b_iter, itemgetter('c')):
> >             print '\t'*2, 'New C', c
> >             for c_data in c_iter:
> >                 print '\t'*3, a, b, c, c_data
> >             print '\t'*2, 'End C', c
> >         print '\t', 'End B', b
> >     print 'End A', a
>
> > Jon.
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>
>
Are you basically after this, then?
from itertools import groupby
from operator import itemgetter

for a, a_iter in groupby(my_list, itemgetter('a')):
    print 'New A', a
    for b, b_iter in groupby(a_iter, itemgetter('b')):
        b_list = list(b_iter)
        for p in ['first', 'second']:
            for b_data in b_list:
                pass  # whatever...
Cos that looks like it could be simplified to (untested)
for (a, b), data_iter in groupby(my_list, itemgetter('a','b')):
    data = list(data_iter) # take copy
    for pass_ in ['first', 'second']:
        # do something with data
        pass
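Here is a runnable version of the `(a, b)` simplification, as a sketch with hypothetical data. Since `groupby` only merges adjacent rows, the input is first sorted on the same composite key. The thread's `itemgetter('a','b','c')` sort also satisfies this requirement. `two_passes` is a made-up name, and it only records what each pass sees.

```python
from itertools import groupby
from operator import itemgetter

def two_passes(my_list):
    """Hypothetical helper: group rows by (a, b) and walk each group twice."""
    seen = []
    key = itemgetter("a", "b")  # returns the tuple (row['a'], row['b'])
    for (a, b), data_iter in groupby(sorted(my_list, key=key), key):
        data = list(data_iter)  # take a copy; a group iterator is single-use
        for pass_ in ["first", "second"]:
            # Stand-in for real work: note the pass, the key and the group size.
            seen.append((pass_, a, b, len(data)))
    return seen

my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 2, "c": 5},
    {"a": 1, "b": 1, "c": 4},
]
result = two_passes(my_list)
# [('first', 1, 1, 2), ('second', 1, 1, 2), ('first', 1, 2, 1), ('second', 1, 2, 1)]
```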
But from my POV, it's almost looking like a job for a defaultdict keyed
on the 2-tuple (a, b).
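Here is one reading of the defaultdict idea, as a minimal sketch. It buckets rows by their (a, b) pair in a single pass over the data. `index_by_ab` and the sample rows are hypothetical. Plain lists can be iterated any number of times, and no pre-sort is needed. Calling `sorted()` on the keys restores (a, b) order for the passes.

```python
from collections import defaultdict

def index_by_ab(my_list):
    """Hypothetical helper: bucket rows by their (a, b) pair in one pass."""
    buckets = defaultdict(list)  # missing keys start out as an empty list
    for row in my_list:
        buckets[(row["a"], row["b"])].append(row)
    return buckets

my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 2, "c": 5},
    {"a": 1, "b": 1, "c": 4},
]
buckets = index_by_ab(my_list)  # no sort needed beforehand

visits = []
for pass_ in ["first pass", "second pass"]:
    for a, b in sorted(buckets):  # sorted() yields the keys in (a, b) order
        for row in buckets[(a, b)]:
            visits.append((pass_, a, b, row["c"]))  # stand-in for real work
```

Unlike the groupby versions, this keeps every row in memory at once rather than one group at a time. In exchange, the input does not need to be sorted and any bucket can be revisited freely.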
Jon.