itertools: problem with nested groupby, list()

Nico Schlömer nico.schloemer at gmail.com
Tue May 4 07:43:18 EDT 2010


> I'd try to avoid copying the list and instead just iterate over it:
>
>
>    def iterate_by_key(l, key):
>        for d in l:
>            try:
>                yield d[key]
>            except KeyError:
>                continue

Hm, that won't work for me b/c I don't know all the keys beforehand. I
could certainly do a unique(list.keys()) or something like that
beforehand, but I guess this does away with the speed advantage.
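
For what it's worth, the distinct values probably don't have to be known beforehand; a single pass can collect them as it goes. A minimal sketch, assuming the goal is to bucket the dictionaries by the value stored under one key. The helper name group_by_value is made up, and skipping dictionaries that lack the key is my assumption:

```python
from collections import defaultdict

def group_by_value(dicts, key):
    """Bucket dicts by their value under `key` in one pass, without sorting."""
    groups = defaultdict(list)
    for d in dicts:
        if key in d:  # assumption: dicts lacking the key are simply skipped
            groups[d[key]].append(d)
    return groups

# Made-up sample data in the shape described in the original post.
my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 2, "b": 1, "c": 5},
]

groups = group_by_value(my_list, "a")
for value, items in groups.items():
    print(value, len(items))
```

Whether this beats sort-plus-groupby for the real data would need measuring.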

> Since your operation not only iterates over a list but first sorts it, it
> requires a modification which must not happen while iterating. You work
> around this by copying the list first.

So when I go like

for item in list:
    item[1].sort()

I actually modify *list*? I didn't realize that; I thought it'd just
be a copy of it. Anyway, I could just try

for item in list:
    newitem = sorted( item[1] )

in that case.
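
A small sketch of the difference, with made-up data: sort() reorders the very list object that the outer list refers to, while sorted() builds a new list and leaves the original alone. As far as I can tell, sorted() is also the only one of the two that works when item[1] comes out of groupby, since that is an iterator and has no sort() method.

```python
# Made-up data: each item is (label, list of numbers).
pairs = [("x", [3, 1, 2]), ("y", [9, 7, 8])]

for item in pairs:
    newitem = sorted(item[1])   # new list; item[1] itself is untouched
assert pairs[0][1] == [3, 1, 2]

for item in pairs:
    item[1].sort()              # in place: changes the list stored inside pairs
assert pairs[0][1] == [1, 2, 3]
```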

> which is a no-no. Create a custom iterator function (IIRC they are
> called "generators") and you should be fine.

I'll look into this, thanks for the hint.
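
For the record, a sketch of what seems to be going on with list(groupby(...)): the groups handed out by groupby all share one underlying iterator, so moving on to the next key invalidates the previous group. Converting the outer groupby to a list steps through all keys before any group is read, so each group has to be copied into a list as it is produced. The variable names follow the original post; the sample data and the helper name grouped are made up.

```python
import itertools
import operator

def grouped(items, key):
    """Sort a copy of items by `key` and return [(value, [items...]), ...]."""
    getter = operator.itemgetter(key)
    ordered = sorted(items, key=getter)
    # list(g) must run here, before groupby advances to the next value.
    return [(k, list(g)) for k, g in itertools.groupby(ordered, getter)]

my_list = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 2, "b": 1, "c": 5},
]

# Plain nested lists now, so they can be iterated any number of times.
my_list_grouped = grouped(my_list, "a")

seen = []
for k in ["first pass", "second pass"]:
    for a_value, a_items in my_list_grouped:
        for b_value, b_items in grouped(a_items, "b"):
            seen.append((k, a_value, b_value, len(b_items)))
```

The price is that every group is held in memory at once, which is fine for small lists but gives up the laziness of groupby.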

Cheers,
Nico


On Tue, May 4, 2010 at 12:46 PM, Ulrich Eckhardt
<eckhardt at satorlaser.com> wrote:
> Nico Schlömer wrote:
>> I ran into a bit of an unexpected issue here with itertools, and I
>> need to say that I discovered itertools only recently, so maybe my way
>> of approaching the problem is "not what I want to do".
>>
>> Anyway, the problem is the following:
>> I have a list of dictionaries, something like
>>
>> [ { "a": 1, "b": 1, "c": 3 },
>>   { "a": 1, "b": 1, "c": 4 },
>>   ...
>> ]
>>
>> and I'd like to iterate through all items with, e.g., "a":1. What I do
>> is sort and then groupby,
>>
>> my_list.sort( key=operator.itemgetter('a') )
>> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>>
>> and then just very simply iterate over my_list_grouped,
>>
>> for my_item in my_list_grouped:
>>     # do something with my_item[0], my_item[1]
>
> I'd try to avoid copying the list and instead just iterate over it:
>
>
>    def iterate_by_key(l, key):
>        for d in l:
>            try:
>                yield d[key]
>            except KeyError:
>                continue
>
> Note that you could also ask the dictionary first if it has the key, but I'm
> told this way is even faster since it only requires a single lookup
> attempt.
>
>
>> Now, inside this loop I'd like to again iterate over all items with
>> the same 'b'-value -- no problem, just do the above inside the loop:
>>
>> for my_item in my_list_grouped:
>>         # group by keyword "b"
>>         my_list2 = list( my_item[1] )
>>         my_list2.sort( key=operator.itemgetter('b') )
>>         my_list_grouped = itertools.groupby( my_list2, operator.itemgetter('b') )
>>         for e in my_list_grouped:
>>             # do something with e[0], e[1]
>>
>> That seems to work all right.
>
> Since your operation not only iterates over a list but first sorts it, it
> requires a modification which must not happen while iterating. You work
> around this by copying the list first.
>
>> Now, the problem occurs when this all is wrapped into an outer loop, such
>> as
>>
>> for k in [ 'first pass', 'second pass' ]:
>>     for my_item in my_list_grouped:
>>         # bla, the above
>>
>> To be able to iterate more than once through my_list_grouped, I have
>> to convert it into a list first, so outside all loops, I go like
>>
>> my_list.sort( key=operator.itemgetter('a') )
>> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>> my_list_grouped = list( my_list_grouped )
>>
>> This, however, makes it impossible to do the inner sort and
>> groupby-operation; you just get the very first element, and that's it.
>
> I believe that you are doing a modifying operation inside the iteration,
> which is a no-no. Create a custom iterator function (IIRC they are
> called "generators") and you should be fine. Note that this should also
> perform better since copying and sorting are not exactly for free, though
> you may not notice that with small numbers of objects.
>
> Uli
>
> --
> Sator Laser GmbH
> Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


