[Python-ideas] Fwd: grouping / dict of lists

Nicolas Rolin nicolas.rolin at tiime.fr
Fri Jul 13 09:17:17 EDT 2018


I noticed recently that *all* examples for collections.defaultdict (
https://docs.python.org/3.7/library/collections.html#collections.defaultdict)
are cases of grouping (with an int, a list or a set as the default factory)
from an iterator with a key, value output.
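For reference, the grouping patterns in those docs examples boil down to
roughly the following. This is a paraphrase with sample data in the same
spirit, not a verbatim copy of the documentation:

```python
from collections import defaultdict

pairs = [("yellow", 1), ("blue", 2), ("yellow", 3), ("blue", 4), ("red", 1)]

# list factory: collect every value seen for each key
by_list = defaultdict(list)
for key, value in pairs:
    by_list[key].append(value)

# set factory: collect the distinct values seen for each key
by_set = defaultdict(set)
for key, value in pairs:
    by_set[key].add(value)

# int factory: count occurrences of each key
counts = defaultdict(int)
for letter in "mississippi":
    counts[letter] += 1
```

In every case the loop body is a single "combine this value into the group
for this key" statement; only the combining operation changes.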

I wondered how common those constructions were, and what else defaultdict
is used for. So I took a little dive into a few libraries (stdlib, PyPy,
pandas, TensorFlow, ...), and essentially I saw:
A) Basic cases of "grouping" with a simple for loop and a
default_dict[key].append(value). I saw many kinds of default factory
used: list, int, set, dict, and even defaultdict(list). E.g.:
https://frama.link/UtNqvpvb, https://frama.link/o3Hb3-4U,
https://frama.link/dw92yJ1q, https://frama.link/1Gqoa7WM,
https://frama.link/bWswbHsU, https://frama.link/SZh2q8pS
B) Cases of grouping, but where the for loop feeds more than one
"grouper", which is pretty annoying if we want to group something. E.g.:
https://frama.link/Db-Ny49a, https://frama.link/bZakUR33,
https://frama.link/MwJFqh5o,
C) Class attribute initialization (the grouping is done by repeatedly
calling a function, so any grouping constructor would be useless here).
E.g.: https://frama.link/GoGWuQwR, https://frama.link/BugcS8wU
D) Sometimes you just want a defaultdict inside a defaultdict inside a dict
and just have fun: https://frama.link/asBNLr1g,
https://frama.link/8j7gzfA5
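To illustrate cases B and D, here is a minimal sketch with made-up data
(the `records` list and its fields are hypothetical, not taken from the
linked code): one loop feeds two groupers at once, and a nested defaultdict
groups by two keys.

```python
from collections import defaultdict

# Hypothetical records: (department, employee, project)
records = [
    ("sales", "ann", "crm"),
    ("sales", "bob", "crm"),
    ("dev", "cid", "api"),
    ("dev", "ann", "api"),
]

# Case B: a single loop feeding two groupers; grouping each one
# separately would mean iterating over the data twice.
employees_by_dept = defaultdict(set)
projects_by_employee = defaultdict(list)
for dept, employee, project in records:
    employees_by_dept[dept].add(employee)
    projects_by_employee[employee].append(project)

# Case D: nested defaultdicts, grouping by department, then by project.
nested = defaultdict(lambda: defaultdict(list))
for dept, employee, project in records:
    nested[dept][project].append(employee)
```

Case B is the one a single "group this iterable" helper does not cover
directly, since each grouper wants a different key and value.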

From what I saw, the most useful addition would be a defaultdict method
that fills it from an iterable, using a combining operation adapted to the
default_factory (__add__ for list, int and str, add for set, update for
dict, and probably __add__ for anything else).

Sample code could be:

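from collections import defaultdict

class groupingdict(defaultdict):
    def group_by_iterator(self, iterable):
        # Combine each (key, element) pair into self[key], choosing the
        # operation from the type that default_factory produces.
        empty_element = self.default_factory()
        # set has both add() and update(), so test add() first: elements
        # are added whole instead of being iterated over by update().
        if hasattr(empty_element, "add"):
            for key, element in iterable:
                self[key].add(element)
        elif hasattr(empty_element, "update"):
            # dict (or other mapping): merge the element into the group
            for key, element in iterable:
                self[key].update(element)
        elif hasattr(empty_element, "__add__"):
            # list, int, str, ...: concatenate or sum with +=
            for key, element in iterable:
                self[key] += element
        else:
            raise TypeError('default factory does not support grouping')
        return self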

So that, for example:

groupingdict(dict).group_by_iterator(
    (grouping_key, a_dict) for grouping_key, a_dict in [
        (1, {'a': 'c'}),
        (1, {'b': 'f'}),
        (1, {'a': 'e'}),
        (2, {'a': 'e'})
    ]
)

returns the equivalent of

groupingdict(dict, {1: {'a': 'e', 'b': 'f'}, 2: {'a': 'e'}})


My implementation is garbage, and there should be two methods, one
returning a new object and one modifying it in place, but I think it gives
more leeway than just a function returning a dict.
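A minimal sketch of that two-method split, assuming a hypothetical
classmethod `from_pairs` that returns a new object and an in-place
`group_update` method that returns None (mirroring sorted() vs list.sort()).
Both names are made up for illustration; neither is an existing API:

```python
from collections import defaultdict


class groupingdict(defaultdict):
    """Hypothetical defaultdict subclass that can fill itself from pairs."""

    def group_update(self, pairs):
        """Combine each (key, value) into self[key] in place; return None."""
        empty = self.default_factory()
        # Pick the combining operation from the type the factory produces.
        # add() is checked first because set has both add() and update().
        if hasattr(empty, "add"):
            def combine(key, value):
                self[key].add(value)
        elif hasattr(empty, "update"):
            def combine(key, value):
                self[key].update(value)
        elif hasattr(empty, "__add__"):
            def combine(key, value):
                self[key] += value
        else:
            raise TypeError(f"cannot group into {type(empty).__name__}")
        for key, value in pairs:
            combine(key, value)

    @classmethod
    def from_pairs(cls, default_factory, pairs):
        """Build and return a new groupingdict filled from pairs."""
        result = cls(default_factory)
        result.group_update(pairs)
        return result
```

For instance, groupingdict.from_pairs(int, [("a", 1), ("b", 2), ("a", 3)])
sums to {"a": 4, "b": 2}, and calling group_update on an existing set-based
instance adds a string element whole rather than splitting it into
characters.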


2018-07-13 7:11 GMT+02:00 Chris Barker via Python-ideas <
python-ideas at python.org>:

> On Mon, Jul 9, 2018 at 5:55 PM, Franklin? Lee <
> leewangzhong+python at gmail.com> wrote:
>
>> >> - The storage container.
>> >
>> >
>> > so this means you're passing in a full set of storage containers? I'm a
>> > bit confused by that -- if they might be pre-populated, then they would
>> > need to be instances, and you'd need to have one for every key -- how
>> > would you know in advance what you needed???
>>
>> No, I mean the mapping (outer) container. For example, I can pass in
>> an empty OrderedDict, or a dict that already contained some groups
>> from a previous call to the grouping function.
>>
>
> Sure -- that's what my prototype does if you pass a Mapping in (or use
> .update() )
>
> why not?
>
> -CHB
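On the point quoted above about passing in the outer container: a minimal
sketch of a plain grouping function that accepts an optional, possibly
pre-populated mapping, such as an empty OrderedDict or the result of an
earlier call. The name `group_into` and its signature are hypothetical;
this is not Chris's prototype, whose API isn't shown here.

```python
from collections import OrderedDict


def group_into(pairs, container=None):
    """Append each value to container[key], creating lists as needed.

    `container` may be any mutable mapping, possibly already holding
    groups from a previous call; a new dict is used when it is None.
    """
    if container is None:
        container = {}
    for key, value in pairs:
        # setdefault works on any MutableMapping, not just defaultdict
        container.setdefault(key, []).append(value)
    return container


first = group_into([("b", 1), ("a", 2)], OrderedDict())
group_into([("a", 3), ("c", 4)], first)  # keeps adding to the same groups
```

Here the key order of the OrderedDict is preserved across both calls, and
the second call extends the "a" group instead of replacing it.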


-- 

Nicolas Rolin | Data Scientist
+ 33 631992617 - nicolas.rolin at tiime.fr

15 rue Auber, 75009 Paris
www.tiime.fr