<div dir="ltr"><div><div><div><div><div>I noticed recently that *all* examples for collections.defaultdict (<a href="https://docs.python.org/3.7/library/collections.html#collections.defaultdict" target="_blank">https://docs.python.org/3.7/l<wbr>ibrary/collections.html#collec<wbr>tions.defaultdict</a>) are cases of grouping (into an int, a list and a set) from an iterator that yields (key, value) pairs. <br></div><div><br></div><div>I
wondered how common those constructions were, and what else defaultdict
is used for. So I took a quick look at a few libraries (std
lib, pypy, pandas, tensorflow, ..), and essentially I saw:<br></div>A) basic cases of "grouping" with a simple for loop and a default_dict[key].append(value<wbr>). I saw many kinds of default factory used: list, int, set, dict, and even defaultdict(list). ex: <a href="https://frama.link/UtNqvpvb">https://frama.link/UtNqvpvb</a>, <a href="https://frama.link/o3Hb3-4U">https://frama.link/o3Hb3-4U</a>, <a href="https://frama.link/dw92yJ1q">https://frama.link/dw92yJ1q</a>, <a href="https://frama.link/1Gqoa7WM">https://frama.link/1Gqoa7WM</a>, <a href="https://frama.link/bWswbHsU">https://frama.link/bWswbHsU</a>, <a href="https://frama.link/SZh2q8pS">https://frama.link/SZh2q8pS</a><br></div><div>B) cases of grouping, but where the same for loop was feeding more than
one "grouper". That is pretty annoying if we want a single grouping call. ex: <a href="https://frama.link/Db-Ny49a">https://frama.link/Db-Ny49a</a>, <a href="https://frama.link/bZakUR33">https://frama.link/bZakUR33</a>, <a href="https://frama.link/MwJFqh5o">https://frama.link/MwJFqh5o</a>, <br></div>C) class attribute initialization (the grouping is done by repeatedly
calling a function, so any grouping constructor would be useless here). ex: <a href="https://frama.link/GoGWuQwR">https://frama.link/GoGWuQwR</a>, <a href="https://frama.link/BugcS8wU">https://frama.link/BugcS8wU</a></div><div>D) Sometimes you just want a defaultdict inside a defaultdict inside a dict and just have fun: <a href="https://frama.link/asBNLr1g">https://frama.link/asBNLr1g</a>, <a href="https://frama.link/8j7gzfA5">https://frama.link/8j7gzfA5</a><br></div><br></div></div><div>From
what I saw, the most useful addition would be a method on defaultdict that
fills it from an iterable, using a grouping operation adapted to the
default_factory (so __add__ for list, int and str, add for set, update
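for dict and probably __add__ for anything else)</div><div><br></div><div>A sample implementation would be:</div><div><br></div><div>from collections import defaultdict<br><br>class groupingdict(defaultdict):<br>&nbsp;&nbsp;&nbsp;&nbsp;def group_by_iterator(self, iterator):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Probe an empty default value to pick the grouping operation.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;empty_element = self.default_factory()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if hasattr(empty_element, "__add__"):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# list, int, str: accumulate with +=<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for key, element in iterator:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self[key] += element<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif hasattr(empty_element, "add"):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# set: checked before "update", which sets also define<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for key, element in iterator:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self[key].add(element)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif hasattr(empty_element, "update"):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# dict: merge each element into the group<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for key, element in iterator:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self[key].update(element)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;raise TypeError('default factory does not support grouping')</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return self</div><div><br></div><div>So that for example:</div><div>>groupingdict(dict).group_by_iterator(<br>&nbsp;&nbsp;&nbsp;&nbsp;(grouping_key, a_dict) for grouping_key, a_dict in [<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(1, {'a': 'c'}), <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(1, {'b': 'f'}), <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(1, {'a': 'e'}), <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(2, {'a': 'e'})<br>&nbsp;&nbsp;&nbsp;&nbsp;]<br>)</div><div>returns <br></div><div><pre style="box-sizing:border-box;overflow:auto;font-family:monospace;font-size:14px;display:block;padding:0px;margin:0px;line-height:inherit;word-break:break-all;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px none;border-radius:0px;white-space:pre-wrap;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial">>groupingdict(dict, {1: {'a': 'e', 'b': 'f'}, 2: {'a': 'e'}})</pre></div><div><br></div><div>My implementation is garbage, and there should be two methods, one returning the object and one modifying it, but I think it gives more leeway than just a function returning a dict.<br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-07-13 7:11 GMT+02:00 Chris Barker via Python-ideas <span dir="ltr"><<a href="mailto:python-ideas@python.org" target="_blank">python-ideas@python.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div 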
class="gmail_extra"><div class="gmail_quote"><span class="">On Mon, Jul 9, 2018 at 5:55 PM, Franklin? Lee <span dir="ltr"><<a href="mailto:leewangzhong+python@gmail.com" target="_blank">leewangzhong+python@gmail.com</a><wbr>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>>> - The storage container.<br>
><br>
><br>
> so this means you're passing in a full set of storage containers? I'm a bit<br>
> confused by that -- if they might be pre-populated, then they would need to<br>
> be instances, and you'd need to have one for every key -- how would you know<br>
> in advance aht you needed???<br>
<br>
</span>No, I mean the mapping (outer) container. For example, I can pass in<br>
an empty OrderedDict, or a dict that already contained some groups<br>
from a previous call to the grouping function.<br></blockquote><div><br></div></span><div>Sure -- that's what my prototype does if you pass a Mapping in (or use .update() )</div><div><br></div><div>why not?</div><div><br></div><div>-CHB</div><div><br></div></div><span class="">-- <br><div data-smartmail="gmail_signature"><br>Christopher Barker, Ph.D.<br>Oceanographer<br><br>Emergency Response Division<br>NOAA/NOS/OR&R (206) 526-6959 voice<br>7600 Sand Point Way NE (206) 526-6329 fax<br>Seattle, WA 98115 (206) 526-6317 main reception<br><br><a href="mailto:Chris.Barker@noaa.gov" target="_blank">Chris.Barker@noaa.gov</a></div>
</span></div></div>
<br></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><p>--<b><br>Nicolas Rolin</b> | Data Scientist<br>+ 33 631992617 - nicolas<a href="mailto:prenom.nom@tiime.fr" rel="nofollow" target="_blank">.rolin@tiime.fr</a></p><p><span></span><span><font color="#888888"><img src="https://docs.google.com/uc?export=download&id=0B5gEmxojZz7NUklic0RTMDVXd0E&revid=0B5gEmxojZz7NYytTZzQ3Q2t6d0xYZGZVSkljV3RCNGxZRENVPQ" width="96" height="28"> </font></span><br><i>15 rue Auber, </i><i>75009 Paris</i><br><i><a href="http://www.tiime.fr" rel="nofollow" target="_blank">www.tiime.fr</a></i></p></div></div></div></div></div></div>
</div>