<div>PyToolz, Pandas, Dask .groupby()</div><div><br></div>toolz.itertoolz.groupby does this succinctly without any new/magical/surprising syntax.<div><br></div><div><a href="https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby" target="_blank">https://toolz.readthedocs.io/<wbr>en/latest/api.html#toolz.<wbr>itertoolz.groupby</a></div><div><br></div><div>From <a href="https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py" target="_blank">https://github.com/pytoolz/<wbr>toolz/blob/master/toolz/<wbr>itertoolz.py</a> :<br></div><div><br></div><div>"""</div><div><div>def groupby(key, seq):</div><div>    """ Group a collection by a key function</div><div>    >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']</div><div>    >>> groupby(len, names)  # doctest: +SKIP</div><div>    {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}</div><div>    >>> iseven = lambda x: x % 2 == 0</div><div>    >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP</div><div>    {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}</div><div>    Non-callable keys imply grouping on a member.</div><div>    >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},</div><div>    ...                    {'name': 'Bob', 'gender': 'M'},</div><div>    ...                    
{'name': 'Charlie', 'gender': 'M'}]) # doctest:+SKIP</div><div>    {'F': [{'gender': 'F', 'name': 'Alice'}],</div><div>     'M': [{'gender': 'M', 'name': 'Bob'},</div><div>           {'gender': 'M', 'name': 'Charlie'}]}</div><div>    See Also:</div><div>        countby</div><div>    """</div><div>    if not callable(key):</div><div>        key = getter(key)</div><div>    d = collections.defaultdict(<wbr>lambda: [].append)</div><div>    for item in seq:</div><div>        d[key(item)](item)</div><div>    rv = {}</div><div>    for k, v in iteritems(d):</div><div>        rv[k] = v.__self__</div><div>    return rv</div></div><div>"""</div><div><br></div><div>If you're willing to install Pandas (and NumPy, and ...), there's pandas.DataFrame.groupby:</div><div><br></div><div><a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" target="_blank">https://pandas.pydata.org/<wbr>pandas-docs/stable/generated/<wbr>pandas.DataFrame.groupby.html</a></div><div><br></div><div><a href="https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/core/generic.py#L6586-L6659" target="_blank">https://github.com/pandas-dev/<wbr>pandas/blob/v0.23.1/pandas/<wbr>core/generic.py#L6586-L6659</a></div><div><br></div><div><br></div><div>Dask has a different groupby implementation:</div><div><a href="https://gist.github.com/darribas/41940dfe7bf4f987eeaa#file-pandas_dask_test-ipynb" target="_blank">https://gist.github.com/<wbr>darribas/41940dfe7bf4f987eeaa#<wbr>file-pandas_dask_test-ipynb</a></div><div><br></div><div><a href="https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby">https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby</a></div><div><br><br>On Thursday, June 28, 2018, Chris Barker via Python-ideas <<a href="mailto:python-ideas@python.org" target="_blank">python-ideas@python.org</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc 
solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jun 28, 2018 at 8:25 AM, Nicolas Rolin <span dir="ltr"><<a href="mailto:nicolas.rolin@tiime.fr" target="_blank">nicolas.rolin@tiime.fr</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I use list and dict comprehensions a lot, and a problem I often have is doing the equivalent of a group_by operation (to use SQL terminology).<br></div></blockquote><div><br></div><div>I don't know from SQL, so "group by" doesn't mean anything to me, but this:</div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">For example, if I have a list of tuples (student, school) and I want the list of students by school, the only option I'm left with is to write<br><br>    student_by_school = defaultdict(list)<br>    for student, school in student_school_list:<br>        student_by_school[school].append(student)<br></div></blockquote><div><br></div><div>Seems to me that the issue here is that there is no way to have a "defaultdict comprehension".</div><div><br></div><div>I can't think of a syntactically clean way to make that possible, though.</div><div> <br></div><div>Could itertools.groupby help here? It seems to work, but boy! it's ugly:</div><div><br></div><div>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span style="color:rgb(52,163,39)">In [</span><span style="color:rgb(46,231,33)"><b>45</b></span><span style="color:rgb(52,163,39)">]: </span><span>student_school_list</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(178,54,34);background-color:rgb(255,255,255)"><span>Out[</span><span style="color:rgb(255,59,30)"><b>45</b></span><span>]:<span> </span></span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span>[('Fred', 'SchoolA'),</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span><span> </span>('Bob', 'SchoolB'),</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span><span> </span>('Mary', 'SchoolA'),</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span><span> </span>('Jane', 'SchoolB'),</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span><span> </span>('Nancy', 'SchoolC')]</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255);min-height:18px"><span></span><br></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span style="color:rgb(52,163,39)">In [</span><span style="color:rgb(46,231,33)"><b>46</b></span><span style="color:rgb(52,163,39)">]: </span><span>{a:[t[</span><span style="color:rgb(52,163,39)">0</span><span>] </span><span style="color:rgb(52,163,39)"><b>for</b></span><span> t </span><span style="color:rgb(208,59,255)"><b>in</b></span><span> b] </span><span style="color:rgb(52,163,39)"><b>for</b></span><span> a,b </span><span style="color:rgb(208,59,255)"><b>in</b></span><span> groupby(</span><span style="color:rgb(52,163,39)">sorted</span><span>(student_school_list, key=</span><span style="color:rgb(52,163,39)"><b>lambda</b></span><span> t: t[</span><span style="color:rgb(52,163,39)">1</span><span>]), key=</span><span style="color:rgb(52,163,39)"><b>lambda</b></span><span> t: t[</span><span style="color:rgb(52,163,39)">1</span><span>])}</span></p>
<p style="margin:0px;font-weight:normal;font-stretch:normal;font-size:15px;line-height:normal;font-family:Menlo;color:rgb(0,0,0);background-color:rgb(255,255,255)"><span style="color:rgb(178,54,34)">Out[</span><span style="color:rgb(255,59,30)"><b>46</b></span><span style="color:rgb(178,54,34)">]: </span><span>{'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}</span></p>


<br></div><div><br></div><div>-CHB</div><div><br></div><div><br></div></div>-- <br><div data-smartmail="gmail_signature"><br>Christopher Barker, Ph.D.<br>Oceanographer<br><br>Emergency Response Division<br>NOAA/NOS/OR&R            (206) 526-6959   voice<br>7600 Sand Point Way NE   (206) 526-6329   fax<br>Seattle, WA  98115       (206) 526-6317   main reception<br><br><a href="mailto:Chris.Barker@noaa.gov" target="_blank">Chris.Barker@noaa.gov</a></div>
</div></div>
</blockquote></div>
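<div>For comparison, here is a minimal stdlib-only sketch of the kind of grouping toolz.itertoolz.groupby performs, applied to the student/school example from the quoted thread. It is not the toolz source. The groupby function below is a hypothetical stand-in built on collections.defaultdict(list) and operator.itemgetter, and it assumes a non-callable key is a single hashable index or dict key. Unlike the itertools.groupby version, it needs no sort, because it does not rely on equal keys being adjacent. It also keeps schools in first-seen order.</div>
<pre>
```python
from collections import defaultdict
from operator import itemgetter


def groupby(key, seq):
    """Group the items of seq into lists keyed by key(item).

    Hypothetical stdlib-only stand-in for toolz.itertoolz.groupby,
    not the toolz implementation itself.
    """
    if not callable(key):
        # Assumption: a non-callable key is a single index or dict key,
        # roughly what toolz does via its getter() helper.
        key = itemgetter(key)
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)  # one pass, no sorting needed
    return dict(groups)  # hand back a plain dict rather than a defaultdict


student_school_list = [
    ('Fred', 'SchoolA'),
    ('Bob', 'SchoolB'),
    ('Mary', 'SchoolA'),
    ('Jane', 'SchoolB'),
    ('Nancy', 'SchoolC'),
]

# Group the (student, school) tuples by school (tuple index 1) ...
by_school = groupby(1, student_school_list)
# ... then keep only the student names, as in the itertools.groupby version.
students_by_school = {school: [student for student, _ in pairs]
                      for school, pairs in by_school.items()}
print(students_by_school)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
```
</pre>
<div>Passing a callable such as len works the same way as in the toolz docstring. For example, groupby(len, names) buckets the names by their length.</div>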