[Python-ideas] Allow a group by operation for dict comprehension

Thu Jun 28 15:47:11 EDT 2018

PyToolz, Pandas, Dask .groupby()

toolz.itertoolz.groupby does this succinctly without any
new/magical/surprising syntax.

https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby

From https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py :

"""
def groupby(key, seq):
    """ Group a collection by a key function
    >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
    >>> groupby(len, names)  # doctest: +SKIP
    {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}
    >>> iseven = lambda x: x % 2 == 0
    >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP
    {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}
    Non-callable keys imply grouping on a member.
    >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},
    ...                    {'name': 'Bob', 'gender': 'M'},
    ...                    {'name': 'Charlie', 'gender': 'M'}])  # doctest: +SKIP
    {'F': [{'gender': 'F', 'name': 'Alice'}],
     'M': [{'gender': 'M', 'name': 'Bob'},
           {'gender': 'M', 'name': 'Charlie'}]}
    See Also:
        countby
    """
    if not callable(key):
        key = getter(key)
    d = collections.defaultdict(lambda: [].append)
    for item in seq:
        d[key(item)](item)
    rv = {}
    for k, v in iteritems(d):
        rv[k] = v.__self__
    return rv
"""
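
For comparison, here is a minimal stdlib-only sketch of the same grouping
idea, applied to the (student, school) example quoted below. The name
group_by and its value parameter are hypothetical; they are not part of
toolz or the standard library.

```python
from collections import defaultdict


def group_by(key, seq, value=None):
    """Group items of seq by key(item).

    Hypothetical helper for illustration, not toolz's API.
    If value is given, store value(item) instead of the item itself.
    """
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item if value is None else value(item))
    # Return a plain dict so that missing keys raise KeyError again.
    return dict(groups)


student_school_list = [
    ('Fred', 'SchoolA'),
    ('Bob', 'SchoolB'),
    ('Mary', 'SchoolA'),
    ('Jane', 'SchoolB'),
    ('Nancy', 'SchoolC'),
]

# Group by school (t[1]) and keep only the student name (t[0]).
student_by_school = group_by(lambda t: t[1], student_school_list,
                             value=lambda t: t[0])
print(student_by_school)
```

Unlike itertools.groupby, this does not need the input to be sorted, and
each group keeps its items in the order they were seen.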

If you're willing to install Pandas (and NumPy, and ...), there's
pandas.DataFrame.groupby:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html

https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/core/generic.py#L6586-L6659

Dask has a different groupby implementation:
https://gist.github.com/darribas/41940dfe7bf4f987eeaa#file-pandas_dask_test-ipynb

https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby

On Thursday, June 28, 2018, Chris Barker via Python-ideas <
python-ideas at python.org> wrote:

> On Thu, Jun 28, 2018 at 8:25 AM, Nicolas Rolin <nicolas.rolin at tiime.fr>
> wrote:
>>
>> I use list and dict comprehensions a lot, and a problem I often have is
>> doing the equivalent of a group_by operation (to use SQL terminology).
>>
>
> I don't know from SQL, so "group by" doesn't mean anything to me, but this:
>
>
>> For example if I have a list of tuples (student, school) and I want to
>> have the list of students by school the only option I'm left with is to
>> write
>>
>>     student_by_school = defaultdict(list)
>>     for student, school in student_school_list:
>>         student_by_school[school].append(student)
>>
>
> seems to me that the issue here is that there is no way to have a
> "defaultdict comprehension"
>
> I can't think of a syntactically clean way to make that possible, though.
>
> Could itertools.groupby help here? It seems to work, but boy! it's ugly:
>
> In [45]: student_school_list
> Out[45]:
> [('Fred', 'SchoolA'),
>  ('Bob', 'SchoolB'),
>  ('Mary', 'SchoolA'),
>  ('Jane', 'SchoolB'),
>  ('Nancy', 'SchoolC')]
>
> In [46]: {a:[t[0] for t in b] for a,b in groupby(sorted(
>     ...:     student_school_list, key=lambda t: t[1]), key=lambda t: t[1])}
> Out[46]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'],
> 'SchoolC': ['Nancy']}
>
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>
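
A note on the itertools.groupby one-liner quoted above: it only works
because the list is sorted by school first. itertools.groupby merges
consecutive items with equal keys and nothing more, so unsorted input would
yield several partial groups per school. Below is a slightly more readable
sketch of the same approach; using operator.itemgetter in place of the
lambdas is my substitution, not Chris's code.

```python
from itertools import groupby
from operator import itemgetter

student_school_list = [
    ('Fred', 'SchoolA'),
    ('Bob', 'SchoolB'),
    ('Mary', 'SchoolA'),
    ('Jane', 'SchoolB'),
    ('Nancy', 'SchoolC'),
]

# Key function: the school in each (student, school) pair.
school = itemgetter(1)

# Sort by school so that equal keys are adjacent, then group the runs.
# sorted() is stable, so students keep their original relative order.
student_by_school = {
    name: [student for student, _ in pairs]
    for name, pairs in groupby(sorted(student_school_list, key=school),
                               key=school)
}
print(student_by_school)
```

The sort makes this O(n log n), whereas the defaultdict loop is a single
pass over the data.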