[Python-ideas] Allow a group by operation for dict comprehension

Thu Jun 28 18:17:01 EDT 2018

On Thu, Jun 28, 2018 at 1:34 PM, David Mertz <mertz at gnosis.cx> wrote:

> I'd add one more option. You want something that behaves like SQL. Right
> in the standard library is sqlite3, and you can create an in-memory DB to
> hold the data you expect to group.
>
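
For anyone who wants to try the sqlite3 route, here is a minimal sketch using only the standard library. The table name `enrollment` and its column names are made up for illustration, and the split on "," assumes no student name contains a comma.

```python
import sqlite3

# Sample data: (student, school) pairs, as in the example further down the thread.
student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# Build a throwaway in-memory database (hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, school TEXT)")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", student_school_list)

# group_concat joins the students of each school into one comma-separated string.
rows = conn.execute(
    "SELECT school, group_concat(student) FROM enrollment GROUP BY school"
).fetchall()
conn.close()

# SQLite does not promise an order inside group_concat, so sort each list.
students_by_school = {school: sorted(names.split(",")) for school, names in rows}
print(students_by_school)
```

That is a lot of ceremony for a small list, but it may pay off once the data needs several different aggregations.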

There are also packages designed to make DB-style queries easier.

Here's one I found with a quick google.

-CHB

> On Thu, Jun 28, 2018, 3:48 PM Wes Turner <wes.turner at gmail.com> wrote:
>
>> PyToolz, Pandas, Dask .groupby()
>>
>> toolz.itertoolz.groupby does this succinctly without any
>> new/magical/surprising syntax.
>>
>> https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby
>>
>> From https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py :
>>
>> """
>> def groupby(key, seq):
>>     """ Group a collection by a key function
>>     >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
>>     >>> groupby(len, names)  # doctest: +SKIP
>>     {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}
>>     >>> iseven = lambda x: x % 2 == 0
>>     >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP
>>     {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}
>>     Non-callable keys imply grouping on a member.
>>     >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},
>>     ...                    {'name': 'Bob', 'gender': 'M'},
>>     ...                    {'name': 'Charlie', 'gender': 'M'}])  # doctest: +SKIP
>>     {'F': [{'gender': 'F', 'name': 'Alice'}],
>>      'M': [{'gender': 'M', 'name': 'Bob'},
>>            {'gender': 'M', 'name': 'Charlie'}]}
>>     See Also:
>>         countby
>>     """
>>     if not callable(key):
>>         key = getter(key)
>>     d = collections.defaultdict(lambda: [].append)
>>     for item in seq:
>>         d[key(item)](item)
>>     rv = {}
>>     for k, v in iteritems(d):
>>         rv[k] = v.__self__
>>     return rv
>> """
>>
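
The toolz code above stores a bound `append` per key and then recovers each list through `__self__`. A plainer sketch of the same idea with only the standard library might look like the following. The name `groupby_key` is made up here, and `operator.itemgetter` is assumed to be a reasonable stand-in for toolz's `getter` when the key is a single index or dict key.

```python
from collections import defaultdict
from operator import itemgetter


def groupby_key(key, seq):
    """Group the items of seq into a dict of lists, keyed by key(item)."""
    if not callable(key):
        # A non-callable key means "look this up on each item".
        key = itemgetter(key)
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    # Return a plain dict so a missing key raises KeyError again.
    return dict(groups)


names = ["Alice", "Bob", "Charlie", "Dan", "Edith", "Frank"]
by_length = groupby_key(len, names)
by_gender = groupby_key(
    "gender",
    [{"name": "Alice", "gender": "F"}, {"name": "Bob", "gender": "M"}],
)
print(by_length)
print(by_gender)
```

This is likely a little slower than the bound-method version, but it is easier to read.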
>> If you're willing to install Pandas (and NumPy, and ...), there's
>> pandas.DataFrame.groupby:
>>
>> https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
>>
>> https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/core/generic.py#L6586-L6659
>>
>>
>> Dask has a different groupby implementation:
>> https://gist.github.com/darribas/41940dfe7bf4f987eeaa#file-pandas_dask_test-ipynb
>>
>> https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby
>>
>>
>> On Thursday, June 28, 2018, Chris Barker via Python-ideas <
>> python-ideas at python.org> wrote:
>>
>>> On Thu, Jun 28, 2018 at 8:25 AM, Nicolas Rolin <nicolas.rolin at tiime.fr>
>>> wrote:
>>>>
>>>> I use list and dict comprehension a lot, and a problem I often have is
>>>> to do the equivalent of a group_by operation (to use sql terminology).
>>>>
>>>
>>> I don't know from SQL, so "group by" doesn't mean anything to me, but
>>> this:
>>>
>>>
>>>> For example if I have a list of tuples (student, school) and I want to
>>>> have the list of students by school the only option I'm left with is to
>>>> write
>>>>
>>>>     student_by_school = defaultdict(list)
>>>>     for student, school in student_school_list:
>>>>         student_by_school[school].append(student)
>>>>
>>>
>>> seems to me that the issue here is that there is no way to have a
>>> "defaultdict comprehension"
>>>
>>> I can't think of a syntactically clean way to make that possible, though.
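
Short of new syntax, one way to get close to a "defaultdict comprehension" is a small helper that consumes (key, value) pairs, so the call site can pass a generator expression. This is only a sketch, and the helper name `group_pairs` is made up for illustration.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Collect (key, value) pairs into a dict mapping each key to a list of values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# The generator expression plays the role of the comprehension body.
student_by_school = group_pairs(
    (school, student) for student, school in student_school_list
)
print(student_by_school)
```

The loop is still there, just hidden in the helper, so this is a convenience rather than a real answer to the syntax question.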
>>>
>>> Could itertools.groupby help here? It seems to work, but boy! it's ugly:
>>>
>>> In [45]: student_school_list
>>> Out[45]:
>>> [('Fred', 'SchoolA'),
>>>  ('Bob', 'SchoolB'),
>>>  ('Mary', 'SchoolA'),
>>>  ('Jane', 'SchoolB'),
>>>  ('Nancy', 'SchoolC')]
>>>
>>> In [46]: {a: [t[0] for t in b] for a, b in groupby(sorted(
>>>     ...:     student_school_list, key=lambda t: t[1]), key=lambda t: t[1])}
>>> Out[46]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'],
>>> 'SchoolC': ['Nancy']}
>>>
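
The same itertools.groupby approach reads a bit better spread over several lines. This is a sketch of the one-liner above, not a different technique; the name `by_school` is introduced here for illustration. Note that itertools.groupby only groups adjacent items, which is why the input has to be sorted on the same key first.

```python
from itertools import groupby
from operator import itemgetter

student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# Key function: the school is the second element of each tuple.
by_school = itemgetter(1)

# sorted() is stable, so students keep their original order within a school.
students_by_school = {
    school: [student for student, _ in pairs]
    for school, pairs in groupby(
        sorted(student_school_list, key=by_school), key=by_school
    )
}
print(students_by_school)
```

The sort makes this O(n log n), whereas the defaultdict loop is a single pass.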
>>>
>>> -CHB
>>>
>>>
>>> --
>>>
>>> Christopher Barker, Ph.D.
>>> Oceanographer
>>>
>>> Emergency Response Division
>>> NOAA/NOS/OR&R            (206) 526-6959   voice
>>> 7600 Sand Point Way NE   (206) 526-6329   fax
>>> Seattle, WA  98115       (206) 526-6317   main reception
>>>
>>> Chris.Barker at noaa.gov
>>>
>

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov