I agree with these recommendations. There are excellent third-party tools that do what you want. This is way too much to try to shoehorn into a comprehension.

I'd add one more option. You want something that behaves like SQL. Right in the standard library is sqlite3, and you can create an in-memory DB to hold the data you expect to group.
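A minimal sketch of the sqlite3 route, assuming the (student, school) data from the question below. The table name `enrollment` and its column names are made up for illustration. SQLite's `group_concat` does not promise any order inside a group, and splitting on "," assumes no name contains a comma.

```python
import sqlite3

# Sample (student, school) pairs, same shape as in the question.
student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# ":memory:" gives a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, school TEXT)")  # hypothetical table
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", student_school_list)

# GROUP BY yields one row per school; group_concat joins that school's
# students into a single comma-separated string.
rows = conn.execute(
    "SELECT school, group_concat(student, ',') FROM enrollment "
    "GROUP BY school ORDER BY school"
).fetchall()
conn.close()

# Turn the joined strings back into Python lists.
students_by_school = {school: names.split(",") for school, names in rows}
print(students_by_school)
```

It is more ceremony than a loop for a one-off, but it pays off once you also want HAVING, joins, or several aggregates at once.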

PyToolz, Pandas, Dask .groupby()

toolz.itertoolz.groupby does this succinctly without any new/magical/surprising syntax.


From https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py :

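import collections
import operator


def groupby(key, seq):
    """ Group a collection by a key function

    >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
    >>> groupby(len, names)  # doctest: +SKIP
    {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}

    >>> iseven = lambda x: x % 2 == 0
    >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP
    {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}

    Non-callable keys imply grouping on a member.

    >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},
    ...                    {'name': 'Bob', 'gender': 'M'},
    ...                    {'name': 'Charlie', 'gender': 'M'}]) # doctest:+SKIP
    {'F': [{'gender': 'F', 'name': 'Alice'}],
     'M': [{'gender': 'M', 'name': 'Bob'},
           {'gender': 'M', 'name': 'Charlie'}]}

    See Also:
        countby
    """
    if not callable(key):
        # toolz calls its internal getter() helper here; operator.itemgetter
        # covers the single-key case shown in the docstring.
        key = operator.itemgetter(key)
    # Each new key gets a fresh list, but the dict stores that list's bound
    # .append, so the loop calls it directly without looking up .append per item.
    d = collections.defaultdict(lambda: [].append)
    for item in seq:
        d[key(item)](item)
    rv = {}
    # toolz iterated with an iteritems() compatibility shim; dict.items() is the Python 3 equivalent.
    for k, v in d.items():
        rv[k] = v.__self__  # a bound method's __self__ is the list it appends to
    return rv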
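The least obvious line there is `collections.defaultdict(lambda: [].append)`. Here is a stripped-down sketch of just that trick; `group_with_bound_appends` is a hypothetical name, and my guess that the point is to skip one attribute lookup per item is an assumption, not something the toolz source states.

```python
import collections


def group_with_bound_appends(key, seq):
    # Every missing key triggers the lambda, which builds a brand-new list
    # and returns that list's bound .append method.
    d = collections.defaultdict(lambda: [].append)
    for item in seq:
        d[key(item)](item)  # calling the bound method appends to the hidden list
    # A bound method keeps its instance in __self__, so the lists can be recovered.
    return {k: append.__self__ for k, append in d.items()}


pairs = [("Fred", "SchoolA"), ("Bob", "SchoolB"), ("Mary", "SchoolA")]
grouped = group_with_bound_appends(lambda t: t[1], pairs)
print(grouped)
# {'SchoolA': [('Fred', 'SchoolA'), ('Mary', 'SchoolA')], 'SchoolB': [('Bob', 'SchoolB')]}
```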

If you're willing to install Pandas (and NumPy, and ...), there's pandas.DataFrame.groupby:
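A small sketch of what that looks like for the student/school data from the question, assuming pandas is installed. The column names `student` and `school` are chosen here for illustration. `groupby` sorts the group keys by default and keeps rows in their original order within each group.

```python
import pandas as pd

student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# Load the tuples into a two-column frame (column names are illustrative).
df = pd.DataFrame(student_school_list, columns=["student", "school"])

# Split rows by school, collect each group's "student" values into a list,
# then convert the resulting Series into a plain dict.
students_by_school = df.groupby("school")["student"].apply(list).to_dict()
print(students_by_school)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
```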



Dask has a different groupby implementation:


I use list and dict comprehensions a lot, and a problem I often have is doing the equivalent of a group_by operation (to use SQL terminology).

I don't know from SQL, so "group by" doesn't mean anything to me, but this:
For example, if I have a list of (student, school) tuples and I want the list of students by school, the only option I'm left with is to write:

    from collections import defaultdict
    student_by_school = defaultdict(list)
    for student, school in student_school_list:
        student_by_school[school].append(student)

It seems to me that the issue here is that there is no way to have a "defaultdict comprehension".

I can't think of a syntactically clean way to make that possible, though.
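Without new syntax, one option is a small helper that takes (key, value) pairs, so the call site reads almost like a comprehension. This is only a sketch; `group_pairs` is a hypothetical name, not a standard-library function.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Hypothetical helper: collect (key, value) pairs into a dict of lists."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Return a plain dict so a later lookup of a missing key raises KeyError
    # instead of silently inserting an empty list.
    return dict(grouped)


student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# The generator expression supplies (key, value) pairs in comprehension style.
student_by_school = group_pairs((school, student) for student, school in student_school_list)
print(student_by_school)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
```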
Could itertools.groupby help here? It seems to work, but boy! it's ugly:

In [45]: student_school_list
Out[45]:
[('Fred', 'SchoolA'),
 ('Bob', 'SchoolB'),
 ('Mary', 'SchoolA'),
 ('Jane', 'SchoolB'),
 ('Nancy', 'SchoolC')]

In [46]: {a:[t[0] for t in b] for a,b in groupby(sorted(student_school_list, key=lambda t: t[1]), key=lambda t: t[1])}
Out[46]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
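A somewhat less ugly spelling of the same itertools.groupby approach, offered as a sketch: name the key once with operator.itemgetter and unpack the tuples instead of indexing them. `by_school` is just an illustrative name. It still has to sort first, because groupby only merges adjacent items with equal keys, so it costs O(n log n) where the defaultdict loop is O(n).

```python
from itertools import groupby
from operator import itemgetter

student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

by_school = itemgetter(1)  # key function: the school field of each tuple

# Sorting (stable) puts equal schools next to each other, which groupby requires.
student_by_school = {
    school: [student for student, _ in group]
    for school, group in groupby(sorted(student_school_list, key=by_school), key=by_school)
}
print(student_by_school)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
```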



