[Python-ideas] Allow a group by operation for dict comprehension

Thu Jun 28 11:25:44 EDT 2018

Hi,

I use list and dict comprehensions a lot, and a problem I often run into is
doing the equivalent of a GROUP BY operation (to use SQL terminology).

For example, if I have a list of tuples (student, school) and I want the
list of students for each school, the only option I'm left with is to write:

    from collections import defaultdict

    student_by_school = defaultdict(list)
    for student, school in student_school_list:
        student_by_school[school].append(student)

What I would expect would be a syntax with comprehension allowing me to
write something along the lines of:

    student_by_school = {
        group_by(school): student
        for student, school in student_school_list
    }

or any other syntax that allows me to regroup items from an iterable.
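
For comparison, here is a minimal sketch of how the loop above can be wrapped
in a reusable helper in current Python. The name group_items, its key/value
parameters, and the sample student data are hypothetical and introduced only
for illustration; this is not an existing standard-library function.

```python
from collections import defaultdict

def group_items(iterable, key, value=lambda item: item):
    """Return a dict mapping key(item) to the list of value(item)."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(value(item))
    # Convert to a plain dict so missing keys raise KeyError as usual.
    return dict(groups)

# Hypothetical sample data: (student, school) tuples.
student_school_list = [("Alice", "North"), ("Bob", "South"), ("Carol", "North")]

student_by_school = group_items(
    student_school_list,
    key=lambda pair: pair[1],    # group on the school
    value=lambda pair: pair[0],  # collect the student
)
print(student_by_school)
# {'North': ['Alice', 'Carol'], 'South': ['Bob']}
```

This works, but the grouping intent is hidden behind a function call rather
than being visible in the comprehension itself.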

Small FAQ:

Q: Why include something in comprehensions when you can do it in a small
number of lines?

A: A really valuable aspect of list and dict comprehensions is that they let
the developer be explicit about what he wants to do on a given line.
If you see a comprehension, you know that the developer wanted to build an
iterable without any side effect other than consuming the iterator (if
he follows reasonable coding guidelines).
Initializing an object and running a for loop to construct it is both too
long and not explicit enough about what is intended.
That pattern should be reserved for intrinsically complex operations, not for
one of the basic operations one wants to perform on lists and dicts.

Q: Why group by in particular?

A: If we take SQL queries (https://en.wikipedia.org/wiki/SQL_syntax#Queries)
as a reasonable picture of how people need to manipulate data on a
day-to-day basis, we can see that dict comprehensions already cover most
of the basic operations; the only missing ones are GROUP BY and HAVING.
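
A rough sketch of that mapping, under my own reading of it: the rows below are
made-up sample data, and the SQL in the comments is only an informal
equivalent of each expression.

```python
# Hypothetical sample rows, standing in for a small SQL table.
rows = [
    {"name": "Alice", "school": "North", "grade": 15},
    {"name": "Bob", "school": "South", "grade": 9},
    {"name": "Carol", "school": "North", "grade": 12},
]

# SELECT name, grade FROM rows WHERE grade >= 10
passing = {row["name"]: row["grade"] for row in rows if row["grade"] >= 10}

# ... ORDER BY grade DESC (done with sorted(), wrapped around the data)
ranked = [
    name
    for name, grade in sorted(passing.items(), key=lambda kv: kv[1], reverse=True)
]

print(passing)  # {'Alice': 15, 'Carol': 12}
print(ranked)   # ['Alice', 'Carol']

# SELECT school, ... FROM rows GROUP BY school
# has no direct counterpart in comprehension syntax today.
```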

Q: Why not use it on lists, with a syntax such as
    student_by_school = [
        school, student
        for student, school in student_school_list
        group by school
    ]
?

A: It would create either a discrepancy with iterators or perhaps
misleading semantics (those of itertools.groupby, which requires the
iterable to be sorted in order to be useful).
Having the option to do it with a dict removes any ambiguity and should be
enough to cover most "group by" applications.
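
To make the itertools.groupby caveat concrete, here is a small sketch with
made-up data. groupby only merges consecutive items that share a key, so on
unsorted input the same key can appear in several separate runs:

```python
from itertools import groupby

# Hypothetical unsorted (food_type, edible) pairs.
pairs = [("fruit", "orange"), ("meat", "eggs"), ("fruit", "apple")]

runs = [
    (key, [edible for _, edible in group])
    for key, group in groupby(pairs, key=lambda pair: pair[0])
]
print(runs)
# [('fruit', ['orange']), ('meat', ['eggs']), ('fruit', ['apple'])]

# Turning those runs into a dict silently keeps only the last run per key.
as_dict = {key: edibles for key, edibles in runs}
print(as_dict)
# {'fruit': ['apple'], 'meat': ['eggs']}
```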

Examples:

    edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'),
                   ('fruit', 'apple'), ('vegetable', 'fennel'),
                   ('fruit', 'pineapple'), ('fruit', 'pineapple'),
                   ('vegetable', 'carrot')]
    edible_list_by_food_type = {
        group_by(food_type): edible
        for food_type, edible in edible_list
    }

    print(edible_list_by_food_type)
    {'fruit': ['orange', 'apple', 'pineapple', 'pineapple'],
     'meat': ['eggs', 'spam'], 'vegetable': ['fennel', 'carrot']}

    bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -1200.0]
    splited_bank_transactions = {
        group_by('credit' if amount > 0 else 'debit'): amount
        for amount in bank_transactions
    }

    print(splited_bank_transactions)
    {'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -1200.0]}
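
As far as I can tell, the closest comprehension-only equivalent in current
Python is to sort first and then use itertools.groupby. The sketch below
reproduces the bank transaction example that way; the helper name kind is
hypothetical and introduced only to avoid writing the key expression twice.

```python
from itertools import groupby

def kind(amount):
    """Classify a transaction the same way as the example above."""
    return 'credit' if amount > 0 else 'debit'

bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -1200.0]

# sorted() is stable, so amounts keep their original order within each group.
splited_bank_transactions = {
    key: list(group)
    for key, group in groupby(sorted(bank_transactions, key=kind), key=kind)
}

print(splited_bank_transactions)
# {'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -1200.0]}
```

This costs an extra sort and evaluates the key twice per item, which is part
of why a dedicated syntax seems worthwhile.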

-- 
Nicolas Rolin