Hi,

I use list and dict comprehensions a lot, and a problem I often run into is doing the equivalent of a group_by operation (to use SQL terminology).

For example, if I have a list of (student, school) tuples and I want the list of students for each school, the only option I'm left with is to write

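    from collections import defaultdict
    # Collect students under their school; each list is created on first use.
    student_by_school = defaultdict(list)
    for student, school in student_school_list:
        student_by_school[school].append(student)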

What I would expect is a comprehension syntax allowing me to write something along the lines of:

    student_by_school = {group_by(school): student for student, school in student_school_list}

or any other syntax that allows me to regroup items from an iterable.
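
For comparison, here is a minimal sketch of how close one can get today by wrapping the loop in a helper that takes a generator of (key, value) pairs. The name group_pairs and the sample data are hypothetical, chosen only for illustration; this is not an existing standard-library function.

```python
from collections import defaultdict

def group_pairs(pairs):
    """Group (key, value) pairs into a dict of lists (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(groups)

# Illustrative data, not taken from a real dataset.
student_school_list = [("alice", "north"), ("bob", "south"), ("carol", "north")]

student_by_school = group_pairs(
    (school, student) for student, school in student_school_list
)
print(student_by_school)
# {'north': ['alice', 'carol'], 'south': ['bob']}
```

This keeps the call site to one expression, but the grouping intent is still hidden inside a function rather than visible in the comprehension itself, which is the gap the proposed syntax is meant to close.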


Small FAQ:

Q: Why include something in comprehensions when you can do it in a small number of lines?

A: A really valuable property of list and dict comprehensions is that they let the developer be explicit about what a given line is meant to do.
If you see a comprehension, you know that the developer wanted to build a collection from an iterable, with no side effect other than consuming the iterator (assuming reasonable coding guidelines are followed).
Initializing an object and then filling it in a for loop is both too long and not explicit enough about what is intended.
That pattern should be reserved for intrinsically complex operations, not for one of the basic operations one wants to perform on lists and dicts.


Q: Why group by in particular?

A: If we take SQL queries (https://en.wikipedia.org/wiki/SQL_syntax#Queries) as a reasonable picture of how people need to manipulate data on a day-to-day basis, we can see that dict comprehensions already cover most of the basic operations; the only ones missing are GROUP BY and HAVING.

Q: Why not use it on list with syntax such as
    student_by_school = [
        school, student
        for school, student in student_school_list
        group by school
    ]
?

A: It would create either a discrepancy with iterators or a possibly misleading semantic (that of itertools.groupby, which only groups consecutive items and therefore requires the iterable to be sorted by the key to be useful).
Having the option to do it with a dict removes any ambiguity and should be enough to cover most "group by" applications.
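
To illustrate the itertools.groupby behaviour mentioned above, here is a small sketch with made-up data: on unsorted input the same key shows up in several separate groups, and only after sorting by the key does the result match what a "group by" usually means.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative data: the 'fruit' key is not contiguous.
pairs = [("fruit", "orange"), ("meat", "eggs"), ("fruit", "apple")]
key = itemgetter(0)

# groupby only merges *consecutive* items, so 'fruit' appears twice.
unsorted_groups = [(k, [v for _, v in grp]) for k, grp in groupby(pairs, key=key)]
print(unsorted_groups)
# [('fruit', ['orange']), ('meat', ['eggs']), ('fruit', ['apple'])]

# Sorting by the key first gives one group per key.
sorted_groups = {
    k: [v for _, v in grp] for k, grp in groupby(sorted(pairs, key=key), key=key)
}
print(sorted_groups)
# {'fruit': ['orange', 'apple'], 'meat': ['eggs']}
```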


Examples:

    edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'), ('fruit', 'apple'), ('vegetable', 'fennel'), ('fruit', 'pineapple'), ('fruit', 'pineapple'), ('vegetable', 'carrot')]
    edible_list_by_food_type = {group_by(food_type): edible for food_type, edible in edible_list}

    print(edible_list_by_food_type)
    {'fruit': ['orange', 'apple', 'pineapple', 'pineapple'], 'meat': ['eggs', 'spam'], 'vegetable': ['fennel', 'carrot']}


    bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -1200.0]
    splited_bank_transactions = {group_by('credit' if amount > 0 else 'debit'): amount for amount in bank_transactions}

    print(splited_bank_transactions)
    {'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -1200.0]}



--
Nicolas Rolin