Why not write a helper function? Something like
def group_by(iterable, groupfunc, itemfunc=lambda x: x,
             sortfunc=lambda x: x):  # Python 2 & 3 compatible!
    D = {}
    for x in iterable:
        group = groupfunc(x)
        D[group] = D.get(group, []) + [itemfunc(x)]
    if sortfunc is not None:
        for group in D:
            D[group] = sorted(D[group], key=sortfunc)
    return D
Then:
student_list = [('james', 'Dublin'), ('jim', 'Cork'),
                ('mary', 'Cork'), ('fred', 'Dublin')]
student_by_school = group_by(student_list,
                             lambda stu_sch: stu_sch[1],
                             lambda stu_sch: stu_sch[0])
print(student_by_school)
{'Dublin': ['fred', 'james'], 'Cork': ['jim', 'mary']}
Regards
Rob Cliffe
Hi,
I use list and dict comprehensions a lot, and a problem I often have is doing the equivalent of a group-by operation (to use SQL terminology).
For example, if I have a list of (student, school) tuples and I want the list of students for each school, the only option I'm left with is to write:
from collections import defaultdict

student_by_school = defaultdict(list)
for student, school in student_school_list:
    student_by_school[school].append(student)
What I would expect would be a syntax with comprehension allowing me to write something along the lines of:
student_by_school = {group_by(school): student for student, school in student_school_list}
or any other syntax that allows me to regroup items from an iterable.
Small FAQ:
Q: Why include something in comprehensions when you can do it in a small number of lines?
A: A really valuable aspect of list and dict comprehensions is that they let the developer be explicit about what they intend to do on a given line.
If you see a comprehension, you know that the developer wanted to build an iterable with no side effects other than consuming the iterator (assuming reasonable coding guidelines are followed).
Initializing an object and then filling it in a for loop is both too long and not explicit enough about what is intended.
That pattern should be reserved for intrinsically complex operations, not for one of the basic operations one wants to perform on lists and dicts.
Q: Why group by in particular?
A: If we take SQL queries (https://en.wikipedia.org/wiki/SQL_syntax#Queries) as a reasonable picture of how people need to manipulate data on a day-to-day basis, we can see that comprehensions already cover most of the basic operations; the only missing ones are GROUP BY and HAVING.
Q: Why not use it on lists, with a syntax such as

student_by_school = [
    school, student
    for student, school in student_school_list
    group by school
]

?
A: It would create either a discrepancy with iterators or possibly misleading semantics (those of itertools.groupby, which requires the iterable to be sorted by the key in order to be useful).
Having the option to do it with a dict removes any ambiguity and should be enough to cover most "group by" applications.
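To illustrate the itertools.groupby point, here is a minimal sketch. The sample pairs and the helper name group_students are made up for illustration; the only assumption is the documented behaviour that groupby starts a new group whenever the key changes between consecutive items.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (student, school) pairs, deliberately not sorted by school.
pairs = [('james', 'Dublin'), ('jim', 'Cork'), ('fred', 'Dublin'), ('mary', 'Cork')]


def group_students(pairs_in_order):
    # groupby only merges items that are adjacent in the input,
    # so the result depends on the order of pairs_in_order.
    return [(school, [student for student, _ in items])
            for school, items in groupby(pairs_in_order, key=itemgetter(1))]


# On unsorted input each school shows up several times.
unsorted_groups = group_students(pairs)
# Sorting by the grouping key first gives one group per school.
sorted_groups = group_students(sorted(pairs, key=itemgetter(1)))

print(unsorted_groups)
print(sorted_groups)
```

On the unsorted input this yields four single-student groups, while the sorted input yields one group per school, which is the ambiguity a dict-based form avoids.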
Examples:
edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'), ('fruit', 'apple'), ('vegetable', 'fennel'), ('fruit', 'pineapple'), ('fruit', 'pineapple'), ('vegetable', 'carrot')]
edible_list_by_food_type = {group_by(food_type): edible for food_type, edible in edible_list}
print(edible_list_by_food_type)
{'fruit': ['orange', 'apple', 'pineapple', 'pineapple'], 'meat': ['eggs', 'spam'], 'vegetable': ['fennel', 'carrot']}
bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -12000]
split_bank_transactions = {group_by('credit' if amount > 0 else 'debit'): amount for amount in bank_transactions}
print(split_bank_transactions)
{'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -12000]}
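For comparison, here is a sketch of how both examples could be written in current Python, without the proposed syntax. The helper name group_pairs is hypothetical and not part of the proposal; it assumes that grouping means appending each value to a list under its key, keeping input order and duplicates.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Collect each value under its key, keeping input order and duplicates."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)  # return a plain dict so missing keys raise KeyError


edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'),
               ('fruit', 'apple'), ('vegetable', 'fennel'),
               ('fruit', 'pineapple'), ('fruit', 'pineapple'),
               ('vegetable', 'carrot')]
bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -12000]

# The grouping key is the first element of each pair.
edibles_by_type = group_pairs(edible_list)

# A generator expression computes the key for each amount on the fly.
transactions_by_kind = group_pairs(
    ('credit' if amount > 0 else 'debit', amount)
    for amount in bank_transactions
)

print(edibles_by_type)
print(transactions_by_kind)
```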
--
Nicolas Rolin