[Python-ideas] Allow a group by operation for dict comprehension

Rob Cliffe rob.cliffe at btinternet.com
Thu Jun 28 13:21:02 EDT 2018


Why not write a helper function?  Something like

def group_by(iterable, groupfunc, itemfunc=lambda x: x, sortfunc=lambda x: x):
    # Python 2 & 3 compatible!
    D = {}
    for x in iterable:
        group = groupfunc(x)
        D[group] = D.get(group, []) + [itemfunc(x)]
    if sortfunc is not None:
        for group in D:
            D[group] = sorted(D[group], key=sortfunc)
    return D

Then:

student_list = [('james', 'Dublin'), ('jim', 'Cork'), ('mary', 'Cork'),
                ('fred', 'Dublin')]
student_by_school = group_by(student_list,
                             lambda stu_sch: stu_sch[1],
                             lambda stu_sch: stu_sch[0])
print(student_by_school)

{'Dublin': ['fred', 'james'], 'Cork': ['jim', 'mary']}
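
A variant of the same idea, as a rough sketch only: it collects items with
collections.defaultdict and list.append instead of rebuilding each list
with +, so a group's list is not copied on every item. The name
group_by_append is made up for illustration, and this version assumes that
sorting should be opt-in (sortfunc=None leaves each group in input order).

```python
from collections import defaultdict

def group_by_append(iterable, groupfunc, itemfunc=lambda x: x, sortfunc=None):
    # Hypothetical variant: append in place rather than concatenating lists.
    groups = defaultdict(list)
    for x in iterable:
        groups[groupfunc(x)].append(itemfunc(x))
    if sortfunc is not None:
        # Sorting is optional here; this is an assumption of the sketch.
        for items in groups.values():
            items.sort(key=sortfunc)
    return dict(groups)

student_list = [('james', 'Dublin'), ('jim', 'Cork'), ('mary', 'Cork'),
                ('fred', 'Dublin')]
by_school = group_by_append(student_list, lambda s: s[1], lambda s: s[0])
print(by_school)
# {'Dublin': ['james', 'fred'], 'Cork': ['jim', 'mary']}  (Python 3.7+)
```

Passing sortfunc=lambda name: name would give the sorted lists shown above.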

Regards

Rob Cliffe


On 28/06/2018 16:25, Nicolas Rolin wrote:
> Hi,
>
> I use list and dict comprehensions a lot, and a problem I often have is 
> how to do the equivalent of a group_by operation (to use SQL terminology).
>
> For example, if I have a list of tuples (student, school) and I want to 
> have the list of students by school, the only option I'm left with is 
> to write
>
>     student_by_school = defaultdict(list)
>     for student, school in student_school_list:
>         student_by_school[school].append(student)
>
> What I would expect would be a syntax with comprehension allowing me 
> to write something along the lines of:
>
>     student_by_school = {group_by(school): student
>                          for student, school in student_school_list}
>
> or any other syntax that allows me to regroup items from an iterable.
>
>
> Small FAQ:
>
> Q: Why include something in comprehensions when you can do it in a 
> small number of lines?
>
> A: A large part of the appeal of list and dict comprehensions is that 
> they let the developer be really explicit about what he wants to do on 
> a given line.
> If you see a comprehension, you know that the developer wanted to build 
> an iterable and not have any side effect other than depleting the 
> iterator (if he respects reasonable code guidelines).
> Initializing an object and running a for loop to construct it is both 
> too long and not explicit enough about what is intended.
> It should be reserved for intrinsically complex operations, not one of 
> the basic operations one wants to do with lists and dicts.
>
>
> Q: Why group by in particular?
>
> A: If we take SQL queries 
> (https://en.wikipedia.org/wiki/SQL_syntax#Queries) as a reasonable way 
> of seeing how people need to manipulate data on a day-to-day basis, we 
> can see that dict comprehensions already cover most of the basic 
> operations, the only missing operations being group by and having.
>
> Q: Why not use it on lists, with a syntax such as
>     student_by_school = [
>         school, student
>         for school, student in student_school_list
>         group by school
>     ]
> ?
>
> A: It would create either a discrepancy with iterators or possibly 
> misleading semantics (those of itertools.groupby, which requires 
> the iterable to be sorted in order to be useful).
> Having the option to do it with a dict removes any ambiguity and should 
> be enough to cover most "group by" applications.
>
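
For reference, a small sketch of the itertools.groupby behaviour referred
to above: it only merges adjacent items with equal keys, so unsorted input
yields the same key more than once. The sample data and variable names are
illustrative only.

```python
from itertools import groupby
from operator import itemgetter

pairs = [('fruit', 'orange'), ('meat', 'eggs'), ('fruit', 'apple')]
key = itemgetter(0)

# Unsorted input: 'fruit' shows up as two separate groups.
unsorted_groups = [(k, [v for _, v in g]) for k, g in groupby(pairs, key=key)]
print(unsorted_groups)
# [('fruit', ['orange']), ('meat', ['eggs']), ('fruit', ['apple'])]

# Sorting by the same key first gives one group per key.
sorted_groups = [(k, [v for _, v in g])
                 for k, g in groupby(sorted(pairs, key=key), key=key)]
print(sorted_groups)
# [('fruit', ['orange', 'apple']), ('meat', ['eggs'])]
```
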
>
> Examples:
>
>     edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'),
>                    ('fruit', 'apple'), ('vegetable', 'fennel'),
>                    ('fruit', 'pineapple'), ('fruit', 'pineapple'),
>                    ('vegetable', 'carrot')]
>     edible_list_by_food_type = {group_by(food_type): edible
>                                 for food_type, edible in edible_list}
>
>     print(edible_list_by_food_type)
>     {'fruit': ['orange', 'apple', 'pineapple', 'pineapple'],
>      'meat': ['eggs', 'spam'], 'vegetable': ['fennel', 'carrot']}
>
>
>     bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -12000]
>     splited_bank_transactions = {
>         group_by('credit' if amount > 0 else 'debit'): amount
>         for amount in bank_transactions}
>
>     print(splited_bank_transactions)
>     {'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -12000]}
>
>
>
> -- 
> Nicolas Rolin
