[Python-ideas] Allow a group by operation for dict comprehension
Rob Cliffe
rob.cliffe at btinternet.com
Thu Jun 28 13:21:02 EDT 2018
Why not write a helper function? Something like
def group_by(iterable, groupfunc, itemfunc=lambda x:x, sortfunc=lambda x:x): # Python 2 & 3 compatible!
    D = {}
    for x in iterable:
        group = groupfunc(x)
        D[group] = D.get(group, []) + [itemfunc(x)]
    if sortfunc is not None:
        for group in D:
            D[group] = sorted(D[group], key=sortfunc)
    return D
Then:
student_list = [('james', 'Dublin'), ('jim', 'Cork'), ('mary', 'Cork'),
                ('fred', 'Dublin')]
student_by_school = group_by(student_list, lambda stu_sch: stu_sch[1],
                             lambda stu_sch: stu_sch[0])
print(student_by_school)
{'Dublin': ['fred', 'james'], 'Cork': ['jim', 'mary']}
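The helper above builds a new list on every iteration (D.get(...) + [...]), which gets slow for large groups. A possible variant, offered only as a sketch: it appends in place via dict.setdefault and sorts only when a key function is passed. The name group_by_append and the sortfunc=None default are assumptions made for this illustration, not part of the helper above.

```python
def group_by_append(iterable, groupfunc, itemfunc=lambda x: x, sortfunc=None):
    """Group items into a dict of lists, appending in place."""
    groups = {}
    for x in iterable:
        # setdefault creates the list the first time a key is seen;
        # append then avoids rebuilding the list on every iteration.
        groups.setdefault(groupfunc(x), []).append(itemfunc(x))
    if sortfunc is not None:
        # Sort each group in place, only when a key function is given.
        for members in groups.values():
            members.sort(key=sortfunc)
    return groups


student_list = [('james', 'Dublin'), ('jim', 'Cork'), ('mary', 'Cork'),
                ('fred', 'Dublin')]

# Without sortfunc, each group keeps the input order.
unsorted_by_school = group_by_append(student_list,
                                     lambda stu_sch: stu_sch[1],
                                     lambda stu_sch: stu_sch[0])

# With sortfunc, each group is sorted by that key.
sorted_by_school = group_by_append(student_list,
                                   lambda stu_sch: stu_sch[1],
                                   lambda stu_sch: stu_sch[0],
                                   sortfunc=lambda name: name)
```

With sortfunc given, the result matches the output printed above; without it, 'james' stays ahead of 'fred' in the 'Dublin' group.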
Regards
Rob Cliffe
On 28/06/2018 16:25, Nicolas Rolin wrote:
> Hi,
>
> I use list and dict comprehensions a lot, and I often need the
> equivalent of a GROUP BY operation (to use SQL terminology).
>
> For example, if I have a list of (student, school) tuples and I want
> the list of students for each school, the only option I'm left with is
> to write
>
> from collections import defaultdict
>
> student_by_school = defaultdict(list)
> for student, school in student_school_list:
>     student_by_school[school].append(student)
>
> What I would expect would be a syntax with comprehension allowing me
> to write something along the lines of:
>
> student_by_school = {group_by(school): student
>                      for student, school in student_school_list}
>
> or any other syntax that allows me to regroup items from an iterable.
>
>
> Small FAQ:
>
> Q: Why include something in comprehensions when you can do it in a
> small number of lines?
>
> A: A valuable property of list and dict comprehensions is that they
> let the developer be explicit about the intent of a given line.
> If you see a comprehension, you know that the developer wanted to
> build an iterable, with no side effect other than consuming the
> iterator (assuming reasonable coding guidelines are followed).
> Initializing an object and then filling it in a for loop is both
> too long and not explicit enough about what is intended.
> That pattern should be reserved for intrinsically complex operations,
> not for one of the basic operations one wants to perform on lists and
> dicts.
>
>
> Q: Why group by in particular ?
>
> A: If we take SQL queries
> (https://en.wikipedia.org/wiki/SQL_syntax#Queries) as a reasonable
> picture of how people need to manipulate data on a day-to-day basis,
> we can see that dict comprehensions already cover most of the basic
> operations; the only missing ones are GROUP BY and HAVING.
>
> Q: Why not use it on lists, with syntax such as
>     student_by_school = [
>         school, student
>         for student, school in student_school_list
>         group by school
>     ]
> ?
>
> A: It would create either a discrepancy with iterators or potentially
> misleading semantics (those of itertools.groupby, which requires the
> iterable to be sorted by the grouping key in order to be useful).
> Having the option to do it with a dict removes any ambiguity and should
> be enough to cover most "group by" applications.
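
To make the itertools.groupby caveat concrete, here is a minimal sketch. It reuses the student list from the helper example earlier in this message; the variable names are purely illustrative. groupby only merges adjacent items with equal keys, so unsorted input yields the same key more than once.

```python
from itertools import groupby
from operator import itemgetter

pairs = [('james', 'Dublin'), ('jim', 'Cork'), ('mary', 'Cork'),
         ('fred', 'Dublin')]
school = itemgetter(1)

# Unsorted input: 'Dublin' appears as two separate runs.
unsorted_runs = [(key, [name for name, _ in run])
                 for key, run in groupby(pairs, key=school)]

# Sorting by the same key first gives exactly one run per school.
sorted_runs = [(key, [name for name, _ in run])
               for key, run in groupby(sorted(pairs, key=school), key=school)]

print(unsorted_runs)
print(sorted_runs)
```

Collecting the unsorted runs into a plain dict would silently overwrite the first 'Dublin' group with the second, which is the kind of surprise a dict-based group-by avoids.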
>
>
> Examples:
>
> edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'),
>                ('fruit', 'apple'), ('vegetable', 'fennel'),
>                ('fruit', 'pineapple'), ('fruit', 'pineapple'),
>                ('vegetable', 'carrot')]
> edible_list_by_food_type = {group_by(food_type): edible
>                             for food_type, edible in edible_list}
>
> print(edible_list_by_food_type)
> {'fruit': ['orange', 'apple', 'pineapple', 'pineapple'],
>  'meat': ['eggs', 'spam'], 'vegetable': ['fennel', 'carrot']}
>
>
> bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -12000]
> split_bank_transactions = {group_by('credit' if amount > 0 else 'debit'): amount
>                            for amount in bank_transactions}
>
> print(split_bank_transactions)
> {'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -12000]}
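
For comparison, the two examples above can be written in today's Python with collections.defaultdict, the approach the original post starts from. This is only a sketch of the equivalent loops, not of the proposed syntax; the result variable names are illustrative.

```python
from collections import defaultdict

edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'),
               ('fruit', 'apple'), ('vegetable', 'fennel'),
               ('fruit', 'pineapple'), ('fruit', 'pineapple'),
               ('vegetable', 'carrot')]

# Group edibles by food type, keeping input order and duplicates.
edible_by_food_type = defaultdict(list)
for food_type, edible in edible_list:
    edible_by_food_type[food_type].append(edible)

bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -12000]

# The grouping key can be any expression computed from the item.
split_transactions = defaultdict(list)
for amount in bank_transactions:
    split_transactions['credit' if amount > 0 else 'debit'].append(amount)

print(dict(edible_by_food_type))
print(dict(split_transactions))
```

Each group keeps its items in input order, duplicates included, since a plain list append does no deduplication.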
>
>
>
> --
> Nicolas Rolin