On a quick skim I see nothing particularly objectionable or controversial in your PEP, except I'm unclear why it needs to be a class method on `dict`.
Since it constructs a basic dict, I thought it belongs best as a dict constructor like dict.fromkeys. It seemed to match other classmethods like datetime.now.
It doesn't strike me as important enough. Surely not every stdlib function that returns a fresh dict needs to be a class method on dict!
Adding something to a builtin like this is rather heavy-handed.
I included an alternate solution of a new class, collections.Grouping, which has some advantages. In addition to having less of that "heavy-handed" feel to it, the class can have a few utility methods that help handle more use cases.
Hm, this actually feels heavier to me. But then again I never liked or understood the need for Counter -- I prefer basic data types and helper functions over custom abstractions. (Also your description doesn't do it justice, you describe a class using a verb phrase, "consume a
sequence and construct a Mapping". The key to Grouping seems to me that it is a dict subclass with a custom constructor. But you don't explain why a subclass is needed, and in that sense I like the other approach better.
But I still think it is much better off as a helper function in itertools.
Is there a really good reason why it can't be a function in `itertools`? (I don't think that it's relevant that it doesn't return an iterator -- it takes in an iterator.)
I considered placing it in the itertools module, but decided against because it doesn't return an iterator. I'm open to that if that's the consensus.
You'll never get consensus on anything here, but you have my blessing for this without consensus.
Also, your pure-Python implementation appears to be O(N log N) if key is None but O(N) otherwise; and the version for key is None uses an extra temporary array of size N. Is that intentional?
Unintentional. I've been drafting pieces of this over the last year and wasn't careful enough with proofreading. I'll fix that momentarily...
Such are the dangers of premature optimization. :-)
Finally, the first example under "Group and Aggregate" is described as a dict of sets but it actually returns a dict of (sorted) lists.
Doctest complained at the set ordering, so I sorted for printing. You're not the only one to make that point, so I'll use sets for the example and ignore doctest.
Thanks for reading!
PS. I just pushed an update to the GitHub repo, as per these comments.
Good luck with your PEP. If it is to go into itertools the biggest hurdle will be convincing Raymond, and I'm not going to overrule him on this: you and he are the educators here so hopefully you two can agree.
On Fri, Jun 29, 2018 at 10:54 AM Michael Selik <firstname.lastname@example.org
As a teacher, I've found that grouping is one of the most awkward tasks for beginners to learn in Python. While this proposal requires understanding a key-function, in my experience that's easier to teach than the nuances of setdefault or defaultdict. Defaultdict requires passing a factory function or class, similar to a key-function. Setdefault is awkwardly named and requires a discussion of references and mutability. Those topics are important and should be covered, but I'd like to let them sink in gradually. Grouping often comes up as a question on the first or second day, especially for folks transitioning from Excel.
I've tested this proposal on actual students (no students were harmed during experimentation) and found that the majority appreciate it. Some are even able to guess what it does (would do) without any priming.
Thanks for your time,
I use list and dict comprehension a lot, and a problem I often have is to do the equivalent of a group_by operation (to use sql terminology).
For example if I have a list of tuples (student, school) and I want to have the list of students by school the only option I'm left with is to write
student_by_school = defaultdict(list)
for student, school in student_school_list:
Thank you for bringing this up. I've been drafting a proposal for a better grouping / group-by operation for a little while. I'm not quite ready to share it, as I'm still researching use cases.
I'm +1 that this task needs improvement, but -1 on this particular solution.