[Python-ideas] grouping / dict of lists

Guido van Rossum guido at python.org
Sat Jun 30 02:25:03 EDT 2018


On Fri, Jun 29, 2018 at 3:23 PM Michael Selik <mike at selik.org> wrote:

> On Fri, Jun 29, 2018 at 2:43 PM Guido van Rossum <guido at python.org> wrote:
>
>> On a quick skim I see nothing particularly objectionable or controversial
>> in your PEP, except I'm unclear why it needs to be a class method on `dict`.
>>
>
> Since it constructs a basic dict, I thought it belongs best as a dict
> constructor like dict.fromkeys. It seemed to match other classmethods like
> datetime.now.
>

It doesn't strike me as important enough. Surely not every stdlib function
that returns a fresh dict needs to be a class method on dict!


> Adding something to a builtin like this is rather heavy-handed.
>>
>
> I included an alternate solution of a new class, collections.Grouping,
> which has some advantages. In addition to having less of that
> "heavy-handed" feel to it, the class can have a few utility methods that
> help handle more use cases.
>

Hm, this actually feels heavier to me. But then again I never liked or
understood the need for Counter -- I prefer basic data types and helper
functions over custom abstractions. (Also your description doesn't do it
justice, you describe a class using a verb phrase, "consume a sequence and
construct a Mapping". The key to Grouping seems to me that it is a dict
subclass with a custom constructor. But you don't explain why a subclass is
needed, and in that sense I like the other approach better.

But I still think it is much better off as a helper function in itertools.


> Is there a really good reason why it can't be a function in `itertools`?
>> (I don't think that it's relevant that it doesn't return an iterator -- it
>> takes in an iterator.)
>>
>
> I considered placing it in the itertools module, but decided against
> because it doesn't return an iterator. I'm open to that if that's the
> consensus.
>

You'll never get consensus on anything here, but you have my blessing for
this without consensus.


> Also, your pure-Python implementation appears to be O(N log N) if key is
>> None but O(N) otherwise; and the version for key is None uses an extra
>> temporary array of size N. Is that intentional?
>>
>
> Unintentional. I've been drafting pieces of this over the last year and
> wasn't careful enough with proofreading. I'll fix that momentarily...
>

Such are the dangers of premature optimization. :-)


> Finally, the first example under "Group and Aggregate" is described as a
>> dict of sets but it actually returns a dict of (sorted) lists.
>>
>
> Doctest complained at the set ordering, so I sorted for printing. You're
> not the only one to make that point, so I'll use sets for the example and
> ignore doctest.
>
> Thanks for reading!
> -- Michael
>
> PS. I just pushed an update to the GitHub repo, as per these comments.
>

Good luck with your PEP. If it is to go into itertools the biggest hurdle
will be convincing Raymond, and I'm not going to overrule him on this: you
and he are the educators here so hopefully you two can agree.

--Guido


>
>
>> On Fri, Jun 29, 2018 at 10:54 AM Michael Selik <mike at selik.org> wrote:
>>
>>> Hello,
>>>
>>> I've drafted a PEP for an easier way to construct groups of elements
>>> from a sequence. https://github.com/selik/peps/blob/master/pep-9999.rst
>>>
>>> As a teacher, I've found that grouping is one of the most awkward tasks
>>> for beginners to learn in Python. While this proposal requires
>>> understanding a key-function, in my experience that's easier to teach than
>>> the nuances of setdefault or defaultdict. Defaultdict requires passing a
>>> factory function or class, similar to a key-function. Setdefault is
>>> awkwardly named and requires a discussion of references and mutability.
>>> Those topics are important and should be covered, but I'd like to let them
>>> sink in gradually. Grouping often comes up as a question on the first or
>>> second day, especially for folks transitioning from Excel.
>>>
>>> I've tested this proposal on actual students (no students were harmed
>>> during experimentation) and found that the majority appreciate it. Some are
>>> even able to guess what it does (would do) without any priming.
>>>
>>> Thanks for your time,
>>> -- Michael
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jun 28, 2018 at 8:38 AM Michael Selik <mike at selik.org> wrote:
>>>
>>>> On Thu, Jun 28, 2018 at 8:25 AM Nicolas Rolin <nicolas.rolin at tiime.fr>
>>>> wrote:
>>>>
>>>>> I use list and dict comprehension a lot, and a problem I often have is
>>>>> to do the equivalent of a group_by operation (to use sql terminology).
>>>>>
>>>>> For example if I have a list of tuples (student, school) and I want to
>>>>> have the list of students by school the only option I'm left with is to
>>>>> write
>>>>>
>>>>>     student_by_school = defaultdict(list)
>>>>>     for student, school in student_school_list:
>>>>>         student_by_school[school].append(student)
>>>>>
>>>>
>>>> Thank you for bringing this up. I've been drafting a proposal for a
>>>> better grouping / group-by operation for a little while. I'm not quite
>>>> ready to share it, as I'm still researching use cases.
>>>>
>>>> I'm +1 that this task needs improvement, but -1 on this particular
>>>> solution.
>>>>
>>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20180629/4f7aa8d6/attachment.html>


More information about the Python-ideas mailing list