[Python-ideas] grouping / dict of lists

Mon Jul 2 05:31:47 EDT 2018

I think the current default is quite weird, as it pretty much amounts to a
count() of each key (which can be useful, but is not really what I expect from
a grouping). I would prefer a default that might raise an error over a
default that silently outputs something that is not what I want.
For example, the default could be that grouping unpacks (key, value) tuples
from the iterator and does what's expected with them (groups values by
key). That is quite reasonable: your examples include one with (key, value)
pairs, and none with the current default. It also allows syntax of the kind

    grouping((food_type, food_name) for food_type, food_name in foods)

which is pretty nice to have.
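A minimal sketch of the default suggested above, assuming a hypothetical `grouping(iterable, key=None)` signature (this is a stand-in, not the PEP's implementation). With no key function, each item is unpacked as a (key, value) pair, so an item that cannot be unpacked into exactly two parts raises TypeError or ValueError instead of silently producing something else.

```python
from collections import defaultdict


def grouping(iterable, key=None):
    """Hypothetical grouping function returning a dict of lists."""
    groups = defaultdict(list)
    if key is None:
        # Proposed default: unpack (key, value) pairs and group values by key.
        for k, v in iterable:
            groups[k].append(v)
    else:
        # With a key function: group whole items by key(item).
        for item in iterable:
            groups[key(item)].append(item)
    return dict(groups)


foods = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]
by_type = grouping((food_type, food_name) for food_type, food_name in foods)
```

Here `by_type` maps "fruit" to ["apple", "pear"] and "veg" to ["leek"].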

-- 
Nicolas Rolin

2018-07-02 9:43 GMT+02:00 Michael Selik <mike at selik.org>:

> I made some heavy revisions to the PEP. Linking again for convenience.
> https://github.com/selik/peps/blob/master/pep-9999.rst
>
> Replying to Guido, Nick, David, Chris, and Ivan in 4 sections below.
>
>
> [Guido]
> On Fri, Jun 29, 2018 at 11:25 PM Guido van Rossum <guido at python.org>
> wrote:
>
>> On Fri, Jun 29, 2018 at 3:23 PM Michael Selik <mike at selik.org> wrote:
>>
>>> On Fri, Jun 29, 2018 at 2:43 PM Guido van Rossum <guido at python.org>
>>> wrote:
>>>
>>>> On a quick skim I see nothing particularly objectionable or
>>>> controversial in your PEP, except I'm unclear why it needs to be a class
>>>> method on `dict`.
>>>>
>>>
>>> Since it constructs a basic dict, I thought it belongs best as a dict
>>> constructor like dict.fromkeys. It seemed to match other classmethods like
>>> datetime.now.
>>>
>>
>> It doesn't strike me as important enough. Surely not every stdlib
>> function that returns a fresh dict needs to be a class method on dict!
>>
>
> Thinking back, I may have chosen the name "groupby" first, following
> `itertools.groupby`, SQL, and other languages, and I wanted to make a clear
> distinction from `itertools.groupby`. Putting it on the `dict` namespace
> clarified that it's returning a dict.
>
> However, naming it `grouping` allows it to be a stand-alone function.
>
>> But I still think it is much better off as a helper function in itertools.
>>
>>> I considered placing it in the itertools module, but decided against
>>> because it doesn't return an iterator. I'm open to that if that's the
>>> consensus.
>>>
>>
>> You'll never get consensus on anything here, but you have my blessing for
>> this without consensus.
>>
>
> That feels like a success, but I'm going to be a bit more ambitious and
> try to persuade you that `grouping` belongs in the built-ins. I revised my
> draft to streamline the examples and make a clearer comparison with
> existing tools.
>
>
> [Nick]
> On Sat, Jun 30, 2018 at 2:01 AM Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> I'm not sure if the draft was updated since [Guido] looked at it, but it
>> does mention that one benefit of the collections.Grouping approach is
>> being able to add native support for mapping a callable across every
>> individual item in the collection (ignoring the group structure), as
>> well as for applying aggregate functions to reduce the groups to
>> single values in a standard dict.
>>
>> Delegating those operations to the container API that way then means
>> that other libraries can expose classes that implement the grouping
>> API, but with a completely different backend storage model.
>>
>
> While it'd be nice to create a standard interface as you point out, my
> primary goal is to create an "obvious" way for both beginners and experts
> to group, classify, categorize, bucket, demultiplex, taxonomize, etc. I
> started revising the PEP last night and found myself getting carried away
> with adding methods to the Grouping class that were more distracting than
> useful. Since the most important thing is to make this as accessible and
> easy as possible, I re-focused the proposal on the core idea of grouping.
>
>
> [Ivan, Chris, David]
> On Sun, Jul 1, 2018 at 7:29 PM David Mertz <mertz at gnosis.cx> wrote:
>
>> {k:set(v) for k,v in deps.items()}
>> {k:Counter(v) for k,v in deps.items()}
>>
>
> I had dropped those specific examples in favor of generically "func(g)",
> but added them back. Your discussion with Ivan and Chris showed that it was
> useful to be specific.
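David's two comprehensions post-process a dict of lists. A runnable sketch, assuming a made-up `deps` mapping of module names to the lists of imports a grouping step might have produced:

```python
from collections import Counter

# Hypothetical grouped data: each key maps to a list of values.
deps = {"app": ["os", "sys", "os"], "lib": ["re"]}

# Collapse each group into the distinct values it contains.
unique = {k: set(v) for k, v in deps.items()}

# Count how often each value occurs within its group.
counts = {k: Counter(v) for k, v in deps.items()}
```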
>
>
> [Chris]
> On Sat, Jun 30, 2018 at 10:18 PM Chris Barker <chris.barker at noaa.gov>
> wrote:
>
>> I'm really warming to the:
>> Alternate: collections.Grouping
>> version -- I really like this as a kind of custom mapping, rather than
>> "just a function" (or alternate constructor) -- and I like your point that
>> it can have a bit of functionality built in other than on construction.
>>
>
> I moved ``collections.Grouping`` to the "Rejected Alternatives" section,
> but it's really more of a "personal second choice" than "rejected".
>
> [...]
>> __init__ and update would take an iterable of (key, value) pairs, rather
>> than a single sequence.
>>
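A rough sketch of the kind of mapping Chris describes, where `__init__` and `update` both accept an iterable of (key, value) pairs. The class name and behavior are assumptions for illustration only; no such class exists in the `collections` module.

```python
class Grouping(dict):
    """Hypothetical dict subclass: each key maps to a list of values."""

    def __init__(self, pairs=()):
        super().__init__()
        self.update(pairs)

    def update(self, pairs):
        # Append each value to its key's list instead of overwriting it.
        for key, value in pairs:
            self.setdefault(key, []).append(value)


groups = Grouping([("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")])
groups.update([("veg", "kale")])
```

After the update, `groups` maps "fruit" to ["apple", "pear"] and "veg" to ["leek", "kale"].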
>
> I added a better demonstration in the PEP for handling that kind of input.
> You have one of two strategies with my proposed function.
>
> Either create a reverse lookup dict (this only works if the values are
> unique and hashable):
>     d = {v: k for k, v in items}
>     grouping(d, key=lambda k: d[k])
>
> Or discard the keys after grouping:
>     groups = grouping(items, key=lambda t: t[0])
>     groups = {k: [v for _, v in g] for k, g in groups.items()}
>
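Both strategies can be checked with a small stand-in for the proposed function. The `grouping(iterable, key)` helper below is hypothetical, written only so the sketch runs:

```python
from collections import defaultdict


def grouping(iterable, key):
    """Hypothetical stand-in for the proposed function: a dict of lists."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)


items = [("a", 1), ("b", 2), ("a", 3)]

# Strategy 1: reverse lookup (value -> key); iterating d yields the values.
d = {v: k for k, v in items}
by_lookup = grouping(d, key=lambda v: d[v])

# Strategy 2: group whole pairs by their first element, then drop the keys.
groups = grouping(items, key=lambda t: t[0])
by_pairs = {k: [v for _, v in g] for k, g in groups.items()}
```

Both produce {"a": [1, 3], "b": [2]} here. If another pair such as ("c", 1) were added, the reverse-lookup dict would keep only one key for the value 1, so strategy 1 would lose data while strategy 2 would not.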
> While thinking of examples for this PEP, it's tempting to use
> overly simplified data. In practice, instead of (key, value) pairs, it's
> usually either individual values or n-tuple rows. In the latter case,
> sometimes the key should be dropped from the row when grouping, sometimes
> kept in the row, and sometimes the key must be computed from multiple
> values within the row.
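The three row-based cases can be sketched as follows, with made-up (name, department, city) rows and the same hypothetical `grouping` stand-in:

```python
from collections import defaultdict


def grouping(iterable, key):
    """Hypothetical stand-in for the proposed function: a dict of lists."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)


# Hypothetical rows: (name, department, city)
rows = [
    ("ann", "eng", "paris"),
    ("bob", "ops", "lyon"),
    ("cid", "eng", "lyon"),
]

# Key kept in the row: group whole rows by department.
kept = grouping(rows, key=lambda r: r[1])

# Key dropped from the row: keep only (name, city) in each group.
dropped = {k: [(n, c) for n, _, c in g] for k, g in kept.items()}

# Key computed from several fields: group by (department, city).
composite = grouping(rows, key=lambda r: (r[1], r[2]))
```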
>
>
> [...] building up a data structure with word pairs, and a list of all the
>> words that follow the pair in a piece of text. [...example code...]
>>
>
> I provided a similar example in my first draft, showing the creation of a
> Markov chain data structure. A few folks gave the feedback that it was more
> distracting from the PEP than useful. It's still there in the "stateful
> key-function" example, but it's now just a few lines.
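For reference, the word-pair structure Chris describes can be built with the standard library alone. This is a sketch on a made-up sentence, not the PEP's stateful key-function example:

```python
from collections import defaultdict

text = "the cat sat on the mat the cat ran"
words = text.split()

# Map each pair of consecutive words to the list of words that follow it.
chain = defaultdict(list)
for first, second, following in zip(words, words[1:], words[2:]):
    chain[(first, second)].append(following)
```

Here `chain[("the", "cat")]` is ["sat", "ran"].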
>
> [...] if you are teaching, say data analysis with Python -- it might be
>> nice to have this builtin, but if you are teaching "programming with
>> Python" I'd probably encourage them to do it by hand first anyway :-)
>>
>
> I agree, but users in both cases will appreciate the proposed built-in.
>
>
> On Sun, Jul 1, 2018 at 10:35 PM Chris Barker <chris.barker at noaa.gov>
> wrote:
>
>> Though maybe list, set and Counter are the [aggregation collections]
>> you'd want to use?
>>
>
> I've been searching the standard library and popular community libraries
> for use of setdefault, defaultdict, groupby, and the word "group" or
> "groups" periodically over the past year or so. I admit I haven't been as
> systematic as maybe I should have been, but I feel like I've been pretty
> thorough.
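The existing idioms mentioned above (setdefault, defaultdict, and groupby) look roughly like this for a list-valued grouping. The word list and `first_letter` helper are made up for illustration:

```python
from collections import defaultdict
from itertools import groupby

words = ["apple", "avocado", "banana", "blueberry", "cherry"]


def first_letter(word):
    return word[0]


# 1. dict.setdefault
by_setdefault = {}
for w in words:
    by_setdefault.setdefault(first_letter(w), []).append(w)

# 2. collections.defaultdict
by_defaultdict = defaultdict(list)
for w in words:
    by_defaultdict[first_letter(w)].append(w)

# 3. itertools.groupby only merges adjacent items, so sort by the key first.
by_groupby = {
    k: list(g)
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
}
```

All three yield the same mapping; the groupby version also needs the extra sort, which is one reason it is easy to misuse for this task.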
>
> The majority of grouping uses a list. A significant portion use a set. A
> handful use a Counter. And that's basically it. Sometimes there's a
> specialized container class, but they are generally composed of a list,
> set, or Counter. There may have been other types, but if it was
> interesting, I think I'd have written down an example of it in my notes.
>
> Most other languages with a similar tool have decided to return a mapping
> of lists or the equivalent for that language. If we make that choice, we're
> in good company.
>
>
> [...]
>> before making any decisions about the best API, it would probably be a
>> good idea to collect examples of the kind of data that people really do
>> need to group like this. Does it come in (key, value) pairs naturally? or
>> in one big sequence with a key function that's easy to write? who knows
>> without examples of real world use cases.
>>
>
> It may not come across in the PEP how much research I've put into this.
> I'll take some time to compile the evidence, but I'm confident that it's more
> common to need a key-function than to have (key, value) pairs. I'll get
> back to you soon(ish) with data.
>
>
> -- Michael
>
> PS. Not to bikeshed, but a Grouper is a kind of fish. :-)
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
