I think the current default is quite weird, as it pretty much amounts to a count() of each key (which can be useful, but is not really what I expect from a grouping). I would prefer a default that might raise an error over a default that silently succeeds and outputs something I might not want.
For example, the default could be for grouping to unpack (key, value) tuples from the iterator and do what's expected with them (group values by key). That is quite reasonable: your examples include one with (key, value) pairs and none with the current default. It would also allow syntax of the kind

>grouping((food_type, food_name) for food_type, food_name in foods)

which is pretty nice to have.
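
To make the contrast concrete, here is a minimal sketch. Both functions are hypothetical illustrations, not the PEP's implementation: `grouping_identity` assumes the current default files each item under itself, and `grouping_pairs` is the suggested default that unpacks (key, value) tuples. The `foods` data is made up.

```python
from collections import defaultdict


def grouping_identity(iterable):
    """Assumed current default: each item acts as its own key."""
    groups = defaultdict(list)
    for item in iterable:
        groups[item].append(item)
    return dict(groups)


def grouping_pairs(iterable):
    """Suggested default: unpack (key, value) and group values by key."""
    groups = defaultdict(list)
    for key, value in iterable:  # raises if an item is not a 2-item iterable
        groups[key].append(value)
    return dict(groups)


# Made-up sample data.
foods = [("fruit", "apple"), ("vegetable", "carrot"), ("fruit", "pear")]

by_type = grouping_pairs((food_type, food_name) for food_type, food_name in foods)
repeated = grouping_identity("aab")  # each letter grouped with its own copies
```

With the pair-unpacking default, input that is not made of pairs fails loudly instead of returning a surprising result.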

Nicolas Rolin

2018-07-02 9:43 GMT+02:00 Michael Selik <mike@selik.org>:
I made some heavy revisions to the PEP. Linking again for convenience.

Replying to Guido, Nick, David, Chris, and Ivan in 4 sections below.

On Fri, Jun 29, 2018 at 11:25 PM Guido van Rossum <guido@python.org> wrote:
On Fri, Jun 29, 2018 at 3:23 PM Michael Selik <mike@selik.org> wrote:
On Fri, Jun 29, 2018 at 2:43 PM Guido van Rossum <guido@python.org> wrote:
On a quick skim I see nothing particularly objectionable or controversial in your PEP, except I'm unclear why it needs to be a class method on `dict`.

Since it constructs a basic dict, I thought it belonged best as a dict constructor like dict.fromkeys. It seemed to match other classmethods like datetime.now.

It doesn't strike me as important enough. Surely not every stdlib function that returns a fresh dict needs to be a class method on dict!

Thinking back, I may have chosen the name "groupby" first, following `itertools.groupby`, SQL, and other languages, and I wanted to make a clear distinction from `itertools.groupby`. Putting it on the `dict` namespace clarified that it's returning a dict.

However, naming it `grouping` allows it to be a stand-alone function.

But I still think it is much better off as a helper function in itertools.
I considered placing it in the itertools module, but decided against it because it doesn't return an iterator. I'm open to that if that's the consensus.
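
As a rough illustration of the difference being discussed, the sketch below compares `itertools.groupby` with a plain dict-of-lists grouping. The sample words and the `first_letter` helper are made up; the dict-building loop is only a stand-in for the proposed function, not its implementation.

```python
from collections import defaultdict
from itertools import groupby


def first_letter(word):
    return word[0]


words = ["apple", "avocado", "banana", "apricot"]

# itertools.groupby merges only *consecutive* items with equal keys and
# yields (key, iterator) pairs, so unsorted input splits the "a" group.
runs = [(key, list(group)) for key, group in groupby(words, key=first_letter)]

# A dict-based grouping collects every item for a key, regardless of order,
# and returns a mapping rather than an iterator.
groups = defaultdict(list)
for word in words:
    groups[first_letter(word)].append(word)
groups = dict(groups)
```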

You'll never get consensus on anything here, but you have my blessing for this without consensus.

That feels like a success, but I'm going to be a bit more ambitious and try to persuade you that `grouping` belongs in the built-ins. I revised my draft to streamline the examples and make a clearer comparison with existing tools.

On Sat, Jun 30, 2018 at 2:01 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not sure if the draft was updated since [Guido] looked at it, but it does mention that one benefit of the collections.Grouping approach is being able to add native support for mapping a callable across every individual item in the collection (ignoring the group structure), as well as for applying aggregate functions to reduce the groups to single values in a standard dict.

Delegating those operations to the container API that way then means that other libraries can expose classes that implement the grouping API, but with a completely different backend storage model.

While it'd be nice to create a standard interface as you point out, my primary goal is to create an "obvious" way for both beginners and experts to group, classify, categorize, bucket, demultiplex, taxonomize, etc. I started revising the PEP last night and found myself getting carried away with adding methods to the Grouping class that were more distracting than useful. Since the most important thing is to make this as accessible and easy as possible, I re-focused the proposal on the core idea of grouping.

[Ivan, Chris, David]
On Sun, Jul 1, 2018 at 7:29 PM David Mertz <mertz@gnosis.cx> wrote:
{k:set(v) for k,v in deps.items()}
{k:Counter(v) for k,v in deps.items()}

I had dropped those specific examples in favor of generically "func(g)", but added them back. Your discussion with Ivan and Chris showed that it was useful to be specific.
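
For readers following along, a small runnable version of those two comprehensions might look like this. The `pairs` data is invented, and `deps` is built by hand with `defaultdict` as a stand-in for whatever the proposed function would return.

```python
from collections import Counter, defaultdict

# Made-up (package, dependency) pairs.
pairs = [
    ("app", "numpy"),
    ("app", "requests"),
    ("lib", "numpy"),
    ("app", "numpy"),
]

# Stand-in for the grouping step: a dict mapping each key to a list of values.
deps = defaultdict(list)
for k, v in pairs:
    deps[k].append(v)

# Post-process each group with a specific aggregation instead of a generic func(g).
as_sets = {k: set(v) for k, v in deps.items()}
as_counts = {k: Counter(v) for k, v in deps.items()}
```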

On Sat, Jun 30, 2018 at 10:18 PM Chris Barker <chris.barker@noaa.gov> wrote:
I'm really warming to the:
Alternate: collections.Grouping
version -- I really like this as a kind of custom mapping, rather than "just a function" (or alternate constructor) -- and I like your point that it can have a bit of functionality built in other than on construction.

I moved ``collections.Grouping`` to the "Rejected Alternatives" section, though it is closer to a personal second choice than an outright rejection.

__init__ and update would take an iterable of (key, value) pairs, rather than a single sequence.

I added a better demonstration in the PEP for handling that kind of input. You have one of two strategies with my proposed function.

Either create a reverse lookup dict:
    d = {v: k for k, v in items}
    grouping(d, key=lambda v: d[v])

Or discard the keys after grouping:
    groups = grouping(items, key=lambda t: t[0])
    groups = {k: [v for _, v in g] for k, g in groups.items()}
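
Here is a self-contained sketch of both strategies. The `grouping` function below is a hypothetical stand-in with an assumed `(iterable, key=None)` signature, not the PEP's implementation, and the `items` data is made up. Note that the reverse-lookup strategy only works when the values are hashable and unique, since duplicate values would overwrite each other in `d`.

```python
from collections import defaultdict


def grouping(iterable, key=None):
    """Hypothetical stand-in for the proposed function (assumed signature)."""
    groups = defaultdict(list)
    for item in iterable:
        groups[item if key is None else key(item)].append(item)
    return dict(groups)


# Made-up (key, value) pairs.
items = [("fruit", "apple"), ("vegetable", "carrot"), ("fruit", "pear")]

# Strategy 1: reverse lookup dict, then group the values by their looked-up key.
d = {v: k for k, v in items}
by_lookup = grouping(d, key=lambda v: d[v])

# Strategy 2: group whole pairs by their first element, then drop the keys.
groups = grouping(items, key=lambda t: t[0])
by_discard = {k: [v for _, v in g] for k, g in groups.items()}
```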

While thinking of examples for this PEP, it's tempting to use overly-simplified data. In practice, instead of (key, value) pairs, it's usually either individual values or n-tuple rows. In the latter case, sometimes the key should be dropped from the row when grouping, sometimes kept in the row, and sometimes the key must be computed from multiple values within the row.
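
The three n-tuple cases can be sketched as follows. `group_rows` and its `value` parameter are hypothetical helpers introduced only for this illustration, and the rows are invented.

```python
from collections import defaultdict


def group_rows(rows, key, value=lambda row: row):
    """Hypothetical helper: store value(row) in the group for key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return dict(groups)


# Made-up (category, name, quantity) rows.
rows = [("fruit", "apple", 3), ("fruit", "pear", 12), ("vegetable", "carrot", 7)]

# Key kept in the row.
kept = group_rows(rows, key=lambda r: r[0])

# Key dropped from the row.
dropped = group_rows(rows, key=lambda r: r[0], value=lambda r: r[1:])

# Key computed from multiple values within the row.
computed = group_rows(rows, key=lambda r: (r[0], r[2] >= 5))
```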

[...] building up a data structure with word pairs, and a list of all the words that follow the pair in a piece of text. [...example code...]

I provided a similar example in my first draft, showing the creation of a Markov chain data structure. A few folks gave the feedback that it was more distracting from the PEP than useful. It's still there in the "stateful key-function" example, but it's now just a few lines.

[...] if you are teaching, say data analysis with Python -- it might be nice to have this builtin, but if you are teaching "programming with Python" I'd probably encourage them to do it by hand first anyway :-)

I agree, but users in both cases will appreciate the proposed built-in.

On Sun, Jul 1, 2018 at 10:35 PM Chris Barker <chris.barker@noaa.gov> wrote:
Though maybe list, set and Counter are the [aggregation collections] you'd want to use?

I've been searching the standard library and popular community libraries for use of setdefault, defaultdict, groupby, and the word "group" or "groups" periodically over the past year or so. I admit I haven't been as systematic as maybe I should have been, but I feel like I've been pretty thorough.

The majority of grouping uses a list. A significant portion use a set. A handful use a Counter. And that's basically it. Sometimes there's a specialized container class, but they are generally composed of a list, set, or Counter. There may have been other types, but if it was interesting, I think I'd have written down an example of it in my notes.

Most other languages with a similar tool have decided to return a mapping of lists or the equivalent for that language. If we make that choice, we're in good company.

before making any decisions about the best API, it would probably be a good idea to collect examples of the kind of data that people really do need to group like this. Does it come in (key, value) pairs naturally? or in one big sequence with a key function that's easy to write? who knows without examples of real world use cases.

It may not come across in the PEP how much research I've put into this. I'll need some time to compile the evidence, but I'm confident that it's more common to need a key-function than to have (key, value) pairs. I'll get back to you soon(ish) with data.

-- Michael

PS. Not to bikeshed, but a Grouper is a kind of fish. :-)
