This ended up being a long post, so here's the TL;DR:

* Some types of data are well suited to the key function approach, and
  others are not. If you want to support the less well suited use cases,
  you should have a value function as well and/or take (key, value) pairs.

* There are some nice advantages in flexibility to having a Grouping
  class, rather than simply a function.

So I propose a best-of-all-worlds version: a Grouping class (subclass of dict):

* The constructor takes an iterable of (key, value) pairs by default.

* The constructor takes an optional key_func -- when not None, it is used
  to determine the keys in the iterable instead.

* The constructor also takes a value_func -- when specified, it processes
  the items to determine the values.

* a_grouping[key] = value adds the value to the list corresponding to the key.

* a_grouping.add(item) -- applies the key_func and value_func to add a new
  value to the appropriate group.

Prototype code here:

https://github.com/PythonCHB/grouper

Now the lengthy commentary and examples:

On Tue, Jul 3, 2018 at 5:21 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Jul 04, 2018 at 10:44:17AM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
Unless we *make* it a data type. Then not only would it fit well in collections, it would also make it fairly easy to do incremental grouping if you really wanted that.
indeed -- one of the motivations for my prototype:

https://github.com/PythonCHB/grouper

(Did none of my messages get to this list??)
Usual case:
g = groupdict((key(val), val) for val in things)
How does groupdict differ from regular defaultdicts, aside from the slightly different constructor?
* You don't need to declare the defaultdict (and what the default is) first
* You don't need to call .append() yourself
* It can have a custom .__init__() and .update()
* It can have a .add() method
* It can (optionally) use a key function.
* And you can have other methods that do useful things with the groupings.
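To make those points concrete, here is a minimal sketch of what such a class could look like. It is not the prototype's actual code: the class body, the key_func / value_func parameter names, and the choice to apply value_func to the whole item are all assumptions made for illustration.

```python
from operator import itemgetter


class Grouping(dict):
    """Hypothetical sketch: a dict that maps each key to a list of values."""

    def __init__(self, iterable=(), *, key_func=None, value_func=None):
        super().__init__()
        self.key_func = key_func
        self.value_func = value_func
        self.update(iterable)

    def __setitem__(self, key, value):
        # Assignment appends to the group instead of replacing it.
        if key not in self:
            super().__setitem__(key, [])
        self[key].append(value)

    def add(self, item):
        # Without a key_func, items are assumed to be (key, value) pairs.
        if self.key_func is None:
            key, value = item
        else:
            key, value = self.key_func(item), item
        # Assumption: value_func, if given, is applied to the whole item.
        if self.value_func is not None:
            value = self.value_func(item)
        self[key] = value

    def update(self, iterable):
        # Simplified: only accepts an iterable of items, not a mapping.
        for item in iterable:
            self.add(item)


words = "I wish I may I wish I might".split()
triples = list(zip(words, words[1:], words[2:]))

# (key, value) pair style
by_pairs = Grouping(((w1, w2), w3) for w1, w2, w3 in triples)

# key function / value function style
by_funcs = Grouping(triples,
                    key_func=itemgetter(0, 1),
                    value_func=itemgetter(2))

# Both styles build the same grouping.
assert by_pairs == by_funcs
```

No defaultdict declaration and no explicit .append() calls are needed, and the same object supports both the pair style and the key function style.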
g = groupdict()
for key(val), val in things:
    g.add(key, val)
    process_partial_grouping(g)
I don't think that syntax works. I get:
SyntaxError: can't assign to function call
looks like untested code :-)

with my prototype it would be:

g = groupdict()
for key, val in things:
    g[key] = val
    process_partial_grouping(g)

(this assumes your things are (key, value) pairs)

Again, IF your data are a sequence of items, and the value is the item itself, and the key is a simple function of the item, THEN the key function method makes more sense, which for the incremental adding of data would be:

g = groupdict(key_fun=a_fun)
for thing in things:
    g.add(thing)
    process_partial_grouping(g)

Even if it did work, it's hardly any simpler than
d = defaultdict(list)
for val in things:
    d[key(val)].append(val)
But then Counter is hardly any simpler than a regular dict too.
exactly -- and Counter is actually a little annoyingly too much like a regular dict, in my mind :-)

In the latest version of my prototype, the __init__ expects an iterable of (key, value) pairs by default, but you can also pass in a key_func, and then it will process the iterable passed in as (key_func(item), item) pairs. And the update() method will also use the key_func if one was provided.

So a best of both worlds -- pick your API.

In this thread, and in the PEP, there are various ways of accomplishing this task presented -- none of them (except using a raw itertools.groupby in some cases) is all that onerous.

But I do think a custom function or, even better, a custom class would create a "one obvious" way to do a common manipulation.

A final (repeated) point:

Some data are better suited to a (key, value) pair style, and some to a key function style. All of the examples in the PEP are well suited to the key function style. But the example that kicked off this discussion was about data already in (key, value) pairs (actually, in that case, (value, key) pairs). And there are other examples. Here's a good one for how one might want to use a Grouping dict more like a regular dict -- or maybe like a simple function constructor:

(code in: https://github.com/PythonCHB/grouper/blob/master/examples/trigrams.py)

#!/usr/bin/env python3

"""
Demo of processing "trigrams" from Dave Thomas' Coding Kata site:

http://codekata.com/kata/kata14-tom-swift-under-the-milkwood/

This is only addressing the part of the problem of building up the trigrams.

This is showing various ways of doing it with the Grouping object.
"""

from grouper import Grouping
from operator import itemgetter

words = "I wish I may I wish I might".split()

# using setdefault with a regular dict:
# how I might do it without a Grouping class
trigrams = {}
for i in range(len(words) - 2):
    pair = tuple(words[i:i + 2])
    follower = words[i + 2]
    trigrams.setdefault(pair, []).append(follower)

print(trigrams)

# using a Grouping with a regular loop:
trigrams = Grouping()
for i in range(len(words) - 2):
    pair = tuple(words[i:i + 2])
    follower = words[i + 2]
    trigrams[pair] = follower

print(trigrams)

# using a Grouping with zip
trigrams = Grouping()
for w1, w2, w3 in zip(words[:], words[1:], words[2:]):
    trigrams[(w1, w2)] = w3

print(trigrams)

# Now we can do it in one expression:
trigrams = Grouping(((w1, w2), w3)
                    for w1, w2, w3 in zip(words[:], words[1:], words[2:]))

print(trigrams)

# Now with the key function:
# in this case it needs to be in a sequence, so we can't use a simple loop
trigrams = Grouping(zip(words[:], words[1:], words[2:]),
                    key_fun=itemgetter(0, 1))

print(trigrams)

# Darn! that got the key right, but the value is not right.
# we can post process:
trigrams = {key: [t[2] for t in value] for key, value in trigrams.items()}

print(trigrams)

# But THAT is a lot harder to wrap your head around than the original
# setdefault() loop!
# And it mixes key function style and comprehension style -- so no good.

# Adding a value_func helps a lot:
trigrams = Grouping(zip(words[:], words[1:], words[2:]),
                    key_fun=itemgetter(0, 1),
                    value_fun=itemgetter(2))

print(trigrams)

# that works fine, but I, at least, find it klunkier than the
# comprehension style

# Finally, we can use a regular loop with the functions
trigrams = Grouping(key_fun=itemgetter(0, 1),
                    value_fun=itemgetter(2))
for triple in zip(words[:], words[1:], words[2:]):
    trigrams.add(triple)

print(trigrams)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov