<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jul 6, 2018 at 5:13 PM, Michael Selik <span dir="ltr"><<a href="mailto:mike@selik.org" target="_blank">mike@selik.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span><div dir="ltr">On Tue, Jul 3, 2018 at 10:11 PM Chris Barker via Python-ideas <<a href="mailto:python-ideas@python.org" target="_blank">python-ideas@python.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div>* There are types of data well suited to the key function approach, and other data not so well suited to it. If you want to support the not as well suited use cases, you should have a value function as well and/or take a (key, value) pair.<br></div><br></div>* There are some nice advantages in flexibility to having a Grouping class, rather than simply a function.</div></blockquote><div><br></div></span><div>The tri-grams example is interesting and shows some clever things you can do. The bi-grams example I wrote in my draft PEP could be extended to handle tri-grams with just a key-function, no value-function. </div></div></div></blockquote><div><br></div><div>hmm, I'll take a look -- 'cause I found that I was really limited to only a certain class of problems without a way to get "custom" values. </div><div><br></div><div>Do you mean the "foods" example?</div><div><br><font face="monospace, monospace">>>> foods = [<br>... ('fruit', 'apple'),<br>... ('vegetable', 'broccoli'),<br>... ('fruit', 'clementine'),<br>... ('vegetable', 'daikon')<br>... 
]<br>>>> groups = grouping(foods, key=lambda pair: pair[0])<br>>>> {k: [v for _, v in g] for k, g in groups.items()}<br>{'fruit': ['apple', 'clementine'], 'vegetable': ['broccoli', 'daikon']}</font><br><br><br>Because that one, I think, makes my point well. To get what you want, you have to post-process the Grouping with a (somewhat complex) comprehension. If someone is that adept with comprehensions and wants to do it that way, the grouping function isn't really buying them much at all over setdefault, defaultdict, or rolling their own.<br></div><div><br></div>Contrast this with:<br><br><font face="monospace, monospace">groups = grouping(foods,<br> key=lambda pair: pair[0],<br> value=lambda pair: pair[1])</font><br><br>and you're done.<br><div><br></div><div>or, with <font face="monospace, monospace">from operator import itemgetter</font>:</div><div><br></div><div><font face="monospace, monospace" style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">groups = grouping(foods,<br> key=itemgetter(0),<br> value=itemgetter(1))</font><br style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br></div><div> </div><div>Or even better:</div><div><br></div><div><font face="monospace, monospace">groups = grouping(foods)</font></div><div><br></div><div>:-)</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>However, because this example is fun it may be distracting from the core value of ``grouped`` or ``grouping``.</div></div></div></blockquote><div><br></div><div>Actually, I think it's the opposite -- it opens the concept up to more general-purpose use -- I guess I'm thinking of this as a "dict with lists as the values" that has many purposes beyond strictly "groupby". 
Maybe that's because I'm a general Python programmer, and not a database guy, but if something is going to be added to the stdlib, why not add a more general-purpose class?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>I don't think we need a nicer API for complex grouping tasks. As the tasks get increasingly sophisticated, any general-purpose API will be less nice than something built for that specific task.</div></div></div></blockquote><div><br></div><div>I guess this is where we disagree -- I think we've found an API that is general-purpose and cleanly supports multiple tasks.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Instead, I want the easiest possible interface for making groups for every-day use cases. The wide range of situations that ``sorted`` covers with just a key-function suggests that ``grouped`` should follow the same pattern.</div></div></div></blockquote><div><br></div><div>Not at all -- sorted() is about, well, sorting -- which means rearranging items. I certainly don't expect it to break up the items for me.</div><div><br></div><div>Again, this is a matter of perspective -- if you start with "groupby" as a concept, then I can see why you draw the parallel with sorted -- you are rearranging the items, but this time into groups.</div><div><br></div><div>But if you start with "a dict of lists", then you take a wider perspective:</div><div><br></div><div>- It can naturally and easily be used to group things</div><div>- It can do other nifty things</div><div>- And as a "dict of something", it's natural to think of keys AND values, and to want a dict-like API -- i.e.
pass in (key, value) pairs.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>I do think that the default, key=None, could be set to handle (key, value) pairs.</div></div></div></blockquote><div><br></div><div>OK, so for my part, if you provide the (key, value) pair API, then you don't really need a value_func. But as the "pass in a function to process the data" model IS well suited to some tasks, and some people simply like the style, why not?</div><div> <br></div><div>And leaving out a value function creates an asymmetry: if you have a (key, the_item) problem, you can use either the key function API or the (key, value) API -- but if you have a (key, value) problem, you can only use the (key, value) API.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> But I'm still reluctant to break the standard of sorted, min, max, and groupby.</div></div></div></blockquote><div><br></div><div>This is the power of Python's keyword parameters -- anyone coming to this from a perspective of "I expect this to be like <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">sorted, min, max, and groupby" can simply ignore the <font face="monospace, monospace">value</font> parameter :-)</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">One more argument :-)</span></div><div><span 
style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">There have been comments about how some of the classes in collections may not be needed -- Counter, in particular. I tend to agree, but I think the reason Counter is not that useful is that it doesn't do enough -- not that the idea isn't useful -- it's just such a thin wrapper around a dict that I hardly see the point.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Example:</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><div><font face="monospace, monospace">In [12]: c = Counter()</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">In [13]: c['f'] += 1</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">In [14]: c['g'] = "some random thing"</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">In [15]: c</font></div><div><font face="monospace, monospace">Out[15]: Counter({'f': 1, 'g': 'some random thing'})</font></div></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span 
style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Is that really that useful? I need to do the counting by hand, and I can easily use the regular dict interface to make a mess of it.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">It has a handy constructor, but that's about it.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Anyway, I think we've got this nailed down to a handful of options/decisions:</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">1) A dict subclass vs. a function that constructs a dict-of-lists</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> - I think a dict subclass offers some real value -- but it comes down to goals: Do we want a general-purpose special dict, 
or a function to perform the "usual" groupby operation?</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">2) Do we have a value-function keyword parameter?</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> </span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> - I think this adds real value without taking anything away from the convenience of the simpler key-only API.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">3) Do we accept an iterable of (key, value) pairs if no key function is provided?</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> </span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> - I think yes, also because why not? 
Defaulting both key and value to the identity function would be pretty useless.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div>So it comes down to what the community thinks.</div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">-CHB</span></div><div><br></div><div><br></div></div>-- <br><div class="gmail-m_-8240531984230145791gmail_signature"><br>Christopher Barker, Ph.D.<br>Oceanographer<br><br>Emergency Response Division<br>NOAA/NOS/OR&R (206) 526-6959 voice<br>7600 Sand Point Way NE (206) 526-6329 fax<br>Seattle, WA 98115 (206) 526-6317 main reception<br><br><a href="mailto:Chris.Barker@noaa.gov" target="_blank">Chris.Barker@noaa.gov</a></div>
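<div><br></div><div>For comparison, here is roughly what the setdefault and defaultdict alternatives mentioned above look like for the same <font face="monospace, monospace">foods</font> data. This is a minimal sketch that uses only the standard library (<font face="monospace, monospace">dict.setdefault</font> and <font face="monospace, monospace">collections.defaultdict</font>). The variable names are illustrative.</div>
<pre>
```python
from collections import defaultdict

foods = [
    ('fruit', 'apple'),
    ('vegetable', 'broccoli'),
    ('fruit', 'clementine'),
    ('vegetable', 'daikon'),
]

# dict.setdefault: creates the empty list the first time a key is seen.
by_setdefault = {}
for kind, name in foods:
    by_setdefault.setdefault(kind, []).append(name)

# collections.defaultdict: a missing key gets a fresh list automatically.
by_defaultdict = defaultdict(list)
for kind, name in foods:
    by_defaultdict[kind].append(name)

print(by_setdefault)
# {'fruit': ['apple', 'clementine'], 'vegetable': ['broccoli', 'daikon']}
print(dict(by_defaultdict) == by_setdefault)
# True
```
</pre>
<div>Either way it is a short loop. That loop is the baseline any grouping helper in the stdlib would have to improve on.</div>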
</div></div>
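<div><br></div><div>To make options 2 and 3 concrete, here is a minimal sketch of a function-style <font face="monospace, monospace">grouping()</font>. It takes an optional value function, and it treats items as (key, value) pairs when no key function is given. The sketch is hypothetical. <font face="monospace, monospace">grouping</font> is only the proposed name and is not an existing stdlib API. Raising <font face="monospace, monospace">TypeError</font> for <font face="monospace, monospace">value=</font> without <font face="monospace, monospace">key=</font> is an assumption of this sketch, not something settled in the thread.</div>
<pre>
```python
from collections import defaultdict
from operator import itemgetter


def grouping(iterable, key=None, value=None):
    """Hypothetical sketch of the proposed grouping(); not a stdlib API.

    Returns a plain dict mapping each key to a list of values,
    with keys in first-seen order.
    """
    if key is None and value is not None:
        # Assumption of this sketch: value= only makes sense together with key=.
        raise TypeError("value= requires key= in this sketch")
    groups = defaultdict(list)
    for item in iterable:
        if key is None:
            # No key function: treat each item as a (key, value) pair.
            k, v = item
        else:
            k = key(item)
            # No value function: store the whole item, as itertools.groupby does.
            v = item if value is None else value(item)
        groups[k].append(v)
    return dict(groups)


foods = [
    ('fruit', 'apple'),
    ('vegetable', 'broccoli'),
    ('fruit', 'clementine'),
    ('vegetable', 'daikon'),
]

# These three spellings all give
# {'fruit': ['apple', 'clementine'], 'vegetable': ['broccoli', 'daikon']}
by_pairs = grouping(foods)
by_lambdas = grouping(foods, key=lambda pair: pair[0], value=lambda pair: pair[1])
by_getters = grouping(foods, key=itemgetter(0), value=itemgetter(1))

# Key function only: each group holds the whole (kind, name) tuples.
whole_items = grouping(foods, key=itemgetter(0))
```
</pre>
<div>In this sketch, <font face="monospace, monospace">grouping(data, key=k, value=v)</font> gives the same result as <font face="monospace, monospace">grouping((k(x), v(x)) for x in data)</font>. That is one way to see the asymmetry argument: without a value function, callers have to build the pairs themselves.</div>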