[Python-ideas] grouping / dict of lists

Tue Jul 3 08:58:05 EDT 2018

On Tue, Jul 3, 2018 at 2:52 AM Chris Barker via Python-ideas <
python-ideas at python.org> wrote:

>> I'd write:
>>
>>     map(len, words)
>>
>> But I'd also write
>>     [len(fullname) for fullname in contacts]
>>
>
> map(lambda name: name.first_name, all_names)
> vs
> [name.first_name for name in all_names]
>
> I really like the comprehension form much better when what you really want
> is a simple expression like an attribute access or index or simple
> calculation, or ....
>

Why not `map(attrgetter('first_name'), all_names)`?
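
For comparison, a minimal sketch of the three spellings side by side. The
`Name` namedtuple and the sample data are hypothetical, made up only for
illustration; `attrgetter` is the real function from the `operator` module.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type and data, only for illustration.
Name = namedtuple("Name", ["first_name", "last_name"])
all_names = [Name("Fred", "Smith"), Name("Mary", "Jones")]

# Three equivalent ways to pull out one attribute from every item.
with_lambda = list(map(lambda name: name.first_name, all_names))
with_comprehension = [name.first_name for name in all_names]
with_attrgetter = list(map(attrgetter("first_name"), all_names))
```

All three produce the same list; which one reads best is the matter of taste
under discussion.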

> In [56]: grouping(school_student_list)
> Out[56]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'],
> 'SchoolC': ['Nancy']}
>

This one case is definitely nice. However...

> And here are the examples from the PEP:
> (untested -- I may have missed some brackets, etc)
>

What you've missed, in *several* examples, is the value part of the tuple in
your API. You've pulled out the key, and forgotten to include anything in
the actual groups.  I have a hunch that if your API were used, this would
be a common pitfall.

I think this argues against your API and for Michael's that simply deals
with "sequences of groupable things."  That's much more like what one deals
with in SQL, and is familiar that way.  If the things grouped are compound
objects such as dictionaries, objects with common attributes, named tuples,
etc. then the list of things in a group usually *does not* want the
grouping attribute removed.

> grouping(((len(word), word) for word in words))
> grouping((name[0], name) for name in names)
> grouping((contact.city, contact) for contact in contacts)
>

Good so far, but a lot of redundancy in always spelling out a tuple of
`(derived-key, object)`.

> grouping((employee['department'] for employee in employees))
> grouping((os.path.splitext(filepath)[1] for filepath in os.listdir('.')))
> grouping(('debit' if v > 0 else 'credit' for v in transactions))
>

And here you forget about the object itself 3 times in a row (or also
forget some derived "value" that you might want in your other comments).
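
To make the pitfall concrete, here is a rough sketch. `grouping_pairs` is a
hypothetical stand-in for a pair-based API (my assumption of how it would
consume `(key, value)` tuples, not the PEP's actual code), and the sample
data is invented.

```python
def grouping_pairs(pairs):
    """Hypothetical pair-based grouping: expects (key, value) tuples."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

transactions = [120, -45, 300, -10]  # invented sample data

# Keys only, as in the quoted examples: each string gets unpacked as if it
# were a (key, value) pair, which fails for strings longer than two characters.
try:
    grouping_pairs('debit' if v > 0 else 'credit' for v in transactions)
    keys_only_failed = False
except ValueError:
    keys_only_failed = True

# Worse, two-character strings unpack "successfully" and are silently split.
silently_wrong = grouping_pairs(['ab', 'ac'])

# The intended spelling has to repeat the value explicitly.
correct = grouping_pairs(
    ('debit' if v > 0 else 'credit', v) for v in transactions
)
```

With this sketch the keys-only call raises `ValueError`, while the
two-character case quietly returns `{'a': ['b', 'c']}`.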

> grouping(((v, k) for v, k in d.items()))
>

This is nice, and spelled correctly.

> So that was an interesting exercise -- many of those are a bit clearer (or
> more compact) with the key function. But I also notice a pattern -- all
> those examples fit very well into the key function pattern:
>

Yep.

I also think that the row-style "list of data" where you want to discard
the key from the values is nicely spelled (in the PEP) as:

INDEX = 0
grouping(sequence, key=lambda row: row.pop(INDEX))

> groups = {}
> for item in iterable:
>     groups.setdefault(key(item), []).append(item)
>

I agree this seems better as an implementation.
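
Wrapped up as a function, the quoted loop might look like the sketch below.
The name `grouping` and the required `key` parameter are my assumptions about
the eventual signature, and the sample data is invented. The second call
shows the `row.pop(INDEX)` spelling, which only works when each row is a
mutable list and which modifies the rows in place.

```python
def grouping(iterable, key):
    """Group items into a dict of lists, keyed by key(item)."""
    groups = {}
    for item in iterable:
        # key(item) is evaluated before the item is appended.
        groups.setdefault(key(item), []).append(item)
    return groups

# The whole object stays in its group.
words = ["apple", "fig", "kiwi", "pear", "plum"]
by_length = grouping(words, key=len)
# {5: ['apple'], 3: ['fig'], 4: ['kiwi', 'pear', 'plum']}

# Discarding the key from each row: pop() removes it before the append.
INDEX = 0
rows = [["SchoolA", "Fred"], ["SchoolB", "Bob"], ["SchoolA", "Mary"]]
by_school = grouping(rows, key=lambda row: row.pop(INDEX))
# {'SchoolA': [['Fred'], ['Mary']], 'SchoolB': [['Bob']]}
```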

> I still prefer the class idea over a utility function, because:
> * with a class, you can add stuff to the grouping later:
>
> a_grouping['key'] = value
>
> or maybe a_grouping.add(item)
> * with a class, you can add utility methods -- I kinda liked that in your
> original PEP.
>

I still agree (after all, I proposed it to Michael).  But this seems minor,
and Guido seems not to like `collections` that much (or at least he
commented on not using Counter ... which I personally love to use and to
teach).

That said, a 'grouping()' function seems fine to me also... with a couple of
utility functions (that need not be builtin, or even standard library
necessarily) in place of methods.  A lot of what methods would do can
easily be done using comprehensions as well; some examples are shown in the
PEP.
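
As a small illustration of comprehensions standing in for methods, using the
grouping result quoted earlier in this message (the names `counts` and
`multi_student` are just mine for this sketch):

```python
# The plain dict-of-lists result shown in the quoted Out[56].
groups = {
    'SchoolA': ['Fred', 'Mary'],
    'SchoolB': ['Bob', 'Jane'],
    'SchoolC': ['Nancy'],
}

# What a hypothetical counting method might return, as a comprehension.
counts = {school: len(students) for school, students in groups.items()}

# What a hypothetical filtering method might return.
multi_student = {
    school: students
    for school, students in groups.items()
    if len(students) > 1
}
```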

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.