[Python-ideas] Fwd: grouping / dict of lists

Chris Barker chris.barker at noaa.gov
Mon Jul 2 00:31:18 EDT 2018


On Sun, Jul 1, 2018 at 7:12 PM, David Mertz <mertz at gnosis.cx> wrote:

> Michael changed from set to list at my urging. A list is more general. A
> groupby in Pandas or SQL does not enforce uniqueness, but DOES preserve
> order.
>

<snip>

> It really is better to construct the collection using lists -- in the fully
> general manner -- and then only throw away the generality when that is
> appropriate.

Well, yes -- if there were only one option, then list is pretty obvious.

But I don't think converting to sets after the fact is just as good.

It's only just as good if you think of it as a one-time operation --
process a bunch of data all at once, and get back a dict with the results.
But I'm thinking of it in a different way:

Create a custom class derived from dict that you can add stuff to at any
time -- much more like the current examples in the collections module.

If you simply want a groupby function that returns a regular dict, then you
need a utility function (or a few), not a new class.
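
For the one-shot case, a plain function is enough. Here is a minimal sketch,
assuming a hypothetical helper named group_by that takes (key, value) pairs;
the name and signature are illustrative only, not something proposed in this
thread. It also shows what converting to sets after the fact looks like:

```python
def group_by(pairs):
    """Group (key, value) pairs into a regular dict of lists."""
    groups = {}
    for key, value in pairs:
        # setdefault creates the list the first time a key is seen
        groups.setdefault(key, []).append(value)
    return groups


pairs = [("a", 1), ("b", 2), ("a", 1), ("a", 3)]
as_lists = group_by(pairs)

# Converting after the fact: duplicates and ordering are dropped here,
# and anything added to as_lists later is not reflected in as_sets.
as_sets = {key: set(values) for key, values in as_lists.items()}
```

This works fine for a single pass over the data, but every later addition
means going back through the list version or redoing the conversion.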

If you are making a class that requires the values to be a collection
of items, then list is the obvious default, but if someone wants a set --
they want it built in to the class, not converted after the fact.

I've extended my prototype to do just that:

class Grouping(dict):
    ...
    def __init__(self, iterable=(), *, collection=list):
        ...

"collection" is a class that's either a Mutable Sequence (has .append and
.extend methods) or Set (has .add and .update methods).

Once you create a Grouping instance, the collection class you pass in is
used everywhere.
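
To make the idea concrete, here is a rough sketch of a dict subclass that
takes the collection type up front. This is NOT the code in the attached
grouper.py; the add() method, the _adder attribute, and the assumption that
the input is an iterable of (key, value) pairs are all made up for
illustration:

```python
class Grouping(dict):
    """Sketch only: a dict whose values are collections of grouped items."""

    def __init__(self, iterable=(), *, collection=list):
        super().__init__()
        self.collection = collection
        # Assumption: list-like types grow with .append, set-like with .add
        self._adder = "append" if hasattr(collection, "append") else "add"
        for key, value in iterable:
            self.add(key, value)

    def add(self, key, value):
        """Add one value to the group for key (hypothetical method name)."""
        if key not in self:
            self[key] = self.collection()
        getattr(self[key], self._adder)(value)


by_list = Grouping([("a", 1), ("a", 1), ("b", 2)])
by_set = Grouping([("a", 1), ("a", 1), ("b", 2)], collection=set)
by_set.add("b", 3)  # later additions still land in sets
```

The point of the sketch is only that the collection choice is made once and
then applies to every group, including ones created after construction.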

I've put the prototype up on GitHub if anyone wants to take a look, try it
out, suggest changes, etc:

https://github.com/PythonCHB/grouper

(and enclosed here)

Note that I am NOT proposing this particular implementation or names, or
anything. I welcome feedback on the implementation, API and naming scheme,
but it would be great if we could all be clear on whether the critique is
of the idea or of the implementation.

This particular implementation uses some pretty hacky metaclass magic (or
the type constructor, anyway) -- if something set-like is passed in, it
creates a subclass that adds .append and .extend methods, so that the rest
of the code doesn't have to special-case it. Not sure if that's a good idea;
it feels pretty kludgy -- but it was kinda fun to write.
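
For anyone who hasn't seen the three-argument form of type() used this way,
the trick described above might look roughly like the following. Again, this
is a sketch and not the attached code; the helper name with_list_api and the
generated class name are invented here:

```python
def with_list_api(collection):
    """Return collection as-is if list-like, else a subclass that adds
    .append and .extend as aliases for .add and .update."""
    if hasattr(collection, "append"):
        return collection
    return type(
        collection.__name__ + "WithAppend",  # hypothetical naming scheme
        (collection,),                       # subclass the set-like type
        {"append": collection.add, "extend": collection.update},
    )


AppendableSet = with_list_api(set)
s = AppendableSet()
s.append(1)
s.append(1)        # duplicates still collapse, since it is a set underneath
s.extend([2, 3])
```

After this, the rest of the code can call .append and .extend without
checking which kind of collection it was given.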

It also needs more test cases and example use cases for sure.

And before we go much farther with this discussion, it would be great to
see some more real-world use cases, if anyone has some in mind.

-CHB

-------
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: grouper.py
Type: text/x-python-script
Size: 4171 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20180701/5be65cd6/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_grouper.py
Type: text/x-python-script
Size: 2914 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20180701/5be65cd6/attachment-0003.bin>

