Fwd: grouping / dict of lists
Ivan,
Did you mean this to go to the list? I hope so, as I've cc'd it this time :-)
On Sun, Jul 1, 2018 at 1:20 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 1 July 2018 at 06:18, Chris Barker via Python-ideas <python-ideas@python.org> wrote:
I'm really warming to the:
Alternate: collections.Grouping
version -- I really like this as a kind of custom mapping, rather than "just a function" (or alternate constructor) --
I wanted the group to be represented as a set, not a list. However, I understand that a list may be more common. Can we design an API that would make this configurable? Something like:
    from collections import Grouping
    deps = Grouping(set)        # list can be the default
    deps.update(other_deps)     # uses set.update or list.extend for every key
    deps.add(trigger, target)   # uses set.add or list.append
Yeah, I thought about that too -- Michael was using set() in some of his examples. But the question is: do we have a single switchable version or just two classes?
Probably allowing an arbitrary collection for values is too general/hard.
Maybe not -- if the criterion were that you can pass in any collection you want, as long as it has either an .append() or an .add() method, it would be pretty easy to do with duck typing magic. Sure, a user could easily make a mess by passing in a weird custom class, but so what? Using something other than a set or list would be an "at your own risk" thing anyway.
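To make the duck-typing criterion concrete, here is a minimal sketch over a plain dict. The helper name add_item and its signature are assumptions made up for illustration; they are not taken from the PEP or from any prototype in this thread.

```python
def add_item(groups, key, item, collection=list):
    """Add item to groups[key], creating the group on first use.

    Hypothetical helper: `collection` is any class whose instances have
    either an .append() or an .add() method.
    """
    if key not in groups:
        groups[key] = collection()
    group = groups[key]
    # Duck typing: list-like groups expose .append, set-like groups expose .add.
    adder = getattr(group, "append", None) or getattr(group, "add", None)
    if adder is None:
        raise TypeError(
            f"{type(group).__name__} has neither .append() nor .add()"
        )
    adder(item)


deps = {}
add_item(deps, "build", "compile", collection=set)
add_item(deps, "build", "compile", collection=set)  # the set absorbs the duplicate
add_item(deps, "build", "link", collection=set)

log = {}
add_item(log, "build", "compile")
add_item(log, "build", "compile")  # the default list keeps the duplicate
```

Anything that is neither list-like nor set-like (a dict, say) fails with a TypeError on the first add, which is about as much protection as the "at your own risk" approach offers.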
Maybe we can just add a flag `unique=True` to the constructor, which would make it use sets instead of lists for groups?
That's another, more robust, but less flexible option.
Stay tuned for a prototype, if I can get it done fast....
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
Michael changed from set to list at my urging. A list is more general. A groupby in Pandas or SQL does not enforce uniqueness, but DOES preserve order. I think the PEP is not fully updated, but it's a list everywhere in the proposal itself, just not in the "old techniques."
Moreover, Michael gives examples of "casting" the Grouping to a dictionary with either sets or Counters as values. Both are useful, and both can be derived from a list. But you cannot go backwards from either to the list. The transformation is simple and obvious, and can be included in eventual documentation.
It really is better to construct the collection using lists, in the fully general manner, and then only throw away the generality when that is appropriate.
On Sun, Jul 1, 2018, 8:47 PM Chris Barker via Python-ideas <python-ideas@python.org> wrote:
<snip>
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
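As an illustration of the "lists are the general case" argument, here is a minimal sketch of a grouping function that returns a plain dict of lists. The function name grouping and its key parameter are assumptions for this example only, not the API proposed in the PEP.

```python
def grouping(iterable, key):
    """Group items into a plain dict of lists (hypothetical signature).

    Lists keep every item, duplicates included, in the order encountered.
    """
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return groups


words = ["banana", "apple", "avocado", "apple", "blueberry"]
by_letter = grouping(words, key=lambda w: w[0])

# Within each group the original order survives, and so does the
# repeated "apple", which a set-valued grouping would have dropped.
print(by_letter)
```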
Oh, it looks like he has modified the PEP and taken out the examples of conversion. That's too bad; hopefully they'll be added back.
But it's pretty simple. Whether my idea of collections.Grouping is adopted or whether a function/classmethod grouping() produces a plain dictionary, the casting would be the same:
    {k: set(v) for k, v in deps.items()}
    {k: Counter(v) for k, v in deps.items()}
On Sun, Jul 1, 2018, 10:12 PM David Mertz <mertz@gnosis.cx> wrote:
<snip>
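A small worked example of those two casts, and of why they only go one way. The contents of deps here are invented sample data.

```python
from collections import Counter

# A list-valued grouping, as either approach would produce (sample data).
deps = {"app": ["libc", "zlib", "libc"], "tool": ["zlib"]}

as_sets = {k: set(v) for k, v in deps.items()}
as_counts = {k: Counter(v) for k, v in deps.items()}

# The set forgets the repeated "libc"; the Counter remembers how many
# times it occurred, but not where in the sequence.
rebuilt = list(as_counts["app"].elements())

print(as_sets["app"])    # two distinct names
print(as_counts["app"])  # libc counted twice, zlib once
print(rebuilt)           # same items as deps["app"], but regrouped
```

Counter.elements() is the closest thing to a reverse transformation, and it yields each element repeated by its count rather than in the original interleaved order.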
On Sun, Jul 1, 2018 at 7:28 PM, David Mertz <mertz@gnosis.cx> wrote:
But it's pretty simple. Whether my idea of collections.Grouping is adopted or whether a function/classmethod grouping() produces a plain dictionary,
or my custom class...
the casting would be the same:
{k:set(v) for k,v in deps.items()}
{k:Counter(v) for k,v in deps.items()}
Hmm, makes me wonder if it would make sense to update my implementation to allow mapping types as well for the collection -- now we'd be getting really magic....
-CHB
On Sun, Jul 1, 2018 at 9:36 PM, Chris Barker <chris.barker@noaa.gov> wrote:
hmm, makes me wonder if it would make sense to update my implementation to allow mapping types as well for the collection
General mapping types don't make sense -- but I added Counter. That is a pretty special case, so I think it probably makes the case that it should just always be a list, and you can convert to others later.
Though maybe list, set and Counter are the ones you'd want to use?
Again, real-world use cases are needed!
-CHB
Code here: https://github.com/PythonCHB/grouper/blob/master/grouper/grouper.py
On Sun, Jul 1, 2018 at 7:12 PM, David Mertz <mertz@gnosis.cx> wrote:
Michael changed from set to list at my urging. A list is more general. A groupby in Pandas or SQL does not enforce uniqueness, but DOES preserve order.
<snip>
It really is better to construct the collection using lists, in the fully general manner, and then only throw away the generality when that is appropriate.
Well, yes -- if there were only one option, then list is pretty obvious. But is converting to sets after the fact just as good? I don't think so.
It's only just as good if you think of it as a one-time operation: process a bunch of data all at once, and get back a dict with the results. But I'm thinking of it in a different way: create a custom class derived from dict that you can add stuff to at any time -- much more like the current examples in the collections module.
If you simply want a groupby function that returns a regular dict, then you need a utility function (or a few), not a new class.
If you are making a class that enforces that the values are a collection of items, then list is the obvious default, but if someone wants a set, they want it built in to the class, not converted after the fact.
I've extended my prototype to do just that:
    class Grouping(dict):
        ...
        def __init__(self, iterable=(), *, collection=list):
"collection" is a class that's either a mutable sequence (has .append and .extend methods) or a set (has .add and .update methods). Once you create a Grouping instance, the collection class you pass in is used everywhere.
I've put the prototype up on GitHub if anyone wants to take a look, try it out, suggest changes, etc.:
https://github.com/PythonCHB/grouper
(and enclosed here)
Note that I am NOT proposing this particular implementation or names, or anything. I welcome feedback on the implementation, API and naming scheme, but it would be great if we could all be clear on whether the critique is of the idea or of the implementation.
This particular implementation uses pretty hacky metaclass magic (or the type constructor, anyway): if something set-like is passed in, it creates a subclass that adds .append and .extend methods, so that the rest of the code doesn't have to special-case. Not sure if that's a good idea; it feels pretty kludgy, but it was kinda fun to write.
It also needs more test cases and example use cases for sure.
And before we go much further with this discussion, it would be great to see some more real-world use cases, if anyone has some in mind.
-CHB
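For readers who want to see the shape of the idea without opening the repository, here is a rough sketch. It is NOT the code from the linked prototype: only the class name and the __init__ signature come from the message above, while the add() method, the generated subclass name, and the error handling are assumptions for illustration.

```python
class Grouping(dict):
    """Sketch of a dict whose values are all instances of `collection`."""

    def __init__(self, iterable=(), *, collection=list):
        super().__init__()
        if hasattr(collection, "append") and hasattr(collection, "extend"):
            # Sequence-like: usable as-is.
            self._collection = collection
        elif hasattr(collection, "add") and hasattr(collection, "update"):
            # Set-like: use the type constructor to derive a subclass that
            # also answers to the list-style names, so add() below does not
            # need to special-case.
            self._collection = type(
                collection.__name__ + "Group",
                (collection,),
                {"append": collection.add, "extend": collection.update},
            )
        else:
            raise TypeError(
                f"{collection.__name__} is neither sequence-like nor set-like"
            )
        for key, item in iterable:
            self.add(key, item)

    def add(self, key, item):
        """Add one item to the group stored under key (hypothetical method)."""
        if key not in self:
            self[key] = self._collection()
        self[key].append(item)


g = Grouping([("a", 1), ("b", 2), ("a", 1)], collection=set)
g.add("b", 3)

lst = Grouping([("a", 1), ("a", 1)])
```

The set-valued instance can keep growing after construction and never stores the duplicate, which is the behaviour that converting a list-valued grouping after the fact cannot give you incrementally.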
participants (2): Chris Barker, David Mertz