Fwd: Allow a group by operation for dict comprehension
Mike Selik asked for my opinion on a draft PEP along these lines. I
proposed a slight modification to his idea that is now reflected in his
latest edits. With some details fleshed out, I think this is a promising
idea. I like the collections class better, of course, but a dict
classmethod is still a much smaller change than new comprehension
syntax.
On Thu, Jun 28, 2018, 8:15 PM David Mertz wrote:
I see the utility, but I would prefer a slightly different approach than you suggest; I think my suggestion will have a lower barrier to acceptance as well.
Rather than add a new classmethod dict.grouper(), I'd like to have a new dict subclass collections.Grouper. The name is subject to bikeshedding, of course. I think of this class as a "big sister" of collections.Counter, in a way.
There is behavior that I believe would be useful beyond constructing a new base dictionary. However, I think that construction from an iterable would be a common use pattern. Oh, I'd also recommend following toolz.groupby() in keeping a list rather than a set. It's easy enough to convert a list to a set if wanted, but order and repetitions are preserved in SQL or Pandas 'groupby' operations, and that seems more general.
For example (this typed without testing, forgive any typos or thinkos):
>>> from collections import Grouper  # i.e. in Python 3.8+
>>> grouped = Grouper(range(7), key=mod_2)
>>> grouped
Grouper({0: [0, 2, 4, 6], 1: [1, 3, 5]})
>>> grouped.update([2, 10, 12, 13], key=mod_2)
>>> grouped
Grouper({0: [0, 2, 4, 6, 2, 10, 12], 1: [1, 3, 5, 13]})
>>> # Updating with no key function groups by identity
>>> # ... is there a better idea for the default key function?
>>> grouped.update([0, 1, 2])
>>> grouped
Grouper({0: [0, 2, 4, 6, 2, 10, 12, 0], 1: [1, 3, 5, 13, 1], 2: [2]})
>>> # Maybe do a different style of update if passed a dict subclass
>>> # - Does a key function make sense here?
>>> grouped.update({0: 88, 1: 77})
>>> grouped
Grouper({0: [0, 2, 4, 6, 2, 10, 12, 0, 88], 1: [1, 3, 5, 13, 1, 77], 2: [2]})
>>> # Avoiding duplicates might sometimes be useful
>>> grouped.make_unique()  # better name? .no_dup()?
>>> grouped
Grouper({0: [0, 2, 4, 6, 10, 12, 88], 1: [1, 3, 5, 13, 77], 2: [2]})
I think that most of the methods of Counter make sense to include here in appropriately adjusted versions. Converting to a plain dictionary should probably just be `dict(grouped)`, but it's possible we'd want `grouped.as_dict()` or something.
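A minimal sketch of how the class described above might behave, assuming a plain dict subclass. Nothing here exists in the standard library: the name `Grouper`, the `key` parameter, the mapping-style update, and `make_unique()` are all taken from the untested example in this message and are hypothetical.

```python
from collections.abc import Mapping


class Grouper(dict):
    """Hypothetical dict subclass mapping each key to a list of items."""

    def __init__(self, iterable=(), key=None):
        super().__init__()
        self.update(iterable, key=key)

    def update(self, iterable=(), key=None):
        if isinstance(iterable, Mapping):
            # Assumption: a mapping appends each value to the group
            # named by its key; the key function is ignored.
            for k, v in iterable.items():
                self.setdefault(k, []).append(v)
            return
        for item in iterable:
            # With no key function, group by identity.
            k = item if key is None else key(item)
            self.setdefault(k, []).append(item)

    def make_unique(self):
        # Drop repeats within each group, keeping first-seen order.
        # Requires hashable items.
        for k, items in list(self.items()):
            self[k] = list(dict.fromkeys(items))

    def __repr__(self):
        return f"{type(self).__name__}({dict.__repr__(self)})"


def mod_2(n):
    return n % 2


grouped = Grouper(range(7), key=mod_2)
grouped.update([2, 10, 12, 13], key=mod_2)
grouped.update([0, 1, 2])
grouped.update({0: 88, 1: 77})
grouped.make_unique()
print(grouped)
```

Because the sketch subclasses dict, `dict(grouped)` already yields a plain dictionary, which is one argument against needing a separate `as_dict()`.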
One thing that *might* be useful is a way to keep using the same key function across updates. Even with no explicit provision, we *could* spell it like this:
>>> grouped.key_func = mod_2
>>> grouped.update([55, 44, 22, 111], key=grouped.key_func)
Perhaps some more official API for doing that would be useful though.
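One possible shape for such an API, as a sketch only: the key function is stored at construction and reused by every update. The class name `StickyGrouper` and the `key_func` attribute are hypothetical, chosen for illustration.

```python
class StickyGrouper(dict):
    """Hypothetical variant that remembers its key function."""

    def __init__(self, iterable=(), key=None):
        super().__init__()
        # Assumption: with no key given, group by identity.
        self.key_func = key if key is not None else (lambda item: item)
        self.update(iterable)

    def update(self, iterable=()):
        # Every update reuses the key function given at construction.
        for item in iterable:
            self.setdefault(self.key_func(item), []).append(item)


grouped = StickyGrouper(range(7), key=lambda n: n % 2)
grouped.update([55, 44, 22, 111])
print(dict(grouped))
```

The trade-off is that one instance can no longer mix groupings, which may be a feature rather than a limitation.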
On Thu, Jun 28, 2018 at 7:35 PM David Mertz wrote:
Thanks... Looking now. I'll comment soon.
On Thu, Jun 28, 2018 at 7:05 PM Michael Selik wrote:
Hi David,
We talked about this in Seattle about a year ago at a conference. Would you do me a favor and critique this PEP I've drafted? I'd like to get private feedback before sharing with the group.
https://github.com/selik/peps/blob/master/pep-9999.rst
Thank you,
-- Michael
On Thu, Jun 28, 2018 at 1:35 PM David Mertz wrote:
I agree with these recommendations. There are excellent 3rd party tools that do what you want. This is way too much to try to shoehorn into a comprehension.
I'd add one more option. You want something that behaves like SQL. Right in the standard library is sqlite3, and you can create an in-memory DB to hold the data you expect to group.
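As a rough illustration of that option, the following uses only the standard-library sqlite3 module. The table name `numbers` and its single column `n` are made up for the example.

```python
import sqlite3

# An in-memory database lives only as long as the connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(7)])

# Group by parity and aggregate each group in SQL.
rows = conn.execute(
    "SELECT n % 2 AS parity, COUNT(*), SUM(n) "
    "FROM numbers GROUP BY n % 2 ORDER BY n % 2"
).fetchall()
conn.close()

print(rows)
```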
On 2018-06-29 05:14, David Mertz wrote:
Mike Selik asked for my opinion on a draft PEP along these lines. I proposed a slight modification to his idea that is now reflected in his latest edits. With some details fleshed out, I think this is a promising idea. I like the collections class better, of course, but a dict classmethod is still a much smaller change than new comprehension syntax.
On Thu, Jun 28, 2018, 8:15 PM David Mertz <mertz@gnosis.cx> wrote:
[snip]
For example (this typed without testing, forgive any typos or thinkos):
>>> from collections import Grouper  # i.e. in Python 3.8+
>>> grouped = Grouper(range(7), key=mod_2)
>>> grouped
Grouper({0: [0, 2, 4, 6], 1: [1, 3, 5]})
>>> grouped.update([2, 10, 12, 13], key=mod_2)
>>> grouped
Grouper({0: [0, 2, 4, 6, 2, 10, 12], 1: [1, 3, 5, 13]})
>>> # Updating with no key function groups by identity
>>> # ... is there a better idea for the default key function?
>>> grouped.update([0, 1, 2])
>>> grouped
Grouper({0: [0, 2, 4, 6, 2, 10, 12, 0], 1: [1, 3, 5, 13, 1], 2: [2]})
I think that if a Grouper instance is created with a key function, then that key function should be used by the .update method. You _could_ possibly override that key function by providing a new one when updating, but, OTOH, why would you want to? You'd be mixing different kinds of groupings! So -1 on that.
>>> # Maybe do a different style of update if passed a dict subclass
>>> # - Does a key function make sense here?
>>> grouped.update({0: 88, 1: 77})
>>> grouped
Grouper({0: [0, 2, 4, 6, 2, 10, 12, 0, 88], 1: [1, 3, 5, 13, 1, 77], 2: [2]})
>>> # Avoiding duplicates might sometimes be useful
>>> grouped.make_unique()  # better name? .no_dup()?
>>> grouped
Grouper({0: [0, 2, 4, 6, 10, 12, 88], 1: [1, 3, 5, 13, 77], 2: [2]})
If you want to avoid duplicates, maybe the grouper should be created with 'set' as the default factory (see 'defaultdict'). However, there's the problem that 'list' has .append but 'set' has .add...
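One way around the .append/.add mismatch, sketched with defaultdict: pick the adding method by inspecting the factory. The function name `group_into` and its `factory` parameter are hypothetical and not part of any proposal in this thread.

```python
from collections import defaultdict


def group_into(iterable, key, factory=list):
    """Group items into containers built by `factory` (list or set)."""
    # Lists collect with .append, sets with .add; choose by inspection.
    add_name = "append" if hasattr(factory, "append") else "add"
    groups = defaultdict(factory)
    for item in iterable:
        getattr(groups[key(item)], add_name)(item)
    return dict(groups)


data = [1, 1, 2, 3, 3]
as_lists = group_into(data, key=lambda n: n % 2)
as_sets = group_into(data, key=lambda n: n % 2, factory=set)
print(as_lists, as_sets)
```

Inspecting method names is fragile for arbitrary containers, so a real design would probably need an explicit hook instead.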
I think that most of the methods of Counter make sense to include here in appropriately adjusted versions. Converting to a plain dictionary should probably just be `dict(grouped)`, but it's possible we'd want `grouped.as_dict()` or something.
One thing that *might* be useful is a way to keep using the same key function across updates. Even with no explicit provision, we *could* spell it like this:
>>> grouped.key_func = mod_2
>>> grouped.update([55, 44, 22, 111], key=grouped.key_func)
Perhaps some more official API for doing that would be useful though.
[snip]
I created a separate thread to continue this discussion: "grouping / dict
of lists"
https://github.com/selik/peps/blob/master/pep-9999.rst
In my proposal, the update offers a key-function in case the new elements
don't follow the same pattern as the existing ones. I can understand the
view that the class should retain the key-function from initialization.
The issue of the group type -- list, set, Counter, etc. -- is handled by
offering a Grouping.aggregate method. The Grouping class creates lists,
which are passed to the aggregate function. I included examples of
constructing sets and Counters.
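The idea can be sketched roughly as follows. This is not the API from the draft PEP: `aggregate` is written here as a hypothetical free function over a plain dict of lists, purely to show how one list-based grouping can yield sets or Counters afterwards.

```python
from collections import Counter


def aggregate(groups, func):
    """Apply `func` to each group's list, returning a new plain dict."""
    return {k: func(items) for k, items in groups.items()}


# Groups are always built as lists first...
groups = {0: [0, 2, 4, 6, 2], 1: [1, 3, 5, 1]}

# ...and converted to another container type only on request.
as_sets = aggregate(groups, set)
as_counts = aggregate(groups, Counter)
print(as_sets, as_counts)
```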
On Fri, Jun 29, 2018 at 10:04 AM MRAB wrote:
[snip]