[Python-ideas] Fwd: Allow a group by operation for dict comprehension
David Mertz
mertz at gnosis.cx
Fri Jun 29 00:14:10 EDT 2018
Mike Selik asked for my opinion on a draft PEP along these lines. I
proposed a slight modification to his idea that is now reflected in his
latest edits. With some details fleshed out, I think this is a promising
idea. I like the collections class better, of course, but a dict
classmethod is still a much smaller change than new comprehension
syntax.
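
To make the collections.Grouper idea quoted below concrete, here is a minimal, untested-in-production sketch. It is an assumption about how such a class might behave, not an existing stdlib API: the names Grouper, make_unique, and the helper mod_2 are all hypothetical, and overriding dict.update() with a different signature is a design choice that would itself need discussion.

```python
from collections.abc import Mapping


def mod_2(n):
    """Hypothetical key function used in the examples: parity of n."""
    return n % 2


class Grouper(dict):
    """Hypothetical dict subclass mapping each key to a list of items."""

    def __init__(self, iterable=(), key=None):
        super().__init__()
        self.update(iterable, key=key)

    def update(self, iterable=(), key=None):
        # Assumption: a mapping appends each value to the group named by
        # its key, and the key function is ignored in that case.
        if isinstance(iterable, Mapping):
            for k, v in iterable.items():
                self.setdefault(k, []).append(v)
            return
        if key is None:
            # Assumption: with no key function, group by identity.
            key = lambda item: item
        for item in iterable:
            self.setdefault(key(item), []).append(item)

    def make_unique(self):
        # Drop repeats within each group, keeping first-seen order.
        # Relies on ordered dicts (Python 3.7+) and hashable items.
        for k in list(self):
            self[k] = list(dict.fromkeys(self[k]))

    def __repr__(self):
        return f"{type(self).__name__}({dict(self)!r})"


grouped = Grouper(range(7), key=mod_2)
grouped.update([2, 10, 12, 13], key=mod_2)
grouped.update([0, 1, 2])          # identity grouping
grouped.update({0: 88, 1: 77})     # mapping-style update
grouped.make_unique()
print(grouped)
```

Because the groups are plain lists, dict(grouped) gives an ordinary dictionary, and a set per group is one comprehension away if order and repeats do not matter.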
On Thu, Jun 28, 2018, 8:15 PM David Mertz <mertz at gnosis.cx> wrote:
> I see the utility, but I would prefer a slightly different approach than
> you suggest; I think my suggestion will have a lower barrier to acceptance
> as well.
>
> Rather than add a new classmethod dict.grouper(), I'd like to have a new
> dict subclass collections.Grouper. The name is subject to bikeshedding, of
> course. I think of this class as a "big sister" of collections.Counter, in
> a way.
>
> There is behavior that I believe would be useful beyond constructing a new
> base dictionary. However, I think that construction from an iterable would
> be a common use pattern. Oh, I'd also recommend following toolz.groupby()
> in keeping a list rather than a set. It's easy enough to convert a list to
> a set if wanted, but order and repetitions are preserved in SQL or Pandas
> 'groupby' operations, and that seems more general.
>
> For example (this typed without testing, forgive any typos or thinkos):
>
> >>> from collections import Grouper # i.e. in Python 3.8+
> >>> grouped = Grouper(range(7), key=mod_2)
> >>> grouped
> Grouper({0: [0, 2, 4, 6], 1: [1, 3, 5]})
> >>> grouped.update([2, 10, 12, 13], key=mod_2)
> >>> grouped
> Grouper({0: [0, 2, 4, 6, 2, 10, 12], 1: [1, 3, 5, 13]})
> >>> # Updating with no key function groups by identity
> >>> # ... is there a better idea for the default key function?
> >>> grouped.update([0, 1, 2])
> >>> grouped
> Grouper({0: [0, 2, 4, 6, 2, 10, 12, 0], 1: [1, 3, 5, 13, 1], 2: [2]})
> >>> # Maybe do a different style of update if passed a dict subclass
> >>> # - Does a key function make sense here?
> >>> grouped.update({0: 88, 1: 77})
> >>> grouped
> Grouper({0: [0, 2, 4, 6, 2, 10, 12, 0, 88],
>          1: [1, 3, 5, 13, 1, 77],
>          2: [2]})
> >>> # Avoiding duplicates might sometimes be useful
> >>> grouped.make_unique() # better name? .no_dup()?
> >>> grouped
> Grouper({0: [0, 2, 4, 6, 10, 12, 88],
>          1: [1, 3, 5, 13, 77],
>          2: [2]})
>
> I think that most of the methods of Counter make sense to include here in
> appropriately adjusted versions. Converting to a plain dictionary should
> probably just be `dict(grouped)`, but it's possible we'd want
> `grouped.as_dict()` or something.
>
> One thing that *might* be useful is a way to keep using the same key
> function across updates. Even with no explicit provision, we *could*
> spell it like this:
>
> >>> grouped.key_func = mod_2
> >>> grouped.update([55, 44, 22, 111], key=grouped.key_func)
>
> Perhaps some more official API for doing that would be useful though.
>
>
>
>
>
> On Thu, Jun 28, 2018 at 7:35 PM David Mertz <mertz at gnosis.cx> wrote:
>
>> Thanks... Looking now. I'll comment soon.
>>
>> On Thu, Jun 28, 2018 at 7:05 PM Michael Selik <mike at selik.org> wrote:
>>
>>> Hi David,
>>>
>>> We talked about this in Seattle about a year ago at a conference. Would
>>> you do me a favor and critique this PEP I've drafted? I'd like to get
>>> private feedback before sharing with the group.
>>>
>>> https://github.com/selik/peps/blob/master/pep-9999.rst
>>>
>>> Thank you,
>>> -- Michael
>>>
>>>
>>> On Thu, Jun 28, 2018 at 1:35 PM David Mertz <mertz at gnosis.cx> wrote:
>>>
>>>> I agree with these recommendations. There are excellent 3rd party tools
>>>> that do what you want. This is way too much to try to shoehorn into a
>>>> comprehension.
>>>>
>>>> I'd add one more option. You want something that behaves like SQL.
>>>> Right in the standard library is sqlite3, and you can create an in-memory
>>>> DB to hold the data you expect to group.
>>>>
>>>
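
The sqlite3 suggestion at the end of the quoted thread can be sketched roughly as follows. This is only an illustration under stated assumptions: the table name items, the column value, and the function group_stats are made up for the example, and grouping by parity simply mirrors the mod_2 examples above.

```python
import sqlite3


def group_stats(values):
    """Group integers by parity in an in-memory SQLite database.

    Returns a list of (parity, count, total) rows. The schema here is
    hypothetical and exists only for this example.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (value INTEGER)")
        conn.executemany(
            "INSERT INTO items (value) VALUES (?)",
            [(v,) for v in values],
        )
        cursor = conn.execute(
            "SELECT value % 2 AS parity, COUNT(*), SUM(value) "
            "FROM items GROUP BY value % 2 ORDER BY parity"
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(group_stats(range(7)))
```

The database lives only in memory and disappears when the connection closes, so this costs nothing beyond the standard library.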