[Python-ideas] Fwd: grouping / dict of lists

Chris Barker chris.barker at noaa.gov
Fri Jul 13 15:16:08 EDT 2018


On Fri, Jul 13, 2018 at 12:38 PM, Michael Selik <mike at selik.org> wrote:

> Thanks for linking to these.
>

yup -- real use cases are really helpful.

Though the other paradigm for grouping is use of setdefault() rather than
defaultdict. So it would be nice to look for those, too.
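
For reference, here are the two paradigms side by side. This is a minimal sketch, and the sample pairs are made up for illustration:

```python
from collections import defaultdict

# Hypothetical sample data: (key, value) pairs to group.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Paradigm 1: defaultdict creates the empty list on first access.
by_default = defaultdict(list)
for key, value in pairs:
    by_default[key].append(value)

# Paradigm 2: a plain dict with setdefault() does the same job.
by_setdefault = {}
for key, value in pairs:
    by_setdefault.setdefault(key, []).append(value)
```

Both end up with the same mapping, so a search for real-world grouping code probably needs to cover both spellings.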


> I looked at many of them in my own research, but for some reason didn't
> think to write down the links. I'll respond to each one separately.
>


> Throughout, I'm going to use my proposed ``grouped`` builtin to
> demonstrate possible revisions. Note that I am *not* suggesting a
> replacement to defaultdict. The proposal is to make a common task easier
> and more reliable. It does not satisfy all uses of defaultdict.
>

agreed -- and it shouldn't.

I'd like to see how some of these pan out with my proposed API:

either a Grouped class, or at least (key, value) iterables and/or a value
function.

I don't have time now to do them all, but for the moment:

>> I noticed recently that *all* examples for collections.defaultdict (
>> https://docs.python.org/3.7/library/collections.html#collections.defaultdict)
>> are cases of grouping (for an int, a list and a set) from an iterator
>> with a key, value output.
>>
>
and yet others on this thread think a (key, value) input would be rare -- I
guess it depends on whether you are thinking dict-like already....


>
>> https://frama.link/o3Hb3-4U,
>>
>
>     accum = defaultdict(list)
>     garbageitems = []
>
>     for item in root:
>         filename = findfile(opts.fileroot, item.attrib['classname'])
>         accum[filename].append(float(item.attrib['time']))
>         if filename is None:
>             garbageitems.append(item)
>
>
> This might be more clear if separated into two parts.
>
>     def keyfunc(item):
>         return findfile(opts.fileroot, item.attrib['classname'])
>     groups = grouped(root, keyfunc)
>     groups = {k: [float(v.attrib['time']) for v in g] for k, g in
> groups.items()}
>     garbage = groups.pop(None, [])
>

so this one is a prime case for a value function -- I think post-processing
the groups is a pretty common case -- why make people post-process it?

    def keyfunc(item):
        return findfile(opts.fileroot, item.attrib['classname'])
    def valuefunc(item):
        return float(item.attrib['time'])
    groups = grouped(root, keyfunc, valuefunc)
    garbage = groups.pop(None, [])
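
To make the idea concrete, here is one way a ``grouped`` with an optional value function could behave. This is only a sketch under assumptions: the signature is a guess at the proposal, not a settled API, and the records are made-up stand-ins for the XML items above.

```python
def grouped(iterable, key, value=None):
    """Hypothetical stand-in for the proposed builtin: returns a dict of lists."""
    groups = {}
    for item in iterable:
        # Apply the value function (if given) before storing the item.
        stored = item if value is None else value(item)
        groups.setdefault(key(item), []).append(stored)
    return groups

# Made-up (filename, time) records; None marks a file that was not found.
records = [("a.py", "0.5"), (None, "1.0"), ("a.py", "2.0")]
groups = grouped(records, lambda r: r[0], lambda r: float(r[1]))
garbage = groups.pop(None, [])
```

With the value function doing the conversion, no second pass over the groups is needed.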

And the post-processing is then mixing comprehension style with key
function style (what to call that -- "functional" style?), so why not use a
(key, value) iterable:

    groups = grouped((findfile(opts.fileroot, item.attrib['classname']),
                      item.attrib['time'])
                     for item in root)

OK -- that's packing a bit too much into a line, so how about:

    def keyfunc(item):
        return findfile(opts.fileroot, item.attrib['classname'])

    groups = grouped((keyfunc(item), item.attrib['time']) for item in root)
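
A rough sketch of what the (key, value) flavor might look like. The name ``grouped_pairs`` is hypothetical, used here only to keep it distinct from the key-function version, and the records are invented:

```python
def grouped_pairs(pairs):
    """Hypothetical helper: group an iterable of (key, value) pairs."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

# Made-up (filename, time) records standing in for the XML items.
records = [("a.py", "0.5"), ("b.py", "1.0"), ("a.py", "2.0")]

# The generator expression does the key/value extraction inline.
groups = grouped_pairs((name, float(time)) for name, time in records)
```

The caller decides what the key and the value are in ordinary comprehension syntax, so no key or value function has to be passed in.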

>
>     self.mapping = collections.defaultdict(set)
>     for op in (op for op in graph.get_operations()):
>       if op.name.startswith(common.SKIPPED_PREFIXES):
>         continue
>       for op_input in op.inputs:
>         self.mapping[op_input].add(op)
>
>
> This is a case of a single element being added to multiple groups, which
> is your section B, below. The loop and filter could be better. It looks
> like someone intended to convert if/continue to a comprehension, but
> stopped partway through the revision.
>

yeah, this is weird --

But it does make a case for having a class with the option of using a set to
collect (which I have in an older commit of my prototype):

    inputs = ((op_input, op) for op in ops for op_input in op.inputs)
    groups = Grouping(inputs, key=itemgetter(0), collection=set)

otherwise, you could have a method to do it:
    groups.map_on_groups(set)

(not sure I like that method name, but I hope you get the idea)
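
Roughly, the two options could work like this. This is a sketch, not the actual prototype: this ``Grouping`` takes (key, value) pairs directly and leaves out the key= argument, and the sample edges are made up.

```python
class Grouping(dict):
    """Hypothetical sketch: a dict of groups built from (key, value) pairs."""

    def __init__(self, pairs=(), collection=list):
        super().__init__()
        # Sets grow with add(), lists with append().
        method = "add" if issubclass(collection, set) else "append"
        for key, value in pairs:
            if key not in self:
                self[key] = collection()
            getattr(self[key], method)(value)

    def map_on_groups(self, func):
        """Replace every group with func(group), in place."""
        for key in self:
            self[key] = func(self[key])

# Made-up (op_input, op) pairs, including one duplicate.
edges = [("x", "op1"), ("y", "op1"), ("x", "op2"), ("x", "op1")]

as_sets = Grouping(edges, collection=set)  # collect into sets directly
as_lists = Grouping(edges)                 # collect into lists first...
as_lists.map_on_groups(set)                # ...then convert each group
```

Either route gives the same groups; the collection= option just avoids building the intermediate lists.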

OK, back to work.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov