[Python-ideas] [Python-Dev] hello, new dict addition for new eve ?
goodman.m.w at gmail.com
goodman.m.w at gmail.com
Mon Jan 2 23:49:16 CET 2012
It's funny, I joined this list several days ago to ask about this very
problem. Perhaps I can add something (there's code below if you want
to skip the discussion):
I do research in computational linguistics, and adding/incrementing
dictionary values is a very common task (e.g. counting unique words in
a corpus). A defaultdict and a for loop have served me well toward
this end (and Counter may be nice, but I haven't tried it), but I've
always wanted something more functional. I found in Haskell a nice
method Map.fromListWith that takes a list of (key, value) pairs and a
function describing how to deal with collisions. It's useful and
flexible.
Summing up what has been said, as I understand it, it's not obvious
what a good behaviour for + is with dicts, and we can't assure it
follows algebraic properties like commutativity. As Guido hinted at in
the first messages, addition over disjoint dicts (and we can include
the identity dict) should be the same no matter the operation. That
is:
{k1:v1} + {k2:v2} = {k1:v1, k2:v2}
{k1:v1} + {} = {k1:v1}
The only interesting cases are when we have the same key:
{k1:v1} + {k1:v2} = {k1:op(v1,v2)}
where op is some binary function.
While dict doesn't have __add__ defined, it does resolve conflicts by
just using the latter value (i.e. op=lambda x, y: y), as in the
following three statements:
d = dict([(k1,v1), (k1,v2)]) # result: {k1:v2}
d[k1] = v3 # result: {k1:v3}
d.update([(k1,v4) # result: {k1:v4}
I think the latter two, at least, should not be counted as addition,
but rather stay as assignment. Someone brought up the idea of a
__collision__ method that is called when __setitem__ is called on an
existing key. I like this idea, and there's probably some use for it,
but for implementing addition it might not be practical (how, then,
would you re-assign a key's value without forcing "op" to apply on the
new and existing value?).
I propose something like the following proof-of-concept:
class AccumulationDict(dict):
def __init__(self, accumulator, *args, **kwargs):
if not callable(accumulator):
raise TypeError('Accumulator must be callable.')
self.accumulator = accumulator
self.accumulate(*args, **kwargs)
def __additem__(self, key, value):
if key in self:
self[key] = self.accumulator(self[key], value)
else:
self[key] = value
def __add__(self, other):
result = AccumulationDict(self.accumulator, self) # check if
other's accumulator is same?
result.accumulate(other)
return result
def accumulate(self, *args, **kwargs):
for arg in args:
if isinstance(arg, list):
for (key, value) in arg:
self.__additem__(key, value)
elif isinstance(arg, dict):
for (key, value) in arg.items():
self.__additem__(key, value)
else:
raise TypeError('Argument must be of type list or dict.')
for key in kwargs:
self.__additem__(key, kwargs[key])
Which has the following properties:
* takes a binary function for accumulating existing and new values
* otherwise can be instantiated like a dict
* accumulate() function is like update(), but uses the accumulator to
deal with conflicts
* KeyErrors still occur when getting a non-existent key
* addition is supported and returns a new object with the merged dicts
* __setitem__ and update() are unchanged so they can be used to assign values
>>> d = AccumulationDict(operator.add, [('a',1),('b',1),('a',1)], b=1)
>>> d
{'a': 2, 'b': 2}
>>> d['a'] = 0
>>> d
{'a': 0, 'b': 2}
>>> d + {'a':3,'b':1}
{'a': 3, 'b': 3}
>>> d
{'a': 0, 'b': 2}
>>> d['c']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'c'
>>> d.update({'c':1})
>>> d
{'a': 0, 'c': 1, 'b': 2}
>>> d.accumulate({'c':1})
>>> d
{'a': 0, 'c': 2, 'b': 2}
I hope this contributes to the discussion, and I'd be happy to hear
your thoughts.
Thanks,
--
-Michael Wayne Goodman
More information about the Python-ideas
mailing list