Pre-PEP: Dictionary accumulator methods

Sun Mar 20 14:49:13 EST 2005

Hi,

I really do not like it. So -1 for me. Your two methods are very specialized 
whereas the dict type is very generic. Usually, when I see something like 
this in code, I can smell it's a patch to overcome some shortcomings on a 
previous design, thereby making the economy of re-designing. Simply said, 
that's bad programming.

After that patch to provide a solution for only two of the more common 
use-cases, you are nonetheless stucked with the old solution for all the 
other use-cases (what if the value type is another dictionary or some 
user-made class ?).

Here's an alternate solution which I think answers all of the problems you 
mentionned while being generic.

=== BEGIN SNAP

def update_or_another_great_name(self, key, createFunc, updtFunc):
  try:
    self[key] <<<= updtFunc(self[key])
      ## This is "slow" with Python "=" since the key has to be  searched 
      ## twice But the new built-in method just has to update the value the
      ## first time the key is found. Therefore speed should be ok.
    return True
  except KeyError:
    self[key] = createFunc()
    return false

## Now your two specialized methods can be easily written as :

## A built-in should be provided for this (if not already proposed) :
def identical(val):
  return val

def count(self, key, qty=1):
  self.update_or_another_great_name(key, identical, 
                                    partial(operator.add, qty))
  ## partial is coming from : http://www.python.org/peps/pep-0309.html
  ## Using only built-in function (assuming "identical") as arguments makes it
  ## ok for speed (I guess).

def appendlist(self, key, *values):
  self.update_or_another_great_name(key, 
                                    partial(list, values),
                                    partial(ListType.extend, X = values))
  ## The first "partial" usage here is an abuse just to make sure that the
  ## list is not actually constructed before needed. It should work.
  ## The second usage is more uncertain as we need to bind the arguments from
  ## the right. Therefore I have to use the name of the parameter and I am not
  ## sure if there's one. As this list is very prolific, someone might have an
  ## idea on how to improve this.

=== END SNAP

By using only built-in constructs, this should be fast enough. Otherwise, 
optimizing these built-ins is a much more clean and sane way of thinking then 
messing the API with ad-hoc propositions.

Reviewing the problems you mention :

> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.

The method update_or_another_great_name is easy to understand, I think. But it 
might not always be easy to use it efficiently with built-ins. But this is 
always the case. "Recipees" can be added to show how to efficiently use the 
method.

> * Their wording tends to hide the real meaning (accumulation).

Solved.

> * The meaning of setdefault() 's method name is not self-evident.

Solved.

>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.

I really don't know what will be the outcome of the solution I propose. I 
certainly do not know anything about how my Python code translates into 
opcodes.

> * The get() idiom requires two dictionary lookups of the same key.

Solved

> * The setdefault() idiom instantiates a new, empty list prior to every

Solved

> call. * That new list is often not needed and is immediately discarded.

Solved

> * The setdefault() idiom requires an attribute lookup for extend/append.

Solved

> * The setdefault() idiom makes two function calls.

Solved

And perhaps, what you say here is also true for your two special use-cases :

> For other
> uses, plain Python code suffices in terms of speed, clarity, and avoiding
> unnecessary instantiation of empty containers:
>
>     if key not in d:
>         d.key = {subkey:value}
>     else:
>         d[key][subkey] = value

Much better than adding special cases on a generic class. Special cases always 
demultiply and if we open the door ....

Regards,

Francis Girard

Le samedi 19 Mars 2005 02:24, Raymond Hettinger a écrit :
> I would like to get everyone's thoughts on two new dictionary methods:
>
>         def count(self, value, qty=1):
>             try:
>                 self[key] += qty
>             except KeyError:
>                 self[key] = qty
>
>         def appendlist(self, key, *values):
>             try:
>                 self[key].extend(values)
>             except KeyError:
>                 self[key] = list(values)
>
> The rationale is to replace the awkward and slow existing idioms for
> dictionary based accumulation:
>
>     d[key] = d.get(key, 0) + qty
>     d.setdefault(key, []).extend(values)
>
> In simplest form, those two statements would now be coded more readably as:
>
>    d.count(key)
>    d.appendlist(key, value)
>
> In their multi-value forms, they would now be coded as:
>
>   d.count(key, qty)
>   d.appendlist(key, *values)
>
> The error messages returned by the new methods are the same as those
> returned by the existing idioms.
>
> The get() method would continue to exist because it is useful for
> applications other than accumulation.
>
> The setdefault() method would continue to exist but would likely not make
> it into Py3.0.
>
>
> PROBLEMS BEING SOLVED
> ---------------------
>
> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.
> * Their wording tends to hide the real meaning (accumulation).
> * The meaning of setdefault() 's method name is not self-evident.
>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.
> * The get() idiom requires two dictionary lookups of the same key.
> * The setdefault() idiom instantiates a new, empty list prior to every
> call. * That new list is often not needed and is immediately discarded.
> * The setdefault() idiom requires an attribute lookup for extend/append.
> * The setdefault() idiom makes two function calls.
>
> The latter issues are evident from a disassembly:
> >>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
>
>   1           0 LOAD_NAME                0 (d)
>               3 LOAD_ATTR                1 (get)
>               6 LOAD_NAME                2 (key)
>               9 LOAD_CONST               0 (0)
>              12 CALL_FUNCTION            2
>              15 LOAD_NAME                3 (qty)
>              18 BINARY_ADD
>              19 LOAD_NAME                0 (d)
>              22 LOAD_NAME                2 (key)
>              25 STORE_SUBSCR
>              26 LOAD_CONST               1 (None)
>              29 RETURN_VALUE
>
> >>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
>
>   1           0 LOAD_NAME                0 (d)
>               3 LOAD_ATTR                1 (setdefault)
>               6 LOAD_NAME                2 (key)
>               9 BUILD_LIST               0
>              12 CALL_FUNCTION            2
>              15 LOAD_ATTR                3 (extend)
>              18 LOAD_NAME                4 (values)
>              21 CALL_FUNCTION            1
>              24 POP_TOP
>              25 LOAD_CONST               0 (None)
>              28 RETURN_VALUE
>
> In contrast, the proposed methods use only a single attribute lookup and
> function call, they use only one dictionary lookup, they use very few
> opcodes, and they directly access the accumulation functions,
> PyNumber_Add() or PyList_Append().  IOW, the performance improvement
> matches the readability improvement.
>
>
> ISSUES
> ------
>
> The proposed names could possibly be improved (perhaps tally() is more
> active and clear than count()).
>
> The appendlist() method is not as versatile as setdefault() which can be
> used with other object types (perhaps for creating dictionaries of
> dictionaries). However, most uses I've seen are with lists.  For other
> uses, plain Python code suffices in terms of speed, clarity, and avoiding
> unnecessary instantiation of empty containers:
>
>     if key not in d:
>         d.key = {subkey:value}
>     else:
>         d[key][subkey] = value
>
>
>
> Raymond Hettinger