Pre-PEP: Dictionary accumulator methods

Roose b at b.b
Fri Mar 18 21:46:57 EST 2005


I like this, it is short, low impact, and makes things more readable.  I
tend to go with just the literal way of doing it instead of using get and
setdefault, which I find awkward.

But alas I had a my short, low impact, useful suggestion and I think it
died.  It was for any() and all() for lists.  Actually Google just released
their "functional.py" module on code.google.com with the exact same thing.
Except they are missing the identity as a default which is very useful, i.e.
any(lst, f=lambda x: x) instead of any(lst, f).

Maybe you can tack that onto your PEP :)

That is kind of related, they are accumulators as well.  They could probably
be generalized for dictionaries, but I don't know how useful that would be.

"Raymond Hettinger" <vze4rx4y at verizon.net> wrote in message
news:JbL_d.8237$qN3.2116 at trndny01...
> I would like to get everyone's thoughts on two new dictionary methods:
>
>         def count(self, value, qty=1):
>             try:
>                 self[key] += qty
>             except KeyError:
>                 self[key] = qty
>
>         def appendlist(self, key, *values):
>             try:
>                 self[key].extend(values)
>             except KeyError:
>                 self[key] = list(values)
>
> The rationale is to replace the awkward and slow existing idioms for
dictionary
> based accumulation:
>
>     d[key] = d.get(key, 0) + qty
>     d.setdefault(key, []).extend(values)
>
> In simplest form, those two statements would now be coded more readably
as:
>
>    d.count(key)
>    d.appendlist(key, value)
>
> In their multi-value forms, they would now be coded as:
>
>   d.count(key, qty)
>   d.appendlist(key, *values)
>
> The error messages returned by the new methods are the same as those
returned by
> the existing idioms.
>
> The get() method would continue to exist because it is useful for
applications
> other than accumulation.
>
> The setdefault() method would continue to exist but would likely not make
it
> into Py3.0.
>
>
> PROBLEMS BEING SOLVED
> ---------------------
>
> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.
> * Their wording tends to hide the real meaning (accumulation).
> * The meaning of setdefault() 's method name is not self-evident.
>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.
> * The get() idiom requires two dictionary lookups of the same key.
> * The setdefault() idiom instantiates a new, empty list prior to every
call.
> * That new list is often not needed and is immediately discarded.
> * The setdefault() idiom requires an attribute lookup for extend/append.
> * The setdefault() idiom makes two function calls.
>
> The latter issues are evident from a disassembly:
>
> >>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
>   1           0 LOAD_NAME                0 (d)
>               3 LOAD_ATTR                1 (get)
>               6 LOAD_NAME                2 (key)
>               9 LOAD_CONST               0 (0)
>              12 CALL_FUNCTION            2
>              15 LOAD_NAME                3 (qty)
>              18 BINARY_ADD
>              19 LOAD_NAME                0 (d)
>              22 LOAD_NAME                2 (key)
>              25 STORE_SUBSCR
>              26 LOAD_CONST               1 (None)
>              29 RETURN_VALUE
>
> >>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
>   1           0 LOAD_NAME                0 (d)
>               3 LOAD_ATTR                1 (setdefault)
>               6 LOAD_NAME                2 (key)
>               9 BUILD_LIST               0
>              12 CALL_FUNCTION            2
>              15 LOAD_ATTR                3 (extend)
>              18 LOAD_NAME                4 (values)
>              21 CALL_FUNCTION            1
>              24 POP_TOP
>              25 LOAD_CONST               0 (None)
>              28 RETURN_VALUE
>
> In contrast, the proposed methods use only a single attribute lookup and
> function call, they use only one dictionary lookup, they use very few
opcodes,
> and they directly access the accumulation functions, PyNumber_Add() or
> PyList_Append().  IOW, the performance improvement matches the readability
> improvement.
>
>
> ISSUES
> ------
>
> The proposed names could possibly be improved (perhaps tally() is more
active
> and clear than count()).
>
> The appendlist() method is not as versatile as setdefault() which can be
used
> with other object types (perhaps for creating dictionaries of
dictionaries).
> However, most uses I've seen are with lists.  For other uses, plain Python
code
> suffices in terms of speed, clarity, and avoiding unnecessary
instantiation of
> empty containers:
>
>     if key not in d:
>         d.key = {subkey:value}
>     else:
>         d[key][subkey] = value
>
>
>
> Raymond Hettinger
>
>





More information about the Python-list mailing list