Pre-PEP: Dictionary accumulator methods
Francis Girard
francis.girard at free.fr
Sun Mar 20 14:49:13 EST 2005
Hi,
I really do not like it. So -1 for me. Your two methods are very specialized
whereas the dict type is very generic. Usually, when I see something like
this in code, I can smell it's a patch to overcome some shortcomings on a
previous design, thereby making the economy of re-designing. Simply said,
that's bad programming.
After that patch to provide a solution for only two of the more common
use-cases, you are nonetheless stucked with the old solution for all the
other use-cases (what if the value type is another dictionary or some
user-made class ?).
Here's an alternate solution which I think answers all of the problems you
mentionned while being generic.
=== BEGIN SNAP
def update_or_another_great_name(self, key, createFunc, updtFunc):
try:
self[key] <<<= updtFunc(self[key])
## This is "slow" with Python "=" since the key has to be searched
## twice But the new built-in method just has to update the value the
## first time the key is found. Therefore speed should be ok.
return True
except KeyError:
self[key] = createFunc()
return false
## Now your two specialized methods can be easily written as :
## A built-in should be provided for this (if not already proposed) :
def identical(val):
return val
def count(self, key, qty=1):
self.update_or_another_great_name(key, identical,
partial(operator.add, qty))
## partial is coming from : http://www.python.org/peps/pep-0309.html
## Using only built-in function (assuming "identical") as arguments makes it
## ok for speed (I guess).
def appendlist(self, key, *values):
self.update_or_another_great_name(key,
partial(list, values),
partial(ListType.extend, X = values))
## The first "partial" usage here is an abuse just to make sure that the
## list is not actually constructed before needed. It should work.
## The second usage is more uncertain as we need to bind the arguments from
## the right. Therefore I have to use the name of the parameter and I am not
## sure if there's one. As this list is very prolific, someone might have an
## idea on how to improve this.
=== END SNAP
By using only built-in constructs, this should be fast enough. Otherwise,
optimizing these built-ins is a much more clean and sane way of thinking then
messing the API with ad-hoc propositions.
Reviewing the problems you mention :
> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.
The method update_or_another_great_name is easy to understand, I think. But it
might not always be easy to use it efficiently with built-ins. But this is
always the case. "Recipees" can be added to show how to efficiently use the
method.
> * Their wording tends to hide the real meaning (accumulation).
Solved.
> * The meaning of setdefault() 's method name is not self-evident.
Solved.
>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.
I really don't know what will be the outcome of the solution I propose. I
certainly do not know anything about how my Python code translates into
opcodes.
> * The get() idiom requires two dictionary lookups of the same key.
Solved
> * The setdefault() idiom instantiates a new, empty list prior to every
Solved
> call. * That new list is often not needed and is immediately discarded.
Solved
> * The setdefault() idiom requires an attribute lookup for extend/append.
Solved
> * The setdefault() idiom makes two function calls.
Solved
And perhaps, what you say here is also true for your two special use-cases :
> For other
> uses, plain Python code suffices in terms of speed, clarity, and avoiding
> unnecessary instantiation of empty containers:
>
> if key not in d:
> d.key = {subkey:value}
> else:
> d[key][subkey] = value
Much better than adding special cases on a generic class. Special cases always
demultiply and if we open the door ....
Regards,
Francis Girard
Le samedi 19 Mars 2005 02:24, Raymond Hettinger a écrit :
> I would like to get everyone's thoughts on two new dictionary methods:
>
> def count(self, value, qty=1):
> try:
> self[key] += qty
> except KeyError:
> self[key] = qty
>
> def appendlist(self, key, *values):
> try:
> self[key].extend(values)
> except KeyError:
> self[key] = list(values)
>
> The rationale is to replace the awkward and slow existing idioms for
> dictionary based accumulation:
>
> d[key] = d.get(key, 0) + qty
> d.setdefault(key, []).extend(values)
>
> In simplest form, those two statements would now be coded more readably as:
>
> d.count(key)
> d.appendlist(key, value)
>
> In their multi-value forms, they would now be coded as:
>
> d.count(key, qty)
> d.appendlist(key, *values)
>
> The error messages returned by the new methods are the same as those
> returned by the existing idioms.
>
> The get() method would continue to exist because it is useful for
> applications other than accumulation.
>
> The setdefault() method would continue to exist but would likely not make
> it into Py3.0.
>
>
> PROBLEMS BEING SOLVED
> ---------------------
>
> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.
> * Their wording tends to hide the real meaning (accumulation).
> * The meaning of setdefault() 's method name is not self-evident.
>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.
> * The get() idiom requires two dictionary lookups of the same key.
> * The setdefault() idiom instantiates a new, empty list prior to every
> call. * That new list is often not needed and is immediately discarded.
> * The setdefault() idiom requires an attribute lookup for extend/append.
> * The setdefault() idiom makes two function calls.
>
> The latter issues are evident from a disassembly:
> >>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
>
> 1 0 LOAD_NAME 0 (d)
> 3 LOAD_ATTR 1 (get)
> 6 LOAD_NAME 2 (key)
> 9 LOAD_CONST 0 (0)
> 12 CALL_FUNCTION 2
> 15 LOAD_NAME 3 (qty)
> 18 BINARY_ADD
> 19 LOAD_NAME 0 (d)
> 22 LOAD_NAME 2 (key)
> 25 STORE_SUBSCR
> 26 LOAD_CONST 1 (None)
> 29 RETURN_VALUE
>
> >>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
>
> 1 0 LOAD_NAME 0 (d)
> 3 LOAD_ATTR 1 (setdefault)
> 6 LOAD_NAME 2 (key)
> 9 BUILD_LIST 0
> 12 CALL_FUNCTION 2
> 15 LOAD_ATTR 3 (extend)
> 18 LOAD_NAME 4 (values)
> 21 CALL_FUNCTION 1
> 24 POP_TOP
> 25 LOAD_CONST 0 (None)
> 28 RETURN_VALUE
>
> In contrast, the proposed methods use only a single attribute lookup and
> function call, they use only one dictionary lookup, they use very few
> opcodes, and they directly access the accumulation functions,
> PyNumber_Add() or PyList_Append(). IOW, the performance improvement
> matches the readability improvement.
>
>
> ISSUES
> ------
>
> The proposed names could possibly be improved (perhaps tally() is more
> active and clear than count()).
>
> The appendlist() method is not as versatile as setdefault() which can be
> used with other object types (perhaps for creating dictionaries of
> dictionaries). However, most uses I've seen are with lists. For other
> uses, plain Python code suffices in terms of speed, clarity, and avoiding
> unnecessary instantiation of empty containers:
>
> if key not in d:
> d.key = {subkey:value}
> else:
> d[key][subkey] = value
>
>
>
> Raymond Hettinger
More information about the Python-list
mailing list