try vs. has_key()

Christian Tismer tismer at appliedbiometrics.com
Fri Apr 23 09:16:25 EDT 1999


Aahz Maruch wrote:
> 
> I've seen roughly half the people here doing
> 
> try:
>    dict[key].append(foo)
> except:
>    dict[key]=[foo]
> 
> with the other half doing
> 
> if dict.has_key(key):
>    dict[key].append(foo)
> else:
>    dict[key]=[foo]
> 
> Can people explain their preferences?

Well, besides the other good answers, there's a bit more to say.

From a design point of view, both versions have their place.
The has_key version reflects the expectation that missing keys
are common and nothing exceptional. The try..except version
suggests that you will usually find the key, and that a
non-existent key is the exception.
I think the dict.get variant is just a more efficient
compromise somewhere in between.
It is also a matter of taste.
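
For concreteness, the dict.get idiom I mean looks roughly like
this (just a sketch, and the function name is mine; a full
variant appears as count_many_get further down):

def add_value(dict, key, value):
    # one lookup via dict.get; create the list only on a miss
    lis = dict.get(key)
    if lis is None:
        dict[key] = [value]
    else:
        lis.append(value)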

As for execution speed, there are the measurements in Barry's
paper, and he is still right, although I would need measurements
for dict.get as well.

Note that a loop running over dict.get calls is still a bit
slower than indexing the dict, due to some internal method
access overhead. This even applies when the dict.get method
is assigned to a local variable. Indexing is the fastest
way to access a dict element when you expect no key errors.
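
To make the local-variable point concrete, here is the kind of
comparison I have in mind (a sketch only; the function names
are mine): even with the bound method in a local, every
get(key) call still pays the method call overhead, while
d[key] does not.

def lookup_all_get(d, keys):
    # bind the method once to avoid repeated attribute lookup
    get = d.get
    result = []
    for key in keys:
        result.append(get(key))
    return result

def lookup_all_index(d, keys):
    # plain indexing; fastest when no key errors are expected
    result = []
    for key in keys:
        result.append(d[key])
    return result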

I have no exact measurements yet, but will provide some soon.

To squeeze the last cycles out of the code when you expect
only a very small number of key failures, a different approach
can help: if your algorithm allows it, you can move the
try..except clause outside of the loop.
This saves a statement per iteration and the setup of the
exception handler as well.
Your code must be changed quite heavily, since it has to
restart the loop where it failed, but there are a number of
cases where I need this kind of optimization.

Here is an example which doesn't try to look well-designed,
but to run as fast as possible. I recommend providing a
second, more readable version of the function for
documentation.

Assuming you have code which runs in a loop and collects
new elements into lists, as your code does, here are a good,
"nearly fast" version, one using "get", and a faster but
ugly one:

def count_many(dict, itemlist):
    for key, value in itemlist:
        if dict.has_key(key):
            dict[key].append(value)
        else:
            dict[key] = [value]

def count_many_get(dict, itemlist):
    for key, value in itemlist:
        lis = dict.get(key)
        # the stored lists are never empty, so truth-testing is safe here
        if lis:
            lis.append(value)
        else:
            dict[key] = [value]

def count_many_fast(dict, itemlist):
    while 1:
        try:
            k = 1
            for key, value in itemlist:
                dict[key].append(value) ; k = k+1
            break
        except KeyError:
            # the item at index k-1 had no list yet: create it,
            # then restart the loop with the remaining items
            dict[key] = [value]
            itemlist = itemlist[k:]
            
results = """
# given a list of 1000 tuples, and appending
# this 1000 times, we measure

>>> d={} ; print timing(count_many, (d, list) ,1000)[0]
17.8
>>> d={} ; print timing(count_many_get, (d, list) ,1000)[0]
16.15
>>> d={} ; print timing(count_many_fast, (d, list) ,1000)[0]
13.73
>>> 
"""

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.python.net
10553 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     we're tired of banana software - shipped green, ripens at home



