[Python-Dev] PyBench DictCreation (was Re: Performance compares)
Tim Peters
tim.one@home.com
Fri, 18 May 2001 15:48:33 -0400
[Neil Schemenauer]
> A table of keys and values. Values are looked up by looping over
> the table comparing each key until the correct one is found (ie.
> its O(n) where n is the size of the table). For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.
I thought about that before. The inlining appeals but the algorithm not
much: the dict implementation *as is* loops over all the table entries too,
except that instead of starting with "i = 0" it starts (now) with "i = hash &
mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^=
poly"; and instead of giving up when "i >= length" it punts when finding an
entry with a null value. Incrementing via ++i is certainly cheaper, except
that even when small, the hash table usually hits on the first try when the
key is present, so usually gets out before incrementing.
> Its not clear to me though if it would be a win.
Best guess is not.
> Assuming that interned strings are the most common key, a assocation
> table with four entries would take on average two pointer compares
> to look up a value.
Actually an average of 2.5 when the key is present and each key is equally
likely to be queried, and always 4 when the queried key is not present. The
hash table has better expected stats on both counts, but needs 4 unused slots
too to achieve that. The savings would be in memory for small dicts more
than in time (if at all).