[Python-ideas] Exporting dict Items for direct lookups of specific keys

Josiah Carlson jcarlson at uci.edu
Sun Jun 10 10:00:20 CEST 2007


"Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> 
> I believe that it is possible to add a useful feature to dicts, that
> will enable some interesting performance features in the future.
> 
> Dict lookups are in general supposed to be O(1) and they are indeed very fast.
> However, there are several drawbacks:
> A. O(1) is not always achieved, collisions may still occur

Note that O(1) is constant, not necessarily 1.  Assuming that the hash
function that Python uses is decent (it seems to work well), then as per
the load factor of 2/3, then you get an expected number of probes = 1 +
(2/3)^2 + (2/3)^3 + (2/3)^4 + ..., which sums to 3.  

Now, if you have contents that are specifically designed to screw the
hash function, then you are going to get poor performance.  But this is
the case for any particular hash function; there exists inputs that
force it to perform poorly.

Given the above load factor, if 3 expected probes is too many, you can
use d.update(d) to double the size of the dictionary, forcing the load
factor to be 1/3 or less, for an expected number of probes = 1.5 .


> B. Some lookups use static "literal" keys, and could benefit from
> accessing a dict item directly (very fast O(1) _worst_ case, without
> even a function call when done from the C side).

[snip]

What you are proposing is to add a level of indirection between some
pointer in the dictionary to some special PyDictItem object. This will
slow Python execution when such a thing is used.  Why? The extra level
of indirection requires another pointer following, as well as a
necessary check on the bitmap you are proposing, nevermind the
additional memory overhead of having the item (generally doubling the
size of dictionary objects that use such 'exported' items).

You don't mention what algorithm/structure will allow for the accessing
of your dictionary items in O(1) time, only that after you have this
bitmap and dictioanry item thingy that it will be O(1) time (where
dictionaries weren't already fast enough natively). I don't believe you
have fully thought through this, but feel free to post C or Python
source that describes your algorithm in detail to prove me wrong.


You should note that Python's dictionary implementation has been tuned
to work *quite well* for the object attribute/namespace case, and I
would be quite surprised if anyone managed to improve upon Raymond's
work (without writing platform-specific assembly).


 - Josiah




More information about the Python-ideas mailing list