[Python-ideas] Exporting dict Items for direct lookups of specific keys

Eyal Lotem eyal.lotem at gmail.com
Sun Jun 10 11:30:42 CEST 2007


On 6/10/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> >
> > I believe that it is possible to add a useful feature to dicts, that
> > will enable some interesting performance features in the future.
> >
> > Dict lookups are in general supposed to be O(1) and they are indeed very fast.
> > However, there are several drawbacks:
> > A. O(1) is not always achieved, collisions may still occur
>
> Note that O(1) is constant, not necessarily 1.  Assuming that the hash
> function that Python uses is decent (it seems to work well), then with
> the load factor of 2/3 you get an expected number of probes = 1 +
> (2/3) + (2/3)^2 + (2/3)^3 + ..., which sums to 3.
>
> Now, if you have contents that are specifically designed to screw the
> hash function, then you are going to get poor performance.  But this is
> the case for any particular hash function; there exists inputs that
> force it to perform poorly.
Of course, though it is an interesting aside, because such degenerate
inputs would not slow down the lookups in the solution I'm describing.
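
For concreteness, a back-of-the-envelope check of those numbers - a
sketch only, using the textbook uniform-probing model rather than
CPython's actual probe sequence:

def expected_probes(load_factor):
    # With open addressing at load factor a, each probe hits an occupied
    # slot with probability ~a, so the expected number of probes is the
    # geometric series 1 + a + a^2 + ... = 1 / (1 - a).
    return 1.0 / (1.0 - load_factor)

print(expected_probes(2.0 / 3.0))   # 3.0  -- dict resized at 2/3 full
print(expected_probes(1.0 / 3.0))   # 1.5  -- after doubling the table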

> Given the above load factor, if 3 expected probes is too many, you can
> use d.update(d) to double the size of the dictionary, forcing the load
> factor to be 1/3 or less, for an expected number of probes = 1.5 .
>
>
> > B. Some lookups use static "literal" keys, and could benefit from
> > accessing a dict item directly (very fast O(1) _worst_ case, without
> > even a function call when done from the C side).
>
> [snip]
>
> What you are proposing is to add a level of indirection between some
> pointer in the dictionary to some special PyDictItem object. This will
> slow Python execution when such a thing is used.  Why? The extra level
> of indirection requires another pointer following, as well as a
> necessary check on the bitmap you are proposing, nevermind the
> additional memory overhead of having the item (generally doubling the
> size of dictionary objects that use such 'exported' items).
>
> You don't mention what algorithm/structure will allow for the accessing
> of your dictionary items in O(1) time, only that after you have this
> bitmap and dictionary item thingy it will be O(1) time (where
> dictionaries weren't already fast enough natively). I don't believe you
> have fully thought through this, but feel free to post C or Python
> source that describes your algorithm in detail to prove me wrong.
Only access to exported items is O(1) worst case (when they are
accessed via your PyDictItem_obj->value); other items must be accessed
normally and take just as much time as before (or, as I explained and
you reiterated, a tad longer, since access now involves a bitmap check
and, in the case of exported items, another dereference).
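
To make the intended semantics concrete, here is a rough pure-Python
sketch of how an exported item would behave. The names (ExportedItem,
ExportingDict, export_item) are made up for illustration; the real
mechanism would of course live in C inside dictobject.c, where reading
item->value is a plain pointer dereference:

class ExportedItem(object):
    # Handle to one key's slot; reading .value skips the hash lookup.
    __slots__ = ('key', 'value')

class ExportingDict(dict):
    # Behavioural sketch only: writes keep exported handles in sync, so
    # reads through a handle never need to hash or probe.  (Bulk
    # operations like update() bypass __setitem__ and are ignored here.)
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._exported = {}                  # key -> ExportedItem

    def export_item(self, key):
        item = self._exported.get(key)
        if item is None:
            item = ExportedItem()
            item.key = key
            item.value = self.get(key)       # None if not yet bound
            self._exported[key] = item
        return item

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        item = self._exported.get(key)
        if item is not None:
            item.value = value               # keep the handle current

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        item = self._exported.get(key)
        if item is not None:
            item.value = None                # unbound again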

> You should note that Python's dictionary implementation has been tuned
> to work *quite well* for the object attribute/namespace case, and I
> would be quite surprised if anyone managed to improve upon Raymond's
> work (without writing platform-specific assembly).

Of course - the idea is not to improve the dict's performance for the
normal way it is accessed, but to add a different access path for the
specific use case of looking up static keys in a static dict - which
can be faster than even a fast dict lookup.

The dict lookups in globals and builtins are all lookups of literal,
static keys in a literal, static dict. In this specific case it is
possible to outdo the existing dict performance by adding a special way
to access such static keys in dicts - one that slows ordinary access to
the dict insignificantly, but speeds up this very common use pattern
significantly.
Attribute lookups in the class dict are also literal/static key lookups
in a static dict (though for a code object to know that it is dealing
with a static dict, a psyco-like system is required; if such a system
is used, all of those dict lookups can be made faster as well).
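
As a usage sketch (again using the hypothetical ExportingDict from
above), a code object could export each literal name it references
once, up front, and every later read of that global would be a plain
field read instead of a hash lookup:

globals_dict = ExportingDict(len=len)    # stand-in for a module __dict__

# Done once, when the code object is created / bound to the module:
len_item = globals_dict.export_item('len')

# What a LOAD_GLOBAL-style read would do every time - no hashing, no probing:
print(len_item.value([1, 2, 3]))         # 3

# Rebinding the global through the dict keeps the handle up to date:
globals_dict['len'] = lambda seq: 42
print(len_item.value([1, 2, 3]))         # 42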


