On 6/10/07, Josiah Carlson firstname.lastname@example.org wrote:
"Eyal Lotem" email@example.com wrote:
I believe that it is possible to add a useful feature to dicts, that will enable some interesting performance features in the future.
Dict lookups are in general supposed to be O(1) and they are indeed very fast. However, there are several drawbacks: A. O(1) is not always achieved; collisions may still occur.
Note that O(1) is constant, not necessarily 1. Assuming that the hash function that Python uses is decent (it seems to work well), then with the load factor capped at 2/3 you get an expected number of probes of 1 + 2/3 + (2/3)^2 + (2/3)^3 + ... = 1/(1 - 2/3), which sums to 3.
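For the record, the load-factor arithmetic above is just a geometric series; a minimal sketch (the function name is mine, not anything in CPython):

```python
# Sketch: expected probe count for open addressing under the uniform
# hashing assumption. Each extra probe happens with probability `load`,
# so the expectation is 1 + load + load**2 + ... = 1 / (1 - load).

def expected_probes(load: float) -> float:
    """Expected number of probes per lookup at the given load factor."""
    return 1.0 / (1.0 - load)

print(expected_probes(2 / 3))  # roughly 3, the figure quoted above
print(expected_probes(1 / 3))  # roughly 1.5, after doubling the table
```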
Now, if you have contents that are specifically designed to screw the hash function, then you are going to get poor performance. But this is the case for any particular hash function; there exist inputs that force it to perform poorly.
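To make the adversarial case concrete, here is a toy key type (the class name is made up) whose instances all hash to the same value, so every one of them lands on the same probe chain and lookups degrade toward a linear scan:

```python
# Adversarial keys: a constant __hash__ forces every key into the same
# bucket, so dict operations degrade from O(1) toward O(n) per lookup.

class BadKey:
    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 42  # every instance collides on purpose

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.n == other.n

# Still correct, just slow: building this dict is quadratic overall.
d = {BadKey(i): i for i in range(1000)}
assert d[BadKey(500)] == 500
```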
Of course, though it is only an interesting anecdote here, because collisions of that kind won't screw the lookups in the solution I'm describing.
Given the above load factor, if 3 expected probes is too many, you can use d.update(d) to double the size of the dictionary, forcing the load factor down to 1/3 or less, for an expected number of probes of 1.5.
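Whether d.update(d) still forces a resize depends on the CPython version, but the underlying effect is easy to observe with a toy table. This sketch uses simple linear probing (not CPython's actual probe sequence, where clustering is milder), so the absolute numbers differ from the uniform-hashing formula, but the direction is the same: halving the load factor cuts the average probe count.

```python
import random

# Toy open-addressing table with linear probing, used only to measure
# how the load factor drives the average probe count per lookup.

def avg_probes(n_keys, table_size, trials=2000):
    random.seed(0)
    table = [None] * table_size
    keys = random.sample(range(10**9), n_keys)
    for k in keys:                      # insert with linear probing
        i = k % table_size
        while table[i] is not None:
            i = (i + 1) % table_size
        table[i] = k
    total = 0
    for k in random.choices(keys, k=trials):
        i = k % table_size
        probes = 1
        while table[i] != k:            # count probes until we hit the key
            i = (i + 1) % table_size
            probes += 1
        total += probes
    return total / trials

dense = avg_probes(682, 1024)   # load factor ~2/3
sparse = avg_probes(682, 2048)  # load factor ~1/3, i.e. after "doubling"
assert sparse < dense
```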
B. Some lookups use static "literal" keys, and could benefit from accessing a dict item directly (very fast O(1) _worst_ case, without even a function call when done from the C side).
What you are proposing is to add a level of indirection between some pointer in the dictionary and some special PyDictItem object. This will slow Python execution whenever such a thing is used. Why? The extra level of indirection requires another pointer dereference, as well as a check on the bitmap you are proposing, never mind the additional memory overhead of having the item (generally doubling the size of dictionary objects that use such 'exported' items).
You don't mention what algorithm/structure will allow for the accessing of your dictionary items in O(1) time, only that after you have this bitmap and dictionary item thingy it will be O(1) time (where dictionaries weren't already fast enough natively). I don't believe you have fully thought through this, but feel free to post C or Python source that describes your algorithm in detail to prove me wrong.
Only access of exported items is O(1) time (when accessed via your PyDictItem_obj->value), other items must be accessed normally and they take just as much time (or as I explained and you reiterated, a tad longer, as it requires a bitmap check and in the case of exported items another dereference).
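A pure-Python sketch of that scheme, under my reading of the proposal (the names ExportedDict and DictItem are hypothetical; the real thing would live in C inside dictobject.c, with a bitmap instead of the isinstance check used here):

```python
# Sketch of the "exported item" indirection: the dict stores a cell
# object for exported keys, so holders of the cell read the value with
# a plain attribute access, while ordinary lookups pay an extra check
# and, for exported items, an extra dereference.

class DictItem:
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class ExportedDict(dict):
    def export(self, key):
        """Return a DictItem whose .value tracks self[key]."""
        item = dict.get(self, key)
        if not isinstance(item, DictItem):
            item = DictItem(item)
            dict.__setitem__(self, key, item)
        return item

    def __getitem__(self, key):
        v = dict.__getitem__(self, key)
        # The isinstance check stands in for the proposed bitmap test.
        return v.value if isinstance(v, DictItem) else v

    def __setitem__(self, key, value):
        old = dict.get(self, key)
        if isinstance(old, DictItem):
            old.value = value  # writers keep the exported cell current
        else:
            dict.__setitem__(self, key, value)


d = ExportedDict(x=1)
cell = d.export("x")
d["x"] = 2
assert cell.value == 2  # direct access, no hash lookup at all
```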
You should note that Python's dictionary implementation has been tuned to work *quite well* for the object attribute/namespace case, and I would be quite surprised if anyone managed to improve upon Raymond's work (without writing platform-specific assembly).
Of course - the idea is not to improve dict's performance with the normal way it is accessed, but to change the way it is accessed for the specific use-case of accessing static values in a static dict - which can be faster than even a fast dict lookup.
The dict lookups in globals and builtins are all lookups of literal static keys in a literal static dict. In this specific case, it is possible to outdo the existing dict performance by adding a special way to access such static keys - which slows down ordinary access to the dict insignificantly, but significantly speeds up this very common use pattern.

Attribute lookups in the class dict are also literal/static key lookups in a static dict (though in order for a code object to know that it is a static dict, a psyco-like system is required; with such a system, all of those dict lookups could be made faster as well).
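The win being aimed at can be approximated today with a well-known trick: binding a global/builtin to a default argument resolves the name once, at function definition time, which simulates the proposed direct reference held by the code object. A small illustration (the function names are mine):

```python
# Each use of `len` in use_global is a literal-key lookup in the
# globals dict and then the builtins dict, repeated every iteration.
# In use_local, `len` is resolved once and becomes a fast local.

def use_global(n):
    total = 0
    for _ in range(n):
        total += len("abc")  # LOAD_GLOBAL: two dict lookups per call
    return total

def use_local(n, len=len):   # `len` captured at definition time
    total = 0
    for _ in range(n):
        total += len("abc")  # LOAD_FAST: no dict lookup at all
    return total

assert use_global(1000) == use_local(1000) == 3000
```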