
On 10/28/10, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 28 Oct 2010 19:58:59 +0200 spir <denis.spir@gmail.com> wrote:
What does the current implementation use as buckets?
It uses an open addressing strategy. Each dict entry holds three pointer-sized fields: key object, value object, and cached hash value of the key. (set entries have only two fields, since they don't hold a value object)
Has anyone benchmarked not storing the hash value here? For a string dict, that hash should already be available on the string object itself, so it is redundant. Keeping it obviously improves cache locality, but ... it also makes the dict objects 50% larger, and there is a chance that the strings themselves would already be in cache anyhow. And if strings were reliably interned, the comparison check should normally just be a pointer compare -- possibly fast enough that the "different hash" shortcut doesn't buy anything. [caveats about still needing to go to the slower dict implementation for string subclasses] -jJ