- Your benchmarks use an almost empty dict (for locals and globals at least). I'd like to see how they perform with a realistic number of other names in the dict
True, the fastest code path (the macro) only works as long as the entry is in the first hash position. For the tests in my posting this is 100% of the cases. For real code the results are around 75%. To enable the display of dictionary statistics on exit compile with -dSHOW_DICTIONARY_STATS.
One problem is that 75% is still not good enough - the fallback code path is significantly slower (although still faster than PyDict_GetItem). Another problem is that if you get a hash collision for a global symbol used inside a tight loop the hit rate can drop closer to 0%.
One trick that may help is to shuffle the hash entries - for every 100th time the macro fails the entry will be moved up to the first hash position and the entry which previously occupied that position will be moved to the first empty hash position for its own hash chain. Statistically, this will ensure that the most commonly referenced names will tend to stay at the first hash position. I think it may improve the hit rate from 75% to 85% or higher and eliminate the worst-case scenario.
Piling more complexity on is likely to slow down the common case though.
- I'm worried that negative dict entries could fill the dict beyond its capacity.
For each dictionary, the number of negative entries is bound by the number of global and builtin names in the co_names of all code objects that get executed with that dictionary as their namespace.
And that's my worry. Suppose a program has only one global but touches every builtin. Will the dictionary properly get resized to accommodate all those negative entries? --Guido van Rossum (home page: http://www.python.org/~guido/)