[Python-Dev] Python 2.3 release schedule

Oren Tirosh oren-py-d@hishome.net
Sat, 25 May 2002 16:34:39 +0300

On Sat, May 25, 2002 at 08:17:47AM -0400, Guido van Rossum wrote:
> Two observations:
> - Your benchmarks use an almost empty dict (for locals and globals at
>   least).  I'd like to see how they perform with a realistic number of
>   other names in the dict

True, the fastest code path (the macro) only works as long as the entry is
in the first hash position.  For the tests in my posting this holds in 100%
of the cases; for real code the hit rate is around 75%.  To display
dictionary statistics on exit, compile with -DSHOW_DICTIONARY_STATS.
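To make the "first hash position" condition concrete, here is a toy model (hypothetical, not CPython's actual C code) of an open-addressed table: the fast-path macro can only succeed when an entry landed in its first probe slot, so any key displaced by a collision always takes the slower fallback path.

```python
SIZE = 8  # toy table size, a power of two like CPython dicts

def first_slot(key):
    # First probe position: the hash masked by the table size.
    return hash(key) & (SIZE - 1)

def insert(table, key, value):
    i = first_slot(key)
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % SIZE  # linear probing stands in for the real probe sequence
    table[i] = (key, value)

def fast_path_hit(table, key):
    # The fast-path macro can only succeed when the entry sits in its
    # first probe slot; anything else falls back to the slower lookup.
    entry = table[first_slot(key)]
    return entry is not None and entry[0] == key
```

With integer keys 0 and 8 (which collide in an 8-slot table), whichever key is inserted second can never be served by the fast path, no matter how often it is looked up.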

One problem is that 75% is still not good enough - the fallback code path is
significantly slower (although still faster than PyDict_GetItem).  Another
problem is that if a global symbol used inside a tight loop suffers a hash
collision, the hit rate can drop close to 0%.

One trick that may help is to shuffle the hash entries: every 100th time
the macro fails for an entry, that entry is moved up to the first hash
position, and the entry that previously occupied that position is moved to
the first empty slot in its own hash chain.  Statistically, this ensures
that the most commonly referenced names tend to stay in the first hash
position.  I think it may improve the hit rate from 75% to 85% or higher
and eliminate the worst-case scenario.
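The shuffling idea above can be sketched in a few lines (again a hypothetical toy model in Python, not the proposed C patch; `SHUFFLE_EVERY` and the linear probing are stand-ins): count fast-path misses per key, and on every 100th miss promote that entry to its first hash position, re-homing the displaced entry in the first empty slot of its own chain.

```python
SIZE = 8              # toy table size, a power of two as in CPython dicts
SHUFFLE_EVERY = 100   # promote an entry after this many fast-path misses

def first_slot(key):
    # First probe position: the hash masked by the table size.
    return hash(key) & (SIZE - 1)

def probe(table, key):
    # Linear probing stands in for CPython's actual probe sequence.
    i = first_slot(key)
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % SIZE
    return i

def insert(table, key, value):
    table[probe(table, key)] = (key, value)

def lookup(table, misses, key):
    f = first_slot(key)
    entry = table[f]
    if entry is not None and entry[0] == key:
        return entry[1]                      # fast path: first hash position
    i = probe(table, key)                    # slower fallback lookup
    if table[i] is None:
        raise KeyError(key)
    value = table[i][1]
    misses[key] = misses.get(key, 0) + 1
    if misses[key] % SHUFFLE_EVERY == 0:
        # Promote the hot entry to its first hash position and re-home
        # the displaced entry in the first empty slot of its own chain.
        displaced = table[f]
        table[f] = table[i]
        table[i] = None
        if displaced is not None:
            j = first_slot(displaced[0])
            while table[j] is not None:
                j = (j + 1) % SIZE
            table[j] = displaced
    return value
```

After 100 fallback lookups of a colliding key, it takes over the first hash position, so a name hammered inside a tight loop stops paying the collision penalty.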

> - I'm worried that negative dict entries could fill the dict beyond
>   its capacity.

For each dictionary, the number of negative entries is bounded by the number
of global and builtin names in the co_names of all code objects that are
executed with that dictionary as their namespace.

For general-purpose dictionaries, of course, there is no such upper bound
and therefore this optimization cannot be used in PyDict_GetItem.