[Python-Dev] Python 2.3 release schedule

Sat, 25 May 2002 10:11:26 -0400

> > - Your benchmarks use an almost empty dict (for locals and globals at
> >   least).  I'd like to see how they perform with a realistic number of
> >   other names in the dict
> 
> True, the fastest code path (the macro) only works as long as the
> entry is in the first hash position.  For the tests in my posting
> this is 100% of the cases.  For real code the results are around
> 75%.  To enable the display of dictionary statistics on exit compile
> with -dSHOW_DICTIONARY_STATS.
> 
> One problem is that 75% is still not good enough - the fallback code
> path is significantly slower (although still faster than
> PyDict_GetItem).  Another problem is that if you get a hash
> collision for a global symbol used inside a tight loop the hit rate
> can drop closer to 0%.
> 
> One trick that may help is to shuffle the hash entries - for every
> 100th time the macro fails the entry will be moved up to the first
> hash position and the entry which previously occupied that position
> will be moved to the first empty hash position for its own hash
> chain.  Statistically, this will ensure that the most commonly
> referenced names will tend to stay at the first hash position. I
> think it may improve the hit rate from 75% to 85% or higher and
> eliminate the worst-case scenario.

Piling more complexity on is likely to slow down the common case though.

> > - I'm worried that negative dict entries could fill the dict beyond
> >   its capacity.
> 
> For each dictionary, the number of negative entries is bound by the
> number of global and builtin names in the co_names of all code
> objects that get executed with that dictionary as their namespace.

And that's my worry.  Suppose a program has only one global but
touches every builtin.  Will the dictionary properly get resized to
accommodate all those negative entries?

--Guido van Rossum (home page: http://www.python.org/~guido/)