[Python-Dev] Python 2.3 release schedule
Guido van Rossum
guido@python.org
Sat, 25 May 2002 10:11:26 -0400
> > - Your benchmarks use an almost empty dict (for locals and globals at
> > least). I'd like to see how they perform with a realistic number of
> > other names in the dict
>
> True, the fastest code path (the macro) only works as long as the
> entry is in the first hash position. For the tests in my posting
> this is 100% of the cases. For real code the results are around
> 75%. To enable the display of dictionary statistics on exit compile
> with -dSHOW_DICTIONARY_STATS.
>
> One problem is that 75% is still not good enough - the fallback code
> path is significantly slower (although still faster than
> PyDict_GetItem). Another problem is that if you get a hash
> collision for a global symbol used inside a tight loop the hit rate
> can drop closer to 0%.
>
> One trick that may help is to shuffle the hash entries - for every
> 100th time the macro fails the entry will be moved up to the first
> hash position and the entry which previously occupied that position
> will be moved to the first empty hash position for its own hash
> chain. Statistically, this will ensure that the most commonly
> referenced names will tend to stay at the first hash position. I
> think it may improve the hit rate from 75% to 85% or higher and
> eliminate the worst-case scenario.
Piling more complexity on is likely to slow down the common case though.
> > - I'm worried that negative dict entries could fill the dict beyond
> > its capacity.
>
> For each dictionary, the number of negative entries is bound by the
> number of global and builtin names in the co_names of all code
> objects that get executed with that dictionary as their namespace.
And that's my worry. Suppose a program has only one global but
touches every builtin. Will the dictionary properly get resized to
accommodate all those negative entries?
--Guido van Rossum (home page: http://www.python.org/~guido/)