Re: [Python-Dev] Python 2.3 release schedule

25 May 2002


      ...
...
- Your benchmarks use an almost empty dict (for locals and globals at
  least).  I'd like to see how they perform with a realistic number of
  other names in the dict
True, the fastest code path (the macro) only works as long as the
entry is in the first hash position.  For the tests in my posting
this is 100% of the cases.  For real code the results are around
75%.  To enable the display of dictionary statistics on exit compile
with -dSHOW_DICTIONARY_STATS.
One problem is that 75% is still not good enough - the fallback code
path is significantly slower (although still faster than
PyDict_GetItem).  Another problem is that if you get a hash
collision for a global symbol used inside a tight loop the hit rate
can drop closer to 0%.
One trick that may help is to shuffle the hash entries - for every
100th time the macro fails the entry will be moved up to the first
hash position and the entry which previously occupied that position
will be moved to the first empty hash position for its own hash
chain.  Statistically, this will ensure that the most commonly
referenced names will tend to stay at the first hash position. I
think it may improve the hit rate from 75% to 85% or higher and
eliminate the worst-case scenario.
Piling more complexity on is likely to slow down the common case though.
...
...
- I'm worried that negative dict entries could fill the dict beyond
  its capacity.
For each dictionary, the number of negative entries is bound by the
number of global and builtin names in the co_names of all code
objects that get executed with that dictionary as their namespace.
And that's my worry.  Suppose a program has only one global but
touches every builtin.  Will the dictionary properly get resized to
accommodate all those negative entries?

--Guido van Rossum (home page: http://www.python.org/~guido/)