[Python-Dev] Python 2.3 release schedule
Oren Tirosh
oren-py-d@hishome.net
Sat, 25 May 2002 16:34:39 +0300
On Sat, May 25, 2002 at 08:17:47AM -0400, Guido van Rossum wrote:
> Two observations:
>
> - Your benchmarks use an almost empty dict (for locals and globals at
> least). I'd like to see how they perform with a realistic number of
> other names in the dict
True, the fastest code path (the macro) only works as long as the entry is
in the first hash position. For the tests in my posting this is 100% of
the cases; for real code the hit rate is around 75%. To enable the display
of dictionary statistics on exit, compile with -dSHOW_DICTIONARY_STATS.
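
To make this concrete, here is a toy sketch in C - made-up names and a
deliberately simplified open-addressed table, not the code from my patch.
The fast path succeeds only when the entry sits in its first hash position
and falls back to a full probe otherwise:

    #include <stddef.h>

    typedef struct {
        long        me_hash;
        const char *me_key;    /* interned, so pointer compare suffices */
        void       *me_value;
    } Entry;

    typedef struct {
        size_t mask;           /* table size - 1, size a power of two */
        Entry *table;
    } Dict;

    /* Full probe (linear probing for simplicity; the real probe
       sequence is more elaborate).  Assumes the table never fills up. */
    static void *
    slow_get(Dict *d, const char *k, long h)
    {
        size_t i = (size_t)h & d->mask;
        while (d->table[i].me_key != NULL) {
            if (d->table[i].me_key == k && d->table[i].me_hash == h)
                return d->table[i].me_value;
            i = (i + 1) & d->mask;
        }
        return NULL;           /* a miss */
    }

    /* The fast path: succeed only from the first hash position,
       otherwise fall through to the slow path. */
    #define FAST_GET(d, k, h, out)                                  \
        do {                                                        \
            Entry *_ep = &(d)->table[(size_t)(h) & (d)->mask];      \
            if (_ep->me_key == (k) && _ep->me_hash == (h))          \
                (out) = _ep->me_value;                              \
            else                                                    \
                (out) = slow_get((d), (k), (h));                    \
        } while (0)
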
One problem is that 75% is still not good enough - the fallback code path
is significantly slower (although still faster than PyDict_GetItem). Another
problem is that if you get a hash collision for a global symbol used inside
a tight loop, the hit rate can drop to nearly 0%.
One trick that may help is to shuffle the hash entries: every 100th time
the macro fails for an entry, that entry is moved up to the first hash
position and the entry that previously occupied that position is moved to
the first empty slot in its own hash chain. Statistically, this ensures
that the most commonly referenced names tend to stay at the first hash
position. I think it could improve the hit rate from 75% to 85% or higher
and eliminate the worst-case scenario.
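
Continuing the toy sketch from above (again, illustrative names only, not
the actual code), the shuffle could look roughly like this.  The tombstone
key keeps other probe chains that pass through the vacated slot intact -
the same role the dummy entries play in the real dictobject.c - and the
slow path above already skips it because it only stops at a NULL key:

    static const char DUMMY[] = "<dummy>";  /* never matches a real key */

    static void
    reinsert(Dict *d, Entry e)
    {
        size_t i = (size_t)e.me_hash & d->mask;
        while (d->table[i].me_key != NULL && d->table[i].me_key != DUMMY)
            i = (i + 1) & d->mask;   /* first free slot in its own chain */
        d->table[i] = e;
    }

    /* Run roughly every 100th time the fast path misses for the entry
       currently stored at slot 'cur' (the miss counter is not shown). */
    static void
    promote(Dict *d, size_t cur)
    {
        size_t first = (size_t)d->table[cur].me_hash & d->mask;
        Entry displaced;

        if (first == cur)            /* already in first hash position */
            return;
        displaced = d->table[first];
        d->table[first] = d->table[cur];  /* now hits the macro */
        d->table[cur].me_key = DUMMY;     /* tombstone the old slot */
        d->table[cur].me_value = NULL;
        if (displaced.me_key != NULL && displaced.me_key != DUMMY)
            reinsert(d, displaced);  /* push the old occupant down its
                                        own hash chain */
    }

In a real version the counter and the call to promote() would live in the
slow path, which already knows at which slot it found the entry.
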
> - I'm worried that negative dict entries could fill the dict beyond
> its capacity.
For each dictionary, the number of negative entries is bounded by the
number of global and builtin names in the co_names of all code objects
that get executed with that dictionary as their namespace.
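
As a concrete illustration (not code from the patch), the worst case that
a single code object can contribute is just the length of its co_names
tuple, since LOAD_GLOBAL only ever looks up names listed there:

    #include <Python.h>
    #include <compile.h>        /* PyCodeObject in the 2.x headers */

    static int
    max_negative_entries(PyCodeObject *co)
    {
        return PyTuple_GET_SIZE(co->co_names);
    }
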
For general-purpose dictionaries, of course, there is no such upper bound
and therefore this optimization cannot be used in PyDict_GetItem.
Oren