[Python-Dev] Python 2.3 release schedule

Sat, 25 May 2002 18:20:21 +0300

On Sat, May 25, 2002 at 10:11:26AM -0400, Guido van Rossum wrote:
> > One trick that may help is to shuffle the hash entries - for every
> > 100th time the macro fails the entry will be moved up to the first
> > hash position and the entry which previously occupied that position
> > will be moved to the first empty hash position for its own hash
> > chain.  Statistically, this will ensure that the most commonly
> > referenced names will tend to stay at the first hash position. I
> > think it may improve the hit rate from 75% to 85% or higher and
> > eliminate the worst-case scenario.
> 
> Piling more complexity on is likely to slow down the common case though.

Not at all. The common case (the macro) will not have a single instruction
added.  The rare case (fallback function) will be slowed down by two 
instructions: decrement a global variable, jump if zero.  The reshuffle will 
be triggered every 100th occurence of the rare case (which may be every 
10000th occurence of the common case).

It's the same approach used when designing a memory cache for a CPU: it's 
OK if a cache miss is made slower by loading an entire cache row as long as 
it improves the cache hit rate enough to justify it.

> > For each dictionary, the number of negative entries is bound by the
> > number of global and builtin names in the co_names of all code
> > objects that get executed with that dictionary as their namespace.
> 
> And that's my worry.  Suppose a program has only one global but
> touches every builtin.  Will the dictionary properly get resized to
> accommodate all those negative entries?

>>>len(dir(__builtins__))*24  # sizeof PyDictEntry on i386
2736

Caching of string hashes also costs memory. Speed optimizations often do.
These costs need to be weighed against complexity, performance, and even the 
code size - other alternatives may use up as much code space as this one 
uses data in this worst-case scenario.

	Oren