[Python-ideas] Fast global cacheless lookup

Jim Jewett jimjjewett at gmail.com
Sun Nov 25 00:22:04 CET 2007


On 11/22/07, Neil Toronto <ntoronto at cs.byu.edu> wrote:

> ... What if a frame could maintain an
> array of pointers right into a dictionary's entry table?

> The dict notifies its observers on delitem, pop, popitem, resize and
> clear. Nothing else is necessary - nothing else will change the address
> of or invalidate an entry.

I think this isn't quite true, because of DUMMY entries.

Insert key1.
Insert key2 that wants the same slot.
Register an observer that cares about key2 but not key1.

Delete key1.   The key1 entry is replaced with DUMMY, but the entry
for key2 is not affected.

Look up key2 (by some other code which hasn't already taken this
shortcut) and the lookdict function (as a side effect) moves key2 to
the better location that key1 no longer occupies.  As described, I
think this breaks your cache.

Of course you can get around this by just not moving things without a
resize, but that is likely to be horrible for the (non-namespace?)
dictionaries that do see frequent deletions.

Another way around it is to also notify the observers whenever
lookdict moves an entry; I'm not sure how that would affect normal
lookup performance.

A more radical change is to stop exposing the internal structure at
all.  For example, a typical namespace might instead be represented as
an array of values, plus a dict mapping names to indices.  The cost
would be an extra pointer for each key ever in the dictionary (since
you wouldn't reuse positional slots), and the savings would be that
most lookups could just grab namespace[i] without having to even check
that they got the right key, let alone following a trail of collision
resolutions.

> To speed up globals access, an auxiliary object to functions and frames
> registers itself as an observer to func_globals and __builtins__.

Note that func_globals probably *will* be updated again in the future,
if only to register this very function with its module.  You could
wait to "seal" a namespace until you think all its names are known, or
you could adapt the timestamp solution suggested in
http://bugs.python.org/issue1616125

-jJ



More information about the Python-ideas mailing list