
Jim Jewett wrote:
On 11/22/07, Neil Toronto ntoronto@cs.byu.edu wrote:
... What if a frame could maintain an array of pointers right into a dictionary's entry table?
The dict notifies its observers on delitem, pop, popitem, resize and clear. Nothing else is necessary - nothing else will change the address of or invalidate an entry.
I think this isn't quite true, because of DUMMY entries.
Insert key1. Insert key2 that wants the same slot. Register an observer that cares about key2 but not key1.
Delete key1. The key1 entry is replaced with DUMMY, but the entry for key2 is not affected.
Look up key2 (by some other code which hasn't already taken this shortcut) and the lookdict function (as a side effect) moves key2 to the better location that key1 no longer occupies. As described, I think this breaks your cache.
Good grief old chap, you freaked me out.
Turns out it all still works. Whether the lookdict functions used to move entries around I don't know, but now it doesn't. It's probably because deletions are so rare compared to other operations that it's not worth the extra logic in those tight little loops.
Mind if I keep rambling, just to make sure I've got it right? :)
It's the dummy entries that make lookup work at all. The lookdict functions use them as flags so that it knows to keep skipping around the table looking for an open entry or an entry with the right key. It's basically: "If ep->me_key != key or ep->me_key == dummy, I need to keep trying different ep's. If I reach an empty ep, return the first dummy I found or that ep if I didn't find one. If I reach an ep with the right key, return that."
I wasn't completely satisfied by static analysis, so I traced the case you brought up through both lookdict and lookdict_string. Here it is:
Assume hash(key1) == hash(key2). Assume (without loss of generality) that for this hash, entries are traversed in order 0, 1, 2...
Insert key1:
0: key1 1: NULL 2: ...
Insert key2:
0: key1 1: key2 2: ...
Delete key1:
0: dummy 1: key2 2: ...
Look up key2 (trace):
("freeslot" keeps track of the first dummy found on the traversal; is NULL if none found)
start: ep = 0 freeslot = 0 [ep->me_key == dummy] loop: ep = 1 return ep [ep->me_key == key2]
In that last bit would have been the part that goes something like this:
if (freeslot != NULL) { /* refcount-neutral */ *freeslot = *ep; ep->me_key = dummy; ep->me_value = NULL; return freeslot; } else return ep;
It might be a speed improvement if you assume that the key is very likely to be looked up again. But it's extra complexity in a speed-critical code path and you never know whether you lengthened the traversal for other lookups.
As long as it's a wash in the end, it might as well be left alone, at least for the fast globals. :D
Neil