[Python-Dev] Non-string keys in namespace dicts

Phillip J. Eby pje at telecommunity.com
Tue Dec 4 00:48:47 CET 2007

At 03:26 PM 12/3/2007 -0700, Neil Toronto wrote:
>Phillip J. Eby wrote:
> > At 12:27 PM 12/3/2007 -0700, Neil Toronto wrote:
> >> Guido van Rossum wrote:
> >> > How about subclasses of str? These have all the same issues...
> >>
> >> Yeah. I ended up having it, per class, permanently revert to uncached
> >> lookups when it detects that a class dict in the MRO has non-string
> >> keys. That's flagged by lookdict_string, which uses PyString_CheckExact.
> >
> > I'm a bit confused here.  Isn't the simplest way to cache attribute
> > lookups to just have a cache dictionary in the type, and update that
> > dictionary whenever a change is made to a superclass?  That's
> > essentially how __slotted__ attribute changes on base classes work now,
> > isn't it?  Why do we need to mess around with the dictionary entries
> > themselves in order to do that?
>The nice thing about caching pointers to dict entries is that they don't
>change as often as values do. There are fewer ways to invalidate an
>entry pointer: inserting set, resize, clear, and delete. If you cache
>values, non-inserting set could invalidate as well.
>Because inserting into namespace dicts should be very rare, caching
>entries rather than values should reduce the number of times cache
>entries are invalidated to near zero. Updating is expensive, so that's
>good for performance.
>Rare updating also means it's okay to invalidate the entire cache rather
>than single entries, so the footprint of the caching mechanism in the
>dict can be very small. For example, I've got a single 64-bit counter in
>each dict that gets incremented after every potentially invalidating
>operation. That comes down to 8 bytes of storage and two extra machine
>instructions (currently) per invalidating operation. The cache checks it
>against its own counter, and updating ensures that it's synced.
>Some version of the non-string keys problem would exist with any caching
>mechanism, though. An evil rich compare can always monkey about with
>class dicts in the MRO. If a caching scheme caches values and doesn't
>account for that, it could return stale values. If it caches entries and
>doesn't account for that, it could segfault. I suppose you could argue
>that returning stale values is fitting punishment for using an evil rich
>compare, though the punishee isn't always the same person as the punisher.

Actually, you're missing the part where such evil code *can't* muck 
things up for class dictionaries.  Type dicts aren't reachable via 
ordinary Python code; you *have* to modify them via setattr.  (The 
__dict__ of types returns a read-only proxy object, so the most evil 
rich compare you can imagine still can't touch it.)

This means that MRO cache invalidation can already be detected using 
"type"'s tp_setattro implementation.  And setting attributes on types 
is already extremely rare.  It doesn't seem to me that there's any 
need to use the same namespace speedup mechanism here: capturing 
setattr operations on a type should be sufficient to implement 
invalidation, without mucking about with dictionary entries.  An 
ordinary dict should suffice.

Of course, I suppose there are use cases where somebody uses a class 
attribute as a "global" of sorts, and those use cases would be slowed 
down.  However, if you want to use the entry caching approach, you 
wouldn't need to worry about the segfault case.  (Since somebody 
would have to use C to get at the "real" dictionary.)

More information about the Python-Dev mailing list