[Python-3000] Armin's attribute lookup caching for 3.0

Fri Dec 7 22:48:19 CET 2007

Phillip J. Eby wrote:
> At 12:14 PM 12/7/2007 -0700, Neil Toronto wrote:
>> I found updating caches from setattr to be faster than invalidating
>> entries.
> 
> Really?  Even on the SpecialClassAttribute test?  I'd have assumed that 
> Armin's invalidation flag mechanism would make repeated sets faster.  Of 
> course, in cases where you read the attribute every time it's set, your 
> approach might come out ahead somewhat.  Armin's approach has to walk 
> the entire subclass tree to mark the versions invalid, whereas yours can 
> skip shadowed attributes.

In Python 3.0, since everything is so much more unified, Armin's 
invalidate-everything approach can skip shadowed attributes as well. If 
a subclass shadows an attribute, its cache entry either 1) doesn't 
exist, or 2) caches the shadowing value, so a set on the base class 
leaves it correct either way.
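
To make the shadowing case concrete, here is a small plain-Python illustration (the class names are made up for the example, and it shows only the language-level behavior, not the cache itself). Setting an attribute on a base class changes what a non-shadowing subclass sees, but not what a shadowing subclass sees:

```python
class DemoBase:
    foo = 1

class DemoShadowing(DemoBase):
    foo = 2          # shadows DemoBase.foo

class DemoPlain(DemoBase):
    pass             # inherits foo from DemoBase

# Assign on the base class, as in 'SomeBaseClass.foo = 23'.
DemoBase.foo = 23

shadowed_value = DemoShadowing.foo   # still the subclass's own value
inherited_value = DemoPlain.foo      # follows the base class
```

Any cached lookup of `foo` on `DemoShadowing` stays valid after the assignment, which is why that subclass can be skipped.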

The invalidating callback looks like:

     1. Set type's cache ID to current, increment current
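
A rough Python sketch of that single step, assuming a global counter for the "current" ID. The names `TypeState`, `invalidate` and `_next_cache_id` are hypothetical and are not taken from the patch, which is C code:

```python
import itertools

# Hypothetical global counter standing in for the "current" cache ID.
_next_cache_id = itertools.count(1)

class TypeState:
    """Hypothetical stand-in for the cache ID stored on a type."""
    def __init__(self, name):
        self.name = name
        self.cache_id = next(_next_cache_id)

def invalidate(type_state):
    # Set the type's cache ID to current, then increment current.
    # next() does both: it returns the current value and advances.
    # Entries tagged with the old ID simply stop matching.
    type_state.cache_id = next(_next_cache_id)

base_state = TypeState("Base")
old_id = base_state.cache_id
invalidate(base_state)
new_id = base_state.cache_id
```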

The updating callback looks like:

     1. Get Unicode hash (almost always pointer dereference)
     2. Calculate cache index
     3. If entry cache ID and name match, assign new value
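
The three steps above might look roughly like this in Python. This is a sketch under assumptions, not the patch's code: the table size, the way the index mixes the hash with the cache ID, the `Entry` layout, and the `fill_entry` helper are all invented for illustration, and names are compared with `==` where C code would more likely compare pointers.

```python
CACHE_SIZE = 1 << 10  # assumed size; a power of two so masking works

class Entry:
    __slots__ = ("cache_id", "name", "value")
    def __init__(self):
        self.cache_id = 0   # 0 means "never filled" in this sketch
        self.name = None
        self.value = None

cache = [Entry() for _ in range(CACHE_SIZE)]

def cache_index(cache_id, name):
    # 1. Get the name's hash (strings cache their hash after first use).
    # 2. Calculate the cache index; this mixing function is an assumption.
    return (hash(name) ^ cache_id) & (CACHE_SIZE - 1)

def update_entry(cache_id, name, new_value):
    entry = cache[cache_index(cache_id, name)]
    # 3. If the entry's cache ID and name match, assign the new value.
    if entry.cache_id == cache_id and entry.name == name:
        entry.value = new_value
        return True
    return False   # slot is empty or belongs to something else

def fill_entry(cache_id, name, value):
    # Hypothetical helper: what a lookup miss would do to populate a slot.
    entry = cache[cache_index(cache_id, name)]
    entry.cache_id, entry.name, entry.value = cache_id, name, value

fill_entry(7, "foo", 1)
updated = update_entry(7, "foo", 2)        # matching entry: overwritten
wrong_type = update_entry(8, "foo", 3)     # different cache ID: left alone
wrong_name = update_entry(7, "bar", 3)     # different name: left alone
```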

It's not a whole lot more, especially compared to the update_subclasses 
machinery. Where I think the updating approach wins is when only one 
attribute is set at a time, rather than a lot of them in a row, which I 
believe is more likely.

> I suspect that in real programs, though, it's rare to be setting 
> attributes on a base class that are shadowed by subclass attributes.  
> Most likely, you'll either be changing something that's global and 
> inherited everywhere, or something that's on a leaf class to begin 
> with.  Your approach should definitely be a win on the *read* side of 
> things, though, since it doesn't have to check version validity during 
> lookup.
> 
> That having been said, the idea that the statement 'SomeBaseClass.foo = 
> 23' is actually going to walk through cache entries and invoke a 
> callback for *every* subclass of SomeBaseClass in the program makes me a 
> tiny bit nervous.
> 
> On the other hand, I suppose it's also a good argument for not using 
> class attributes when you really want a global.  :)

Heh. You never know what those crazy users will need to do. You of all 
people should know that. :p

If it's too slow, an obvious way to speed it up is to not use 
update_subclasses and avoid the overhead. Besides avoiding calling a 
function by pointer, not doing the shadowing check may also be faster 
generally, since, as you say, assigned attributes are most likely 1) not 
shadowed (they'll almost never be methods), or 2) in a leaf class. It 
may be that Armin's invalidate-everything approach would be generally 
faster that way because it *can* skip the shadowing check. To update, 
you have to check for shadowing.

I may try it. I couldn't say whether it's worth duplicating the code.

FWIW and slightly topic-veering, this patch keeps hit/miss counts if you 
want it to. Pybench has a hit rate of 81%. It's hard to say whether a 
benchmark gives a good measure of hit rate, though, since most code 
doesn't repeat the same operations to quite the extent that a benchmark 
does. OTOH, build and build_ext get 99%, so it may be a bad measure in 
the other direction.

Neil