[Python-3000] Armin's attribute lookup caching for 3.0
Neil Toronto
ntoronto at cs.byu.edu
Fri Dec 7 22:48:19 CET 2007
Phillip J. Eby wrote:
> At 12:14 PM 12/7/2007 -0700, Neil Toronto wrote:
>> I found updating caches from setattr to be faster than invalidating
>> entries.
>
> Really? Even on the SpecialClassAttribute test? I'd have assumed that
> Armin's invalidation flag mechanism would make repeated sets faster. Of
> course, in cases where you read the attribute every time it's set, your
> approach might come out ahead somewhat. Armin's approach has to walk
> the entire subclass tree to mark the versions invalid, whereas yours can
> skip shadowed attributes.
In Python 3.0, since everything is so much more unified, Armin's
invalidate-everything approach can skip shadowed attributes as well. If
a subclass shadows an attribute, its cache entry either 1) doesn't
exist, or 2) caches the shadowing value.
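To make the shadowing case concrete, here is a small sketch at the Python level. The class names are made up for illustration. It shows only the language semantics a cache has to preserve, not the patch itself: assigning on the base class must not change what a shadowing subclass sees.

```python
class Base:
    foo = 1

class Shadowing(Base):
    foo = 2          # shadows Base.foo

class Inheriting(Base):
    pass             # sees Base.foo through normal lookup

# Assigning on the base class changes what Inheriting sees,
# but Shadowing keeps resolving to its own value.
Base.foo = 23
```

So whatever is cached for Shadowing under 'foo' (if anything) stays correct after the assignment, and only Inheriting needs attention.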
The invalidating callback looks like:
1. Set type's cache ID to current, increment current
The updating callback looks like:
1. Get Unicode hash (almost always pointer dereference)
2. Calculate cache index
3. If entry cache ID and name match, assign new value
It's not a whole lot more, especially compared to the update_subclasses
machinery. Where I think the updating approach wins is when only one
attribute is set at a time, rather than a lot of them in a row, which I
believe is more likely.
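The two callbacks above can be sketched roughly as follows. This is a toy Python model, not the C code from the patch: the table size, the index calculation, and all class and method names (Cache, TypeInfo, store, lookup) are assumptions made for illustration.

```python
CACHE_SIZE = 1 << 10  # assumed table size, not taken from the patch


class Entry:
    __slots__ = ("cache_id", "name", "value")

    def __init__(self):
        self.cache_id = -1   # matches no type
        self.name = None
        self.value = None


class TypeInfo:
    """Hypothetical stand-in for a type object that carries a cache ID."""

    def __init__(self, cache_id):
        self.cache_id = cache_id


class Cache:
    def __init__(self):
        self.entries = [Entry() for _ in range(CACHE_SIZE)]
        self.current_id = 0

    def new_type(self):
        t = TypeInfo(self.current_id)
        self.current_id += 1
        return t

    def _index(self, t, name):
        # Assumed index calculation: mix the name hash with the cache ID.
        return (hash(name) ^ t.cache_id) % CACHE_SIZE

    def store(self, t, name, value):
        e = self.entries[self._index(t, name)]
        e.cache_id, e.name, e.value = t.cache_id, name, value

    def lookup(self, t, name):
        e = self.entries[self._index(t, name)]
        if e.cache_id == t.cache_id and e.name == name:
            return True, e.value
        return False, None

    def invalidate(self, t):
        # Invalidating callback: give the type a fresh cache ID, so every
        # entry stored under the old ID stops matching.
        t.cache_id = self.current_id
        self.current_id += 1

    def update(self, t, name, value):
        # Updating callback: hash, index, and overwrite only on a match.
        e = self.entries[self._index(t, name)]
        if e.cache_id == t.cache_id and e.name == name:
            e.value = value
```

In this model an update leaves the entry warm for the next read, while an invalidate forces the next read of every attribute on that type to miss and refill.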
> I suspect that in real programs, though, it's rare to be setting
> attributes on a base class that are shadowed by subclass attributes.
> Most likely, you'll either be changing something that's global and
> inherited everywhere, or something that's on a leaf class to begin
> with. Your approach should definitely be a win on the *read* side of
> things, though, since it doesn't have to check version validity during
> lookup.
>
> That having been said, the idea that the statement 'SomeBaseClass.foo =
> 23' is actually going to walk through cache entries and invoke a
> callback for *every* subclass of SomeBaseClass in the program makes me a
> tiny bit nervous.
>
> On the other hand, I suppose it's also a good argument for not using
> class attributes when you really want a global. :)
Heh. You never know what those crazy users will need to do. You of all
people should know that. :p
If it's too slow, an obvious way to speed it up is to not use
update_subclasses and avoid the overhead. Besides avoiding calling a
function by pointer, not doing the shadowing check may also be faster
generally, since, as you say, assigned attributes are most likely 1) not
shadowed (they'll almost never be methods), or 2) in a leaf class. It
may be that Armin's invalidate-everything approach would be generally
faster that way because it *can* skip shadowed attributes. To update you
have to check for shadowing.
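For reference, the shadowing check during a subclass walk looks something like this at the Python level. It is a sketch only: the function name walk_unshadowed is made up, the real update_subclasses machinery is C, and skipping the whole subtree below a shadowing class is an assumption that holds for simple single-inheritance trees.

```python
def walk_unshadowed(cls, name, callback):
    """Call callback(sub) for each subclass of cls that does not shadow name."""
    for sub in cls.__subclasses__():
        if name in vars(sub):
            # Shadowed here: this class and (by assumption) everything
            # below it resolve the name without reaching cls.
            continue
        callback(sub)
        walk_unshadowed(sub, name, callback)


# Illustrative hierarchy; the class names are hypothetical.
class A:
    foo = 1

class B(A):
    pass

class C(A):
    foo = 2      # shadows A.foo

class D(C):
    pass

class E(B):
    pass

visited = []
walk_unshadowed(A, "foo", visited.append)
```

Dropping the `name in vars(sub)` test gives the cheaper walk that visits every subclass, which is all the invalidating callback needs.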
I may try it. I couldn't say whether it's worth duplicating the code.
FWIW and slightly topic-veering, this patch keeps hit/miss counts if you
want it to. Pybench has a hit rate of 81%. It's hard to say whether a
benchmark gives a good measure of hit rate, though, since most code
doesn't repeat the same operations to quite the extent that a benchmark
does. OTOH, build and build_ext get 99%, so it may be a bad measure in
the other direction.
Neil