[Python-Dev] Non-string keys in namespace dicts

Neil Toronto ntoronto at cs.byu.edu
Tue Dec 4 08:08:53 CET 2007


Phillip J. Eby wrote:
> At 10:17 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Interesting. But I'm going to have to say it probably wouldn't work as
>> well, since C code can and does alter tp_dict directly. Those places in
>> the core would have to be altered to invalidate the cache.
> 
> Eh?  Where is the type dictionary altered outside of setattr and class 
> creation?

You're right - my initial grep turned up stuff that looked like tp_dict 
monkeying out of context. The ctypes module does it a lot, but only in 
its various *_new functions.
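
For reference, the same restriction is visible from Python code. The sketch below is written for Python 3, and the class name Example is made up for illustration. It shows that the namespace of a type is exposed only as a read-only view, so ordinary code has to go through setattr. Only C code can reach tp_dict directly.

```python
import types

class Example:          # hypothetical class, for illustration only
    attr = 1

namespace = vars(Example)   # read-only, live view of the type's dict
assert isinstance(namespace, types.MappingProxyType)

try:
    namespace["attr"] = 2   # direct mutation of the view is rejected
except TypeError:
    direct_write_allowed = False
else:
    direct_write_allowed = True

setattr(Example, "attr", 2)  # the supported path goes through type.__setattr__
```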

>> It'd also be really annoying for a class to
>> have to notify all of its subclasses when one of its attributes changed.
> 
> It's not all subclasses - only those subclasses that don't shadow the 
> attribute.  Also, it's not necessarily the case that notification would 
> be O(subclasses) - it could be done via a version counter, as in your 
> approach.  Admittedly, that would require an extra bit of indirection, 
> since you'd need to keep (and check) counters for each descriptor.

And the extra overhead comes back to bite us again, and probably in a 
critical path. (I'm sure you've been bitten in a critical path before.) 
That's been the issue with all of these caching schemes so far - Python 
is just too durned dynamic to guarantee them anything they can exploit 
for efficiency, so they end up slowing down common operations. (Not that 
I'd change a bit of Python, mind you.)

For example, almost everything I've tried slows down attribute lookups 
on built-in types. Adding one 64-bit version counter check and a branch 
on failure incurs a 3-5% penalty. That's not the end of the world, but 
it makes pybench take about 0.65% longer.
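
To make the "version counter check and a branch on failure" concrete, here is a minimal pure-Python sketch. It is not the patch under discussion: VersionedType, LookupCache and every method name are hypothetical, the real work would happen in C on the type object, and this toy version bumps the counter of every subclass on each change.

```python
class VersionedType:
    """Hypothetical stand-in for a type whose namespace carries a version."""

    def __init__(self, name, bases=(), namespace=None):
        self.name = name
        self.bases = tuple(bases)
        self.namespace = dict(namespace or {})
        self.version = 0              # bumped whenever a lookup could change
        self.subclasses = []
        for base in self.bases:
            base.subclasses.append(self)

    def mro(self):
        # Simplified depth-first order (not C3); fine for single inheritance.
        order = [self]
        for base in self.bases:
            for t in base.mro():
                if t not in order:
                    order.append(t)
        return order

    def set_attribute(self, key, value):
        self.namespace[key] = value
        self._invalidate()

    def _invalidate(self):
        self.version += 1
        for sub in self.subclasses:   # subclasses may inherit the attribute
            sub._invalidate()


class LookupCache:
    """Caches MRO lookups, guarded by the version of the type."""

    def __init__(self):
        self._entries = {}            # (type, name) -> (version, value)
        self.hits = 0
        self.misses = 0

    def lookup(self, tp, name):
        entry = self._entries.get((tp, name))
        if entry is not None and entry[0] == tp.version:   # the version check
            self.hits += 1
            return entry[1]
        self.misses += 1              # the branch on failure: full MRO walk
        for t in tp.mro():
            if name in t.namespace:
                value = t.namespace[name]
                break
        else:
            raise AttributeError(name)
        self._entries[(tp, name)] = (tp.version, value)
        return value


base = VersionedType("Base", namespace={"x": 1})
derived = VersionedType("Derived", bases=(base,))
cache = LookupCache()
first = cache.lookup(derived, "x")    # miss: walks the MRO
second = cache.lookup(derived, "x")   # hit: version unchanged
base.set_attribute("x", 2)            # bumps Base and Derived versions
third = cache.lookup(derived, "x")    # miss again: stale entry is refreshed
```

Even in this toy form, every lookup pays for the comparison before it can use the cached value, which is the overhead being measured above.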

I finally overcame that by making a custom dictionary type to use as the 
cache. I haven't yet found a case where my caching lookups are slower - 
they're all faster so far for builtins and Python objects with an MRO of 
any size - but I haven't tested exhaustively and I haven't tried failing 
hasattr-style lookups. Turns out that not finding an attribute all the 
way up the MRO (which can lead to a persistent cache miss if done with 
the same name) is rather frequent in Python and is expected to be fast. 
I can cache missing attributes as easily as present attributes, but they 
could pile up if someone decides to hasattr an object with a zillion 
different names.

I have a cunning plan, though, which is probably best explained using a 
patch.

At any rate, I'm warming to this setattr idea, and I'll likely try that 
next whether my current approach works out or not.

Neil

