[Python-Dev] Non-string keys in namespace dicts
Neil Toronto
ntoronto at cs.byu.edu
Tue Dec 4 08:08:53 CET 2007
Phillip J. Eby wrote:
> At 10:17 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Interesting. But I'm going to have to say it probably wouldn't work as
>> well, since C code can and does alter tp_dict directly. Those places in
>> the core would have to be altered to invalidate the cache.
>
> Eh? Where is the type dictionary altered outside of setattr and class
> creation?

You're right - out of context, what my initial grep turned up just looked
like tp_dict monkeying. The ctypes module does it a lot, but only in
its various *_new functions.

>> It'd also be really annoying for a class to
>> have to notify all of its subclasses when one of its attributes changed.
>
> It's not all subclasses - only those subclasses that don't shadow the
> attribute. Also, it's not necessarily the case that notification would
> be O(subclasses) - it could be done via a version counter, as in your
> approach. Admittedly, that would require an extra bit of indirection,
> since you'd need to keep (and check) counters for each descriptor.

And the extra overhead comes back to bite us again, and probably in a
critical path. (I'm sure you've been bitten in a critical path before.)
That's been the issue with all of these caching schemes so far - Python
is just too durned dynamic to guarantee them anything they can exploit
for efficiency, so they end up slowing down common operations. (Not that
I'd change a bit of Python, mind you.)

For example, almost everything I've tried slows down attribute lookups
on built-in types. Adding one 64-bit version counter check and a branch
on failure incurs a 3-5% penalty. That's not the end of the world, but
it makes pybench take about 0.65% longer.
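
For anyone following along, here is roughly what "a version counter check and a branch on failure" means, as a toy Python model. This is only a sketch and not the actual patch: the real check lives in C in the type lookup path, and the class name `VersionedType`, its methods, and the simplified MRO are all made up for illustration.

```python
# Toy model only. VersionedType and everything in it is hypothetical;
# the real check is done in C on the type's attribute lookup path.
class VersionedType:
    """Stand-in for a type object that carries a version counter."""

    def __init__(self, name, bases=()):
        self.name = name
        self.bases = tuple(bases)
        self.subclasses = []
        self.attrs = {}    # plays the role of tp_dict
        self.version = 0   # bumped whenever a lookup result could change
        self.cache = {}    # attribute name -> (version, value)
        for base in self.bases:
            base.subclasses.append(self)

    def mro(self):
        # Simplified depth-first linearization, not the real C3 algorithm.
        order = [self]
        for base in self.bases:
            for t in base.mro():
                if t not in order:
                    order.append(t)
        return order

    def set_attr(self, name, value):
        self.attrs[name] = value
        self._invalidate()

    def _invalidate(self):
        # Bumping the counter makes every cached entry for this type
        # (and for its subclasses) stale without touching the caches.
        self.version += 1
        for sub in self.subclasses:
            sub._invalidate()

    def lookup(self, name):
        entry = self.cache.get(name)
        # This is the version check plus the branch on failure.
        if entry is not None and entry[0] == self.version:
            return entry[1]
        for t in self.mro():
            if name in t.attrs:
                value = t.attrs[name]
                self.cache[name] = (self.version, value)
                return value
        raise AttributeError(name)
```

Even in this toy form, every cached lookup pays for the comparison before it gets its value back, and that comparison is the cost that shows up in the numbers above.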

I finally overcame that by making a custom dictionary type to use as the
cache. I haven't yet found anything my caching lookups are slower at -
they're all faster so far for builtins and Python objects with an MRO of
any size - but I haven't tested exhaustively and I haven't done failing
hasattr-style lookups. It turns out that not finding an attribute all the
way up the MRO (which can lead to a persistent cache miss if done with
the same name) is rather frequent in Python and is expected to be fast.
I can cache missing attributes as easily as present attributes, but they
could pile up if someone decides to hasattr an object with a zillion
different names.

I have a cunning plan, though, which is probably best explained using a
patch.

At any rate, I'm warming to this setattr idea, and I'll likely try that
next whether my current approach works out or not.

Neil