[Python-Dev] optimizing non-local object access
Jeremy Hylton
jeremy@zope.com
Thu, 9 Aug 2001 14:15:24 -0400 (EDT)
>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:
Jeremy> My worry about your approach is that the track-object
Jeremy> opcodes could add a lot of expense for objects only used
Jeremy> once or twice. If the function uses math.sin inside a loop,
Jeremy> it's an obvious win. If it uses it only once, it's not so
Jeremy> clear.
SM> Even if math.sin is used just once you swap a
SM> LOAD_GLOBAL/LOAD_ATTR pair for a
SM> TRACK_OBJECT/LOAD_FAST/UNTRACK_OBJECT trio, so the hit you take
SM> shouldn't be terrible. (My assumption is that the
SM> register/unregister cost is fairly low and the actual
SM> notification/update code will almost never be executed.) You
SM> break even in total instructions executed with two accesses and
SM> win after that.
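SM's break-even claim can be checked with a little arithmetic. This is only a rough sketch: it assumes every opcode costs the same and that tracking adds exactly one TRACK_OBJECT and one UNTRACK_OBJECT per function call. The function names are made up for illustration and are not part of either proposal.

```python
def untracked_ops(accesses):
    # LOAD_GLOBAL + LOAD_ATTR for every access to math.sin
    return 2 * accesses


def tracked_ops(accesses):
    # one TRACK_OBJECT and one UNTRACK_OBJECT, plus one LOAD_FAST per access
    return 2 + accesses


for n in (1, 2, 3, 10):
    print(n, untracked_ops(n), tracked_ops(n))
```

Under these assumptions a single access costs one extra instruction, two accesses tie, and every access after that saves one instruction. The register/unregister cost itself is not modeled here.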
I'm assuming some kind of memory allocation is necessary to accommodate
an arbitrary number of handlers. If you need to resize an array that
holds the pointers to the tracking callbacks, it could get expensive.
I also wonder if you have to pay for tracking changes whenever the
name is rebound in the module, or just when you need to use the name
again.
SM> In addition, this might be a strategy left for
SM> an optimization pass that would only make the change if the
SM> LOAD_ATTR and/or LOAD_GLOBAL instructions are executed in a
SM> loop.
Good point.
Jeremy> To be more concrete: The math module would store the sin
Jeremy> name in slot X. The first time the foobar module used
Jeremy> math.sin it would look up the slot of sin in the math table.
Jeremy> The foobar module would store a pointer to math's fast
Jeremy> globals and the index of the sin slot. Then math.sin would
Jeremy> be accessed via a single opcode that used the stored
Jeremy> information.
SM> Unfortunately, the code that uses math.sin can't know that math
SM> is a module. It might be an instance with a sin attribute.
SM> Even worse, because of Python's dynamic nature, what the name
SM> "math" is bound to can change. You can't assume it will always
SM> be bound to a module object, even if it is the first time you
SM> set things up. I think you have to work with names and name
SM> bindings. I don't think you can make assumptions about what the
SM> names are bound to.
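The rebinding SM describes is easy to reproduce in plain Python. A small sketch follows; the `area` function and the stand-in object are invented for illustration.

```python
import math
import types


def area(r):
    # "math" is looked up by name each time area() runs, not when it is compiled
    return math.pi * r * r


before = area(1.0)

# Rebind the global name "math" to a non-module object that has a pi attribute.
math = types.SimpleNamespace(pi=3)
after = area(1.0)

print(before, after)
```

The same bytecode now reads `pi` from an object that is not a module, which is the case any cached binding has to detect or fall back from.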
No assumptions necessary. The compiler only emits the new opcodes for
names bound by import or attributes thereof. If the module name
('math') is rebound, the interpreter is responsible for resetting all
of the other bindings that depend on it ('math.sin'). If the object
'math' isn't a module (even though the compiler guessed it would be),
the opcodes fall back to the old implementation.
The first time math.sin is used, we do the following:
- check if the math.sin fast globals entry is initialized
  (it won't be, but it could be marked uninitialized or
  "don't use")
- check that math is indeed a module
- look up the sin slot in math
- record the slot in the fast globals table
On future uses, the first step above will discover a valid binding.
If the name math is rebound, the interpreter marks the fast globals
referring to it as uninitialized.
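The steps above can be modeled in a few lines of Python. This is a toy sketch only, not the proposed C implementation: `SlotModule`, `FastGlobals`, `load`, `rebind` and `slow_lookups` are hypothetical names, and a real version would live inside the interpreter loop.

```python
import math as real_math

UNINITIALIZED = None  # marker for an entry that must be (re)resolved


class SlotModule:
    """Stand-in for a module whose globals live in numbered slots."""

    def __init__(self, **attrs):
        self.names = {name: i for i, name in enumerate(attrs)}
        self.slots = list(attrs.values())


class FastGlobals:
    """Stand-in for one importing module's table of cached bindings."""

    def __init__(self, namespace):
        self.namespace = namespace   # the importing module's ordinary globals
        self.table = {}              # (module name, attr) -> (module, slot index)
        self.slow_lookups = 0        # counts name-based slot lookups

    def load(self, modname, attr):
        entry = self.table.get((modname, attr), UNINITIALIZED)
        if entry is not UNINITIALIZED:
            module, index = entry
            return module.slots[index]           # fast path: no name lookup
        target = self.namespace[modname]
        if not isinstance(target, SlotModule):
            return getattr(target, attr)         # fall back to the old path
        self.slow_lookups += 1
        index = target.names[attr]               # look up the slot by name once
        self.table[(modname, attr)] = (target, index)
        return target.slots[index]

    def rebind(self, modname, value):
        self.namespace[modname] = value
        for key in self.table:
            if key[0] == modname:
                self.table[key] = UNINITIALIZED  # force re-resolution later


fast = FastGlobals({"math": SlotModule(sin=real_math.sin)})
first = fast.load("math", "sin")(0.0)      # resolves and records the slot
second = fast.load("math", "sin")(0.0)     # reuses the recorded slot
fast.rebind("math", SlotModule(sin=abs))   # invalidates the cached entry
third = fast.load("math", "sin")(-2.0)     # re-resolves against the new object
```

Because the table stores a slot index and not the value, a change to sin inside math itself would still be seen through the cached entry. Only rebinding the name math forces a new lookup.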
One advantage of this approach is that the work is shared across all
code in a module. If many functions use math.sin, the first one
initializes the table and all the rest use it.
SM> The handwaving bit in my post was there because I am not
SM> familiar enough with the various possibilities for name
SM> rebinding. Does it all boil down to PyDict_SetItem or
SM> PyObject_SetAttr as I suspect? Are those functions too
SM> low-level, that is, have the names been forgotten completely at
SM> that point? If so, perhaps STORE_GLOBAL and STORE_ATTR would
SM> have to be modified to use PyDict_SetItemString and
SM> PyObject_SetAttrString instead.
I think we hook in at the tp_getattr(o) level. Module objects can
detect rebindings there and do whatever bookkeeping is necessary to
keep references to their names consistent. I think this is the right
approach for either technique we're discussing.
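In Python terms, a module object doing this bookkeeping might look roughly like the sketch below. It assumes the rebinding is caught when an attribute is assigned. `TrackedModule` and `_rebound` are invented for illustration, and a real implementation would sit in the C-level type slots, not in a Python subclass.

```python
import types


class TrackedModule(types.ModuleType):
    """Hypothetical module type that notices when its names are (re)bound."""

    def __init__(self, name):
        # Set up the bookkeeping list without going through our own hook.
        super().__setattr__("_rebound", [])
        super().__init__(name)

    def __setattr__(self, attr, value):
        # Record the binding, then store the attribute as usual.
        self._rebound.append(attr)
        super().__setattr__(attr, value)


mod = TrackedModule("demo")
mod.sin = abs   # first binding
mod.sin = len   # rebinding, also seen by the hook
print(mod._rebound)
```

Where the sketch appends to a list, the real hook would mark the dependent fast globals entries as uninitialized.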
Jeremy