Inspired by the second half of Jeremy's talk at DevDay, here's my alternative approach for speeding up instance attribute access. Like my idea for globals, it uses double indirection rather than recompilation.
- We only care about attributes of 'self' (which is identified as the first argument of a method, not by name). We can exclude from our analysis any function that assigns to self -- this is extremely rare and would throw off the analysis. We should also exclude static methods and class methods, since their first argument doesn't have the same role.
- Static analysis of the source code of a class (without access to its base classes) can determine the attributes of the class and, to some extent, its instance variables. Because the base classes are not analyzed, this cannot reliably distinguish between instance variables and methods inherited from a base class; it can distinguish between instance variables and methods defined in the current class.
- For inherited attributes that are never assigned to, we can guess the status by seeing whether they are called or not. This is not 100% accurate, so we need things to work (if slower) even when we guess wrong.
- For instance variable references and stores of the form self.<name>, the bytecode compiler emits opcodes LOAD_SELF_IVAR <i> and STORE_SELF_IVAR <i>, where <i> is a small int identifying the instance variable (ivar). A particular ivar is identified by the same <i> throughout all methods defined in the same class statement, but there is no attempt to coordinate this across different classes related by inheritance.
- It would be nice if we also had a single-opcode way to express a method call on self, e.g. CALL_SELF_METHOD <i>, <n>, <k>, where <i> identifies the method as above, and <n> and <k> are the numbers of positional and keyword arguments. Or maybe we should just have LOAD_SELF_METHOD <i>, which may be able to skip looking in the instance dict.
- Some data structure describing the mapping from <i> to attribute name, and whether it's an ivar or a method, is produced by the compiler and stored in the class __dict__. The function objects representing methods also contain a pointer to this data structure. (Or the code objects? But it needs to be shared. Details, details.)
- When a class object is created (at run-time), another data structure is created that accumulates the <i>-to-name mappings from that class and all its base classes.
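
To make the compile-time part of this concrete, here is a rough sketch of the per-class analysis described in the points above. It is only an illustration under several assumptions: it uses the standard `ast` module as a stand-in for the bytecode compiler, the names `analyze_class` and `is_self_attr` are hypothetical, indices are handed out in sorted-name order purely to keep the output deterministic, and nested functions that rebind the first argument's name are not handled. It emits no opcodes; it only builds the name-to-<i> table together with the ivar/method guess.

```python
import ast


def is_self_attr(node, self_name):
    """True if node is an attribute access of the form <self_name>.<attr>."""
    return (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == self_name)


def analyze_class(source):
    """Return {attr_name: (index, kind)} for the first class statement in source.

    kind is "ivar" or "method"; for attributes that are neither defined nor
    assigned in the class, it is only a guess based on whether they are called.
    """
    tree = ast.parse(source)
    cls = next(n for n in tree.body if isinstance(n, ast.ClassDef))
    funcs = [n for n in cls.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in funcs}          # methods defined in this class
    seen, assigned, called = set(), set(), set()

    for func in funcs:
        decorators = {d.id for d in func.decorator_list
                      if isinstance(d, ast.Name)}
        params = func.args.posonlyargs + func.args.args
        # Static and class methods are excluded: their first argument
        # does not play the role of 'self'.
        if decorators & {"staticmethod", "classmethod"} or not params:
            continue
        self_name = params[0].arg              # identified by position, not name
        nodes = list(ast.walk(func))
        # Exclude functions that assign to (or delete) the first argument.
        if any(isinstance(n, ast.Name) and n.id == self_name
               and isinstance(n.ctx, (ast.Store, ast.Del)) for n in nodes):
            continue
        for n in nodes:
            if is_self_attr(n, self_name):
                seen.add(n.attr)
                if isinstance(n.ctx, ast.Store):
                    assigned.add(n.attr)
            elif isinstance(n, ast.Call) and is_self_attr(n.func, self_name):
                called.add(n.func.attr)

    table = {}
    for i, name in enumerate(sorted(seen)):
        if name in defined or (name not in assigned and name in called):
            kind = "method"                    # defined here, or guessed from a call
        else:
            kind = "ivar"                      # assigned here, or guessed
        table[name] = (i, kind)
    return table


SOURCE = '''
class Counter(Base):
    def __init__(self, start):
        self.count = start

    def bump(self):
        self.count += 1
        self.log("bumped")
        return self.limit

    def twice(self):
        self.bump()
        self.bump()

    @staticmethod
    def helper(x):
        x.hidden = 1

    def rebind(self):
        self = None
        self.ignored = 2
'''

table = analyze_class(SOURCE)
for name, (index, kind) in table.items():
    print(index, name, kind)
```

In this example `count` is classified as an ivar because it is assigned, `bump` as a method because it is defined in the class, `log` as a method only because it is called, and `limit` as an ivar only because it is not. The last two are guesses about inherited attributes and may be wrong, which is why the run-time side has to keep working when they are. `helper` and `rebind` contribute nothing to the table.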