Proposed PEP: Optimizing Global Variable and Attribute Access

Here's a first stab at a PEP for optimizing global variable and attribute access. Barry, when you get a chance, could you assign it a number and let me know about any PEP-related formatting mistakes I made? (It would be nice if PEP 1 referenced a piece of boilerplate people could use. While the PEP Style section lists the headers, I wasn't sure what the Version and Last-Modified headers were actually supposed to look like. Also, it wasn't clear what I was supposed to put in the Post-History: header.) Skip ------------------------------------------------------------------------------ PEP: Title: Optimizing Global Variable and Attribute Access Version: $Revision:$ Author: skip@pobox.com (Skip Montanaro) Status: Draft Type: Standards Track Python-Version: 2.3 Created: 13-Aug-2001 Post-History: Abstract The bindings for most global variables and attributes of other modules typically never change during the execution of a Python program, but because of Python's dynamic nature, code which accesses such global objects must run through a full lookup each time the object is needed. This PEP proposes a mechanism that allows code that accesses most global objects to treat them as local objects and places the burden of updating references on the code that changes the name bindings of such objects. Introduction Consider the workhorse function sre_compile._compile. It is the internal compilation function for the sre module. It consists almost entirely of a loop over the elements of the pattern being compiled, comparing opcodes with known constant values and appending tokens to an outpu list. Most of the comparisons are with constants imported from the sre_constants module. This means there are lots of LOAD_GLOBAL bytecodes in the compiled output of this module. Just by reading the code it's apparent that the author intended LITERAL, NOT_LITERAL, OPCODES and many other symbols to be constants. Still, each time they are involved in an expression, they must be looked up anew. Most global accesses are actually to objects that are "almost constants". This includes global variables in the current module as well as the attributes of other imported modules. Since they rarely change, it seems reasonable to place the burden of updating references to such objects on the code that changes the name bindings. If sre_constants.LITERAL is changed to refer to another object, perhaps it would be worthwhile for the code that modifies the sre_constants module dict to correct any active references to that object. By doing so, in many cases global variables and the attributes of many objects could be cached as local variables. If the bindings between the names given to the objects and the objects themselves changes rarely, the cost of keeping track of such objects should be low and the potential payoff fairly large. Proposed Change I propose that the Python virtual machine be modified to include TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would associate a global name or attribute of a global name with a slot in the local variable array and perform an initial lookup of the associated object to fill in the slot with a valid value. The association it creates would be noted by the code responsible for changing the name-to-object binding to cause the associated local variable to be updated. The UNTRACK_OBJECT opcode would delete any association between the name and the local variable slot. Rationale Global variables and attributes rarely change. For example, once a function imports the math module, the binding between the name "math" and the module it refers to aren't likely to change. Similarly, if the function that uses the math module refers to its "sin" attribute, it's unlikely to change. Still, every time the module wants to call the math.sin function, it must first execute a pair of instructions: LOAD_GLOBAL math LOAD_ATTR sin If the client module always assumed that math.sin was a local constant and it was the responsibility of "external forces" outside the function to keep the reference correct, we might have code like this: TRACK_OBJECT math.sin ... LOAD_FAST math.sin ... UNTRACK_OBJECT math.sin If the LOAD_FAST was in a loop the payoff in reduced global loads and attribute lookups could be significant. This technique could, in theory, be applied to any global variable access or attribute lookup. Consider this code: l = [] for i in range(10): l.append(math.sin(i)) return l Even though l is a local variable, you still pay the cost of loading l.append ten times in the loop. The compiler (or an optimizer) could recognize that both math.sin and l.append are being called in the loop and decide to generate the tracked local code, avoiding it for the builtin range() function because it's only called once during loop setup. According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL opcodes account for over 7% of all instructions executed by the Python virtual machine. This can be a very expensive instruction, at least relative to a LOAD_FAST instruction, which is a simple array index and requires no extra function calls by the virtual machine. I believe many LOAD_GLOBAL instructions and LOAD_GLOBAL/ LOAD_ATTR pairs could be converted to LOAD_FAST instructions. Code that uses global variables heavily often resorts to various tricks to avoid global variable and attribute lookup. The aforementioned sre_compile._compile function caches the append method of the growing output list. Many people commonly abuse functions' default argument feature to cache global variable lookups. Both of these schemes are hackish and rarely address all the available opportunities for optimization. (For example, sre_compile._compile does not cache the two globals that it uses most frequently: the builtin len function and the global OPCODES array that it imports from sre_constants.py. Discussion Jeremy Hylton has an alternate proposal on the table [2]. His proposal seeks to create a hybrid dictionary/list object for use in global name lookups that would make global variable access look more like local variable access. While there is no C code available to examine, the Python implementation given in his proposal still appears to require dictionary key lookup. It doesn't appear that his proposal could speed local variable attribute lookup, which might be worthwhile in some situations. Backwards Compatibility I don't believe there will be any serious issues of backward compatibility. Obviously, Python bytecode that contains TRACK_OBJECT opcodes could not be executed by earlier versions of the interpreter, but breakage at the bytecode level is often assumed between versions. Implementation TBD. This is where I need help. I believe there should be either a central name/location registry or the code that modifies object attributes should be modified, but I'm not sure the best way to go about this. If you look at the code that implements the STORE_GLOBAL and STORE_ATTR opcodes, it seems likely that some changes will be required to PyDict_SetItem and PyObject_SetAttr or their String variants. Ideally, there'd be a fairly central place to localize these changes. If you begin considering tracking attributes of local variables you get into issues of modifying STORE_FAST as well, which could be a problem, since the name bindings for local variables are changed much more frequently. (I think an optimizer could avoid inserting the tracking code for the attributes for any local variables where the variable's name binding changes.) Performance I believe (though I have no code to prove it at this point), that implementing TRACK_OBJECT will generally not be much more expensive than a single LOAD_GLOBAL instruction or a LOAD_GLOBAL/LOAD_ATTR pair. An optimizer should be able to avoid converting LOAD_GLOBAL and LOAD_GLOBAL/LOAD_ATTR to the new scheme unless the object access occurred within a loop. Further down the line, a register-oriented replacement for the current Python virtual machine [3] could conceivably eliminate most of the LOAD_FAST instructions as well. The number of tracked objects should be relatively small. All active frames of all active threads could conceivably be tracking objects, but this seems small compared to the number of functions defined in a given application. References [1] http://mail.python.org/pipermail/python-dev/2000-July/007609.html [2] http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP [3] http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End:

Skip Montanaro wrote:
Hmmm? That doesn't seem quite right to me. That's just a knee-jerk reaction, I don't have any real evidence. As a side note, I'm not sure whether some mention should be made that TRACK_OBJECT specifically does not work any differently from current Python when it comes to threads. That is, if it's a shared variable of any sort, you need a mutex if you're going to perform multiple operations on it. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

Aahz> Skip Montanaro wrote: >> The number of tracked objects should be relatively small. All active >> frames of all active threads could conceivably be tracking objects, >> but this seems small compared to the number of functions defined in a >> given application. Aahz> Hmmm? That doesn't seem quite right to me. That's just a Aahz> knee-jerk reaction, I don't have any real evidence. Here's my rationale. Perhaps I've goofed somewhere. TRACK_OBJECT and UNTRACK_OBJECT opcodes will only be inserted in the bodies of functions. Functions are only considered active when there is an active (but possibly suspended) eval_frame call on the C stack that is executing that function's byte code. Suppose my module is def a(): print "a" def b(): print "b" a() def c(): print "c" b() At the point where a's print statement is executed, functions a and b are active. Function c is not. This reasoning applies on a per-thread basis. So the number of active functions by my definition is the number of times eval_frame appears on the call stack for all active threads. In most situations that will be far less (barring deep recursion) than the number of functions available to the application. All I guess I'm trying to communicate is that object tracking is a dynamic thing, not a static thing. Aahz> As a side note, I'm not sure whether some mention should be made Aahz> that TRACK_OBJECT specifically does not work any differently from Aahz> current Python when it comes to threads. That is, if it's a Aahz> shared variable of any sort, you need a mutex if you're going to Aahz> perform multiple operations on it. That's true. If a global object is shared among multiple threads, access to it must still be protected. I haven't added anything to the mix in that regard. I'll add a short section on threads. Hmm... What about this (dumb) code? l = [] lock = threading.Lock() ... def fill_l(): for i in range(1000): lock.acquire() l.append(math.sin(i)) lock.release() ... def consume_l(): while 1: lock.acquire() if l: elt = l.pop() lock.release() fiddle(elt) It's not clear from a static analysis of the code what the lock is protecting. (You can't tell at compile-time that threads are even involved can you?) Would or should it affect attempts to track "l.append" or "math.sin" in the fill_l function? If we annotate the code with mythical track_object and untrack_object builtins (I'm not proposing such functions (they can't be implemented with Python's call-by-value semantics), just illustrating where stuff would go in the bytecode), we get l = [] lock = threading.Lock() ... def fill_l(): track_object("l.append", append) track_object("math.sin", sin) for i in range(1000): lock.acquire() append(sin(i)) lock.release() untrack_object("math.sin", sin) untrack_object("l.append", append) ... def consume_l(): while 1: lock.acquire() if l: elt = l.pop() lock.release() fiddle(elt) Is that correct both with and without threads? Skip

[sorry, excessive quoting ahead] Skip Montanaro wrote:
That's true. My point was more that, in my knee-jerk opinion, if one uses, say, math.sin() at all in a module, it will be active in a large number of the callframes. Therefore, the number of tracked objects could easily get quite large. It's a relative thing, of course, and I still think your statistical analysis about cost/benefit is accurate.
Oooooo. Ouch. No, on third thought, it still doesn't matter. If someone is playing games with module globals and not using locks, that code is broken in current Python. We don't need your track() and untrack() functions, but that should be made clear in the PEP, I think. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

"SM" == Skip Montanaro <skip@pobox.com> writes:
SM> Here's a first stab at a PEP for optimizing global variable SM> and attribute access. Barry, when you get a chance, could you SM> assign it a number and let me know about any PEP-related SM> formatting mistakes I made? Done. It's PEP 266, but I had to shorten the title. It was too long even with the new limit <44 wink> (give 'em an inch... :). Other than that you did a fine job for a virgin PEP author! I made a few other edits, but mostly so I could piss on the hydrant. :) SM> (It would be nice if PEP 1 referenced a piece of boilerplate SM> people could use. While the PEP Style section lists the SM> headers, I wasn't sure what the Version and Last-Modified SM> headers were actually supposed to look like. Also, it wasn't SM> clear what I was supposed to put in the Post-History: header.) Both excellent suggestions. I've added PEP 9, which can be used as a template for both Informational and Standards Track PEPs. I've often found I wanted something I could just copy-and-edit to get started quickly on a new PEP. I've clarified a few things in PEP 1. Thanks! -Barry

SM> Barry, when you get a chance, could you assign it a number and let SM> me know about any PEP-related formatting mistakes I made? BAW> Done. It's PEP 266, but I had to shorten the title. It was too BAW> long even with the new limit <44 wink> Hmmm... Wasn't aware there was a limit. I guess that's why you're the PEP wizard Barry! Thanks, Skip

Skip Montanaro wrote:
Hmmm? That doesn't seem quite right to me. That's just a knee-jerk reaction, I don't have any real evidence. As a side note, I'm not sure whether some mention should be made that TRACK_OBJECT specifically does not work any differently from current Python when it comes to threads. That is, if it's a shared variable of any sort, you need a mutex if you're going to perform multiple operations on it. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

Aahz> Skip Montanaro wrote: >> The number of tracked objects should be relatively small. All active >> frames of all active threads could conceivably be tracking objects, >> but this seems small compared to the number of functions defined in a >> given application. Aahz> Hmmm? That doesn't seem quite right to me. That's just a Aahz> knee-jerk reaction, I don't have any real evidence. Here's my rationale. Perhaps I've goofed somewhere. TRACK_OBJECT and UNTRACK_OBJECT opcodes will only be inserted in the bodies of functions. Functions are only considered active when there is an active (but possibly suspended) eval_frame call on the C stack that is executing that function's byte code. Suppose my module is def a(): print "a" def b(): print "b" a() def c(): print "c" b() At the point where a's print statement is executed, functions a and b are active. Function c is not. This reasoning applies on a per-thread basis. So the number of active functions by my definition is the number of times eval_frame appears on the call stack for all active threads. In most situations that will be far less (barring deep recursion) than the number of functions available to the application. All I guess I'm trying to communicate is that object tracking is a dynamic thing, not a static thing. Aahz> As a side note, I'm not sure whether some mention should be made Aahz> that TRACK_OBJECT specifically does not work any differently from Aahz> current Python when it comes to threads. That is, if it's a Aahz> shared variable of any sort, you need a mutex if you're going to Aahz> perform multiple operations on it. That's true. If a global object is shared among multiple threads, access to it must still be protected. I haven't added anything to the mix in that regard. I'll add a short section on threads. Hmm... What about this (dumb) code? l = [] lock = threading.Lock() ... def fill_l(): for i in range(1000): lock.acquire() l.append(math.sin(i)) lock.release() ... def consume_l(): while 1: lock.acquire() if l: elt = l.pop() lock.release() fiddle(elt) It's not clear from a static analysis of the code what the lock is protecting. (You can't tell at compile-time that threads are even involved can you?) Would or should it affect attempts to track "l.append" or "math.sin" in the fill_l function? If we annotate the code with mythical track_object and untrack_object builtins (I'm not proposing such functions (they can't be implemented with Python's call-by-value semantics), just illustrating where stuff would go in the bytecode), we get l = [] lock = threading.Lock() ... def fill_l(): track_object("l.append", append) track_object("math.sin", sin) for i in range(1000): lock.acquire() append(sin(i)) lock.release() untrack_object("math.sin", sin) untrack_object("l.append", append) ... def consume_l(): while 1: lock.acquire() if l: elt = l.pop() lock.release() fiddle(elt) Is that correct both with and without threads? Skip

[sorry, excessive quoting ahead] Skip Montanaro wrote:
That's true. My point was more that, in my knee-jerk opinion, if one uses, say, math.sin() at all in a module, it will be active in a large number of the callframes. Therefore, the number of tracked objects could easily get quite large. It's a relative thing, of course, and I still think your statistical analysis about cost/benefit is accurate.
Oooooo. Ouch. No, on third thought, it still doesn't matter. If someone is playing games with module globals and not using locks, that code is broken in current Python. We don't need your track() and untrack() functions, but that should be made clear in the PEP, I think. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

"SM" == Skip Montanaro <skip@pobox.com> writes:
SM> Here's a first stab at a PEP for optimizing global variable SM> and attribute access. Barry, when you get a chance, could you SM> assign it a number and let me know about any PEP-related SM> formatting mistakes I made? Done. It's PEP 266, but I had to shorten the title. It was too long even with the new limit <44 wink> (give 'em an inch... :). Other than that you did a fine job for a virgin PEP author! I made a few other edits, but mostly so I could piss on the hydrant. :) SM> (It would be nice if PEP 1 referenced a piece of boilerplate SM> people could use. While the PEP Style section lists the SM> headers, I wasn't sure what the Version and Last-Modified SM> headers were actually supposed to look like. Also, it wasn't SM> clear what I was supposed to put in the Post-History: header.) Both excellent suggestions. I've added PEP 9, which can be used as a template for both Informational and Standards Track PEPs. I've often found I wanted something I could just copy-and-edit to get started quickly on a new PEP. I've clarified a few things in PEP 1. Thanks! -Barry

SM> Barry, when you get a chance, could you assign it a number and let SM> me know about any PEP-related formatting mistakes I made? BAW> Done. It's PEP 266, but I had to shorten the title. It was too BAW> long even with the new limit <44 wink> Hmmm... Wasn't aware there was a limit. I guess that's why you're the PEP wizard Barry! Thanks, Skip
participants (3)
-
aahz@rahul.net
-
barry@zope.com
-
Skip Montanaro