[Python-Dev] Proposed PEP: Optimizing Global Variable and Attribute Access
Skip Montanaro
skip@pobox.com (Skip Montanaro)
Tue, 14 Aug 2001 00:57:44 -0500
Here's a first stab at a PEP for optimizing global variable and attribute
access. Barry, when you get a chance, could you assign it a number and let
me know about any PEP-related formatting mistakes I made? (It would be nice
if PEP 1 referenced a piece of boilerplate people could use. While the PEP
Style section lists the headers, I wasn't sure what the Version and
Last-Modified headers were actually supposed to look like. Also, it wasn't
clear what I was supposed to put in the Post-History: header.)
Skip
------------------------------------------------------------------------------
PEP:
Title: Optimizing Global Variable and Attribute Access
Version: $Revision:$
Author: skip@pobox.com (Skip Montanaro)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 13-Aug-2001
Post-History:
Abstract
The bindings for most global variables and attributes of other modules
typically never change during the execution of a Python program, but
because of Python's dynamic nature, code which accesses such global
objects must run through a full lookup each time the object is needed.
This PEP proposes a mechanism that allows code that accesses most global
objects to treat them as local objects and places the burden of updating
references on the code that changes the name bindings of such objects.
Introduction
Consider the workhorse function sre_compile._compile. It is the
internal compilation function for the sre module. It consists almost
entirely of a loop over the elements of the pattern being compiled,
comparing opcodes with known constant values and appending tokens to an
outpu list. Most of the comparisons are with constants imported from
the sre_constants module. This means there are lots of LOAD_GLOBAL
bytecodes in the compiled output of this module. Just by reading the
code it's apparent that the author intended LITERAL, NOT_LITERAL,
OPCODES and many other symbols to be constants. Still, each time they
are involved in an expression, they must be looked up anew.
Most global accesses are actually to objects that are "almost
constants". This includes global variables in the current module as
well as the attributes of other imported modules. Since they rarely
change, it seems reasonable to place the burden of updating references
to such objects on the code that changes the name bindings. If
sre_constants.LITERAL is changed to refer to another object, perhaps it
would be worthwhile for the code that modifies the sre_constants module
dict to correct any active references to that object. By doing so, in
many cases global variables and the attributes of many objects could be
cached as local variables. If the bindings between the names given to
the objects and the objects themselves changes rarely, the cost of
keeping track of such objects should be low and the potential payoff
fairly large.
Proposed Change
I propose that the Python virtual machine be modified to include
TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would associate a
global name or attribute of a global name with a slot in the local
variable array and perform an initial lookup of the associated object to
fill in the slot with a valid value. The association it creates would
be noted by the code responsible for changing the name-to-object binding
to cause the associated local variable to be updated. The
UNTRACK_OBJECT opcode would delete any association between the name and
the local variable slot.
Rationale
Global variables and attributes rarely change. For example, once a
function imports the math module, the binding between the name "math"
and the module it refers to aren't likely to change. Similarly, if the
function that uses the math module refers to its "sin" attribute, it's
unlikely to change. Still, every time the module wants to call the
math.sin function, it must first execute a pair of instructions:
LOAD_GLOBAL math
LOAD_ATTR sin
If the client module always assumed that math.sin was a local constant
and it was the responsibility of "external forces" outside the function
to keep the reference correct, we might have code like this:
TRACK_OBJECT math.sin
...
LOAD_FAST math.sin
...
UNTRACK_OBJECT math.sin
If the LOAD_FAST was in a loop the payoff in reduced global loads and
attribute lookups could be significant.
This technique could, in theory, be applied to any global variable
access or attribute lookup. Consider this code:
l = []
for i in range(10):
l.append(math.sin(i))
return l
Even though l is a local variable, you still pay the cost of loading
l.append ten times in the loop. The compiler (or an optimizer) could
recognize that both math.sin and l.append are being called in the loop
and decide to generate the tracked local code, avoiding it for the
builtin range() function because it's only called once during loop
setup.
According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL
opcodes account for over 7% of all instructions executed by the Python
virtual machine. This can be a very expensive instruction, at least
relative to a LOAD_FAST instruction, which is a simple array index and
requires no extra function calls by the virtual machine. I believe many
LOAD_GLOBAL instructions and LOAD_GLOBAL/ LOAD_ATTR pairs could be
converted to LOAD_FAST instructions.
Code that uses global variables heavily often resorts to various tricks
to avoid global variable and attribute lookup. The aforementioned
sre_compile._compile function caches the append method of the growing
output list. Many people commonly abuse functions' default argument
feature to cache global variable lookups. Both of these schemes are
hackish and rarely address all the available opportunities for
optimization. (For example, sre_compile._compile does not cache the two
globals that it uses most frequently: the builtin len function and the
global OPCODES array that it imports from sre_constants.py.
Discussion
Jeremy Hylton has an alternate proposal on the table [2]. His proposal
seeks to create a hybrid dictionary/list object for use in global name
lookups that would make global variable access look more like local
variable access. While there is no C code available to examine, the
Python implementation given in his proposal still appears to require
dictionary key lookup. It doesn't appear that his proposal could
speed local variable attribute lookup, which might be worthwhile in some
situations.
Backwards Compatibility
I don't believe there will be any serious issues of backward
compatibility. Obviously, Python bytecode that contains TRACK_OBJECT
opcodes could not be executed by earlier versions of the interpreter,
but breakage at the bytecode level is often assumed between versions.
Implementation
TBD. This is where I need help. I believe there should be either a
central name/location registry or the code that modifies object
attributes should be modified, but I'm not sure the best way to go about
this. If you look at the code that implements the STORE_GLOBAL and
STORE_ATTR opcodes, it seems likely that some changes will be required
to PyDict_SetItem and PyObject_SetAttr or their String variants.
Ideally, there'd be a fairly central place to localize these changes.
If you begin considering tracking attributes of local variables you get
into issues of modifying STORE_FAST as well, which could be a problem,
since the name bindings for local variables are changed much more
frequently. (I think an optimizer could avoid inserting the tracking
code for the attributes for any local variables where the variable's
name binding changes.)
Performance
I believe (though I have no code to prove it at this point), that
implementing TRACK_OBJECT will generally not be much more expensive than
a single LOAD_GLOBAL instruction or a LOAD_GLOBAL/LOAD_ATTR pair. An
optimizer should be able to avoid converting LOAD_GLOBAL and
LOAD_GLOBAL/LOAD_ATTR to the new scheme unless the object access
occurred within a loop. Further down the line, a register-oriented
replacement for the current Python virtual machine [3] could conceivably
eliminate most of the LOAD_FAST instructions as well.
The number of tracked objects should be relatively small. All active
frames of all active threads could conceivably be tracking objects, but
this seems small compared to the number of functions defined in a given
application.
References
[1] http://mail.python.org/pipermail/python-dev/2000-July/007609.html
[2] http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP
[3] http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: