[Python-Dev] Can LOAD_GLOBAL be optimized to a simple array lookup?

K.S.Sreeram sreeram at tachyontech.net
Thu Aug 24 12:58:33 CEST 2006

Alex Martelli wrote:
> On Aug 23, 2006, at 2:22 PM, K.S.Sreeram wrote:
>> Hi all,
>> I noticed in Python/ceval.c that LOAD_GLOBAL uses a dictionary lookup,
>> and was wondering if that can be optimized to a simple array lookup.
>> If i'm right there are 3 kinds of name lookups: locals, outer
>> scopes(closures), and globals. (not counting attribute lookup). Locals
>> are identified by, either the presence of assignments, or their presence
>> in the arg list. So all name lookups can be classified into the 3 types
>> at compile/load time.
>> Since we know, at load time, which names are global.. Can't we simply
>> build a global name table and replace LOAD_GLOBALs with a lookup at the
>> corresponding index into the global name table?
> At the time the function's body gets compiled, the global (or builtin)
> it's trying to access might or might not be there -- as long as it gets
> added afterwards, before the function's body gets _executed_, no problem
> (in today's scheme).  It's not obvious to me how you could compile a
> ``corresponding index'' into the LOAD_GLOBAL opcode, since that index is
> in general unknown at compile time.

How about this approach:
(disclaimer: this is just a rough sketch!)

At compile time:
- When compiling the module, a list of global names is built. This list
is populated in the order in which the names appear in the module.
References to the global will be replaced by references to the index in
this list.

- Instead of generating LOAD_GLOBAL and STORE_GLOBAL opcodes, we
generate LOAD_GLOBAL_BY_INDEX and STORE_GLOBAL_BY_INDEX, which are new
opcodes. The indexes used with the opcode will be the index of the name
from the list.

- This list of names should be stored in the pyc file so that the
ordering is remembered.

At load time:
- At module load time, we build an array of PyObject*, which is used to
store the values for the globals. The values will be NULL, until the
actual binding is created.

- For each name in the global_name_list in the pyc file, we pass the
pointer to the corresponding entry in the global table, to the dict.
    PyDict_SetLocationForName( module_dict, name, PyObject
**ptr_to_entry_table )
  This is done so that 'get' and 'set' of these names in the dict will
update/fetch from the global table directly.

This approach should result in faster access to globals defined within
the module. It will still work if we try to shadow any builtins. As far
as i see, this doesn't seem to change any existing semantics either.

This does not result in faster access to existing builtins. But a
similar scheme can be worked out for that too.

>> The module's dict object will need to be special so that whenever a name
>> gets 'set', the global name table should get updated.
> It seems that you'd need to chase down and modify all of the LOAD_GLOBAL
> opcodes too, at every such modification.   (the concept of modifying
> builtins becomes extremely scary...).  Considering the amortized speed
> of a dict lookup for an interned string (hash value cached and thus
> immediately available, equality comparison with other interned string a
> single machine-level operation), it's not clear to me that the huge
> complexity (and potential performance impact) of all this could ever
> possibly be justified.
> A change in Python semantics allowing some level of "nailing down" of
> builtins (and perhaps globals too) *COULD* easily yield large
> performance benefits, but that's a subject for the Python-3000 mailing
> list; as long as builtins and globals stay as fluid as today, I'm
> skeptical on the optimization opportunities here.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
Url : http://mail.python.org/pipermail/python-dev/attachments/20060824/c3c8699e/attachment.pgp 

More information about the Python-Dev mailing list