[Patches] [ python-Patches-1616125 ] Cached globals+builtins lookup optimization

SourceForge.net noreply at sourceforge.net
Sat Dec 16 18:13:38 CET 2006

Patches item #1616125, was opened at 2006-12-15 01:44
Message generated for change (Comment added) made by ag6502
You can respond by visiting: 

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Performance
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Andrea Griffini (ag6502)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cached globals+builtins lookup optimization

Initial Comment:
This patch implements a speed optimization by introducing a timestamp on dictionaries and by caching lookups of constants, so that a lookup need not be repeated if the dictionary has not changed.

Currently the patch implements the cached lookup only for the LOAD_GLOBAL opcode and stores the cache as two extra members in the code object. I'm going to investigate whether LOAD_ATTR, in the special case of module attributes, is worth caching too.

A cache entry for LOAD_GLOBAL is defined by two timestamps (Py_ssize_t) and a *borrowed* pointer to a PyObject. The use of a borrowed pointer is safe because the value is used only if the dictionaries didn't change, so the result of the lookup must still be alive: the reference held by the dictionary that produced the cached value is still in place.

The assumptions are:
- a LOAD_GLOBAL opcode always looks for the same
  symbol (i.e. co_names never changes)
- the dictionaries being searched (globals and
  builtins) are mostly constant
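The mechanism can be illustrated with a small Python sketch (the class and function names here are hypothetical; the real patch implements this in C inside the interpreter loop, with one cache entry per LOAD_GLOBAL site in the code object):

```python
class VersionedDict(dict):
    """A dict that bumps a timestamp on each mutation (sketch of the patch's idea).

    Note: only __setitem__ and __delitem__ are instrumented here; the real
    patch stamps every mutating path of the C dict implementation.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timestamp = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.timestamp += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.timestamp += 1


class GlobalCacheEntry:
    """One cache slot per LOAD_GLOBAL site: two timestamps plus the cached value."""
    __slots__ = ("globals_ts", "builtins_ts", "value")

    def __init__(self):
        self.globals_ts = -1   # -1 never matches a real timestamp
        self.builtins_ts = -1
        self.value = None


def cached_load_global(name, globals_d, builtins_d, entry):
    """Look up `name`, reusing the cached result if neither dict changed."""
    # Fast path: both dictionaries are unchanged since the cached lookup,
    # so the cached (borrowed, in the C version) value is still valid.
    if (entry.globals_ts == globals_d.timestamp
            and entry.builtins_ts == builtins_d.timestamp):
        return entry.value
    # Slow path: redo the normal globals-then-builtins lookup and cache it.
    try:
        value = globals_d[name]
    except KeyError:
        value = builtins_d[name]
    entry.globals_ts = globals_d.timestamp
    entry.builtins_ts = builtins_d.timestamp
    entry.value = value
    return value
```

Because each entry belongs to a single opcode site, the cached name never changes, which is exactly the first assumption above.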

On my system I got an 8%-10% speedup on a real program (a TCP/IP server handling an in-memory database of Python objects) during the checkpoint load + rollforward steps (a few million LOAD_GLOBAL operations in total); the exact percentage depends on which compiler options are used to build Python.

On another little test program that calls a function

def f(x):
    return math.sin(x) + math.cos(x) + abs(x)

one million times the speedup with the default makefile
is about 7%.
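A measurement along these lines can be reproduced with timeit (a sketch only; the absolute numbers, and the size of any speedup, depend on the machine and build options):

```python
import math
import timeit

def f(x):
    return math.sin(x) + math.cos(x) + abs(x)

# One million calls, as in the original measurement; each call performs
# several LOAD_GLOBAL-style lookups (math, abs).
elapsed = timeit.timeit("f(1.0)", globals={"f": f}, number=1_000_000)
print(f"{elapsed:.3f} s for 1e6 calls")
```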

To enable the patch, the makefile must define the CACHED_LOOKUPS symbol.

The change to dictobject also exports the timestamp at the Python level; the other changes are invisible at the Python level. I'm not at all sure that exporting dict.timestamp() is a good idea, however...

Note that after applying the patch you may see an error in UserDict if you simply run "make" (there's probably a dependency problem somewhere).
After "make clean" + "make", the test suite runs fine on my system.

I'm interested in hearing whether other platforms also see a real performance improvement.



>Comment By: Andrea Griffini (ag6502)
Date: 2006-12-16 18:13

Logged In: YES 
Originator: YES

I am experimenting with long long timestamps (based on "unsigned long
long" if available, or explicitly on two "unsigned long" values otherwise). An
alternative to slowing down lookups by using 64 bits on 32-bit computers is
to use a single 32-bit counter and trigger a "cache flush" when the
timestamp rolls over, visiting all live dictionaries and code objects.
This would IMO be a *very* infrequent operation (for example, on my
P4/2.4GHz, doing nothing but dict updates, reaching 2**32 updates
would take about 14 minutes).


Comment By: Martin v. Löwis (loewis)
Date: 2006-12-16 12:01

Logged In: YES 
Originator: NO

How does that deal with timestamp wraparound?


More information about the Patches mailing list