[Python-ideas] Fast global cacheless lookup
Guido van Rossum
guido at python.org
Thu Nov 22 16:46:21 CET 2007
Cool! Are you willing to show the code yet (bugs and all)?
Personally, I'm not sure that it's worth doing this for STORE_GLOBAL
(which should be rarely used in properly written code).
Some questions:
- what's the space & time impact for a dict with no watchers?
- does this do anything for builtins?
- could this be made to work for instance variables?
- what about exec(src, ns) where ns is a mapping but not a dict?
--Guido
On Nov 22, 2007 7:40 AM, Neil Toronto <ntoronto at cs.byu.edu> wrote:
> I have a hack coded up against r59068 in which LOAD_GLOBAL is even
> faster than LOAD_FAST. It'll be the same with STORE_GLOBAL and the
> *_NAME opcodes after I'm done with it, and it should be fully
> transparent to Python code. (That is, you can go ahead and swap out
> __builtins__ and crazy junk like that and everything should work as it
> did before.) Regression tests all pass, except test_gc on functions -
> I've got a refcount bug somewhere.
>
> Here's the microbenchmark I've been using to test LOAD_GLOBAL and LOAD_FAST:
>
> import timeit
> import dis
>
> def test_local_get():
> x = 0
> x; x; x; #... and 397 more of them
>
> if __name__ == '__main__':
> print dis.dis(test_local_get.func_code)
> print timeit.Timer('test_local_get()',
> 'from locals_test import test_local_get').timeit()
>
>
> The globals test puts 'x' in module scope, and the builtins test changes
> 'x' to 'len' and doesn't assign it to 0.
>
> Output right now:
>
> r59068 locals: 15.57 sec
> myhack locals: 15.61 sec (increase is probably insignificant or random)
>
> r59068 globals: 23.61 sec
> myhack globals: 15.14 sec (!)
>
> r59068 builtins: 28.08 sec
> myhack builtins: 15.26 sec (!!)
>
> Of course, it's no good if it slows everything else way the heck down.
> So 10 rounds of pybench says:
>
> r59068: mean 8.92, std 0.05
> myhack: mean 8.99, std 0.04
>
> From what I see in pybench, globals access is severely underrepresented
> compared to real programs, so those numbers aren't representative of the
> possible difference in real-life performance.
>
> Jim Jewett gave me the idea here:
>
> http://mail.python.org/pipermail/python-ideas/2007-November/001207.html
>
> "Note that weakening the module.__dict__ promise to only meeting the
> dict API would make it easier to implement the various speed-up-globals
> suggestions."
>
> I didn't exactly do that, but it did get me thinking. The other
> proposals for speeding up globals access seemed to do their darndest to
> leave PyDictObject alone and ended up hideously complicated because of
> it. Here's the main idea for this one: What if a frame could maintain an
> array of pointers right into a dictionary's entry table? A global lookup
> would then consist of a couple of pointer dereferences, and any value
> change would show up immediately to the frame.
>
> There was a dangerous dangling pointer problem inherent in that, so I
> formalized an update mechanism using an observer pattern.
>
> Here's how it works. Arbitrary objects can register themselves with a
> dictionary as "entry observers". The dictionary keeps track of all the
> registered observers, and for certain events, makes a call to each one
> to tell them that something has changed. The entry observers get
> pointers to entries via PyDict_GetEntry, which is just like
> PyDict_GetItem, except it returns a PyDictEntry * right from the
> dictionary's entry table.
>
> The dict notifies its observers on delitem, pop, popitem, resize and
> clear. Nothing else is necessary - nothing else will change the address
> of or invalidate an entry. There are very, very few changes in
> PyDictObject. In the general case, the pointer to the list of observers
> is NULL, and the only additional slowdown is when delitem, pop, popitem,
> resize and clear check that and move on - but those aren't called often.
>
> So get, set, iter, contains, etc., are all exactly as fast as they were
> before. The biggest performance hit is when a highly-observed dict like
> __builtin__.__dict__ resizes, but that's rare enough to not worry about.
>
> To speed up globals access, an auxiliary object to functions and frames
> registers itself as an observer to func_globals and __builtins__. It
> makes an array of PyDictEntry pointers corresponding to
> func_code.co_names. PyEval_EvalFrameEx indexes that array first for
> global values, and updates it if there's one it couldn't find when the
> function was created.
>
> That's pretty much it. There are corner cases I still have to address,
> like what happens if someone replaces or deletes __builtins__, but it
> should be fairly easy to monitor that.
>
> I'd love to hear your comments, everyone. I've glossed over a lot of
> implementation details, but I've tried to make the main ideas clear.
>
> Neil
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
More information about the Python-ideas
mailing list