Fast global cacheless lookup
![](https://secure.gravatar.com/avatar/a051fe145baf8b841107a9246efbc2dc.jpg?s=120&d=mm&r=g)
I have a hack coded up against r59068 in which LOAD_GLOBAL is even faster than LOAD_FAST. It'll be the same with STORE_GLOBAL and the *_NAME opcodes after I'm done with it, and it should be fully transparent to Python code. (That is, you can go ahead and swap out __builtins__ and crazy junk like that and everything should work as it did before.) Regression tests all pass, except test_gc on functions - I've got a refcount bug somewhere.

Here's the microbenchmark I've been using to test LOAD_GLOBAL and LOAD_FAST:

    import timeit
    import dis

    def test_local_get():
        x = 0
        x; x; x; #... and 397 more of them

    if __name__ == '__main__':
        print dis.dis(test_local_get.func_code)
        print timeit.Timer('test_local_get()',
                           'from locals_test import test_local_get').timeit()

The globals test puts 'x' in module scope, and the builtins test changes 'x' to 'len' and doesn't assign it to 0.

Output right now:

    r59068 locals:   15.57 sec
    myhack locals:   15.61 sec (increase is probably insignificant or random)
    r59068 globals:  23.61 sec
    myhack globals:  15.14 sec (!)
    r59068 builtins: 28.08 sec
    myhack builtins: 15.26 sec (!!)

Of course, it's no good if it slows everything else way the heck down. So 10 rounds of pybench says:

    r59068: mean 8.92, std 0.05
    myhack: mean 8.99, std 0.04

From what I see in pybench, globals access is severely underrepresented compared to real programs, so those numbers aren't representative of the possible difference in real-life performance.

Jim Jewett gave me the idea here:

http://mail.python.org/pipermail/python-ideas/2007-November/001207.html

"Note that weakening the module.__dict__ promise to only meeting the dict API would make it easier to implement the various speed-up-globals suggestions."

I didn't exactly do that, but it did get me thinking. The other proposals for speeding up globals access seemed to do their darndest to leave PyDictObject alone and ended up hideously complicated because of it.
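For reference, the globals variant of the benchmark might look roughly like this. This is reconstructed from the description, not Neil's actual file; it uses the `timeit.timeit` convenience call with a callable, and the iteration count is arbitrary:

```python
import timeit

x = 0  # module scope: inside the function, each bare read of x is a LOAD_GLOBAL

def test_global_get():
    x; x; x  # repeated reads; the original had 400 of these statements

# Passing a callable avoids the import-based setup string of the original.
t = timeit.timeit(test_global_get, number=100000)
print('%.6f seconds' % t)
```

The builtins variant would simply replace `x` with `len` and drop the module-level assignment, so each read becomes a miss in globals followed by a hit in builtins.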
Here's the main idea for this one: What if a frame could maintain an array of pointers right into a dictionary's entry table? A global lookup would then consist of a couple of pointer dereferences, and any value change would show up immediately to the frame. There was a dangerous dangling pointer problem inherent in that, so I formalized an update mechanism using an observer pattern.

Here's how it works. Arbitrary objects can register themselves with a dictionary as "entry observers". The dictionary keeps track of all the registered observers, and for certain events, makes a call to each one to tell them that something has changed. The entry observers get pointers to entries via PyDict_GetEntry, which is just like PyDict_GetItem, except it returns a PyDictEntry * right from the dictionary's entry table.

The dict notifies its observers on delitem, pop, popitem, resize and clear. Nothing else is necessary - nothing else will change the address of or invalidate an entry. There are very, very few changes in PyDictObject. In the general case, the pointer to the list of observers is NULL, and the only additional slowdown is when delitem, pop, popitem, resize and clear check that and move on - but those aren't called often. So get, set, iter, contains, etc., are all exactly as fast as they were before. The biggest performance hit is when a highly-observed dict like __builtin__.__dict__ resizes, but that's rare enough to not worry about.

To speed up globals access, an auxiliary object to functions and frames registers itself as an observer to func_globals and __builtins__. It makes an array of PyDictEntry pointers corresponding to func_code.co_names. PyEval_EvalFrameEx indexes that array first for global values, and updates it if there's one it couldn't find when the function was created.

That's pretty much it. There are corner cases I still have to address, like what happens if someone replaces or deletes __builtins__, but it should be fairly easy to monitor that.
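The observer mechanism Neil describes can be modeled in pure Python - a sketch only; names like `ObservedDict` and `on_entries_invalidated` are invented here, and note that the real C version caches pointers into the entry table (so plain assignments show up for free), whereas this model caches values and can only re-fetch on notification:

```python
class ObservedDict(dict):
    """Toy model of the 'entry observer' idea: observers are notified on
    the events that can move or invalidate entries (delete, pop, clear;
    resize is handled inside those events in the C version)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._observers = []          # the NULL pointer in the C version

    def register_observer(self, obs):
        self._observers.append(obs)

    def _notify(self):
        for obs in self._observers:
            obs.on_entries_invalidated(self)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._notify()

    def pop(self, *args):
        result = super().pop(*args)
        self._notify()
        return result

    def clear(self):
        super().clear()
        self._notify()


class CachingObserver:
    """Caches one lookup and refreshes it only when notified - the
    pure-Python analogue of holding a PyDictEntry * and re-taking it
    via PyDict_GetEntry after an invalidation event."""

    def __init__(self, d, key):
        self._d, self._key = d, key
        d.register_observer(self)
        self.on_entries_invalidated(d)

    def on_entries_invalidated(self, d):
        self.cached = d.get(self._key)


g = ObservedDict(x=1)
obs = CachingObserver(g, 'x')
print(obs.cached)   # 1
del g['x']          # observer is notified and refreshes its cache
print(obs.cached)   # None
```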
I'd love to hear your comments, everyone. I've glossed over a lot of implementation details, but I've tried to make the main ideas clear. Neil
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
Cool! Are you willing to show the code yet (bugs and all)?

Personally, I'm not sure that it's worth doing this for STORE_GLOBAL (which should be rarely used in properly written code).

Some questions:

- what's the space & time impact for a dict with no watchers?
- does this do anything for builtins?
- could this be made to work for instance variables?
- what about exec(src, ns) where ns is a mapping but not a dict?

--Guido

On Nov 22, 2007 7:40 AM, Neil Toronto <ntoronto@cs.byu.edu> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
![](https://secure.gravatar.com/avatar/a051fe145baf8b841107a9246efbc2dc.jpg?s=120&d=mm&r=g)
Guido van Rossum wrote:
Cool! Are you willing to show the code yet (bugs and all)?
Sure! I stayed up all night doing it and today is Thanksgiving, so I'll probably not get to it for a little while. (I know making a patch shouldn't take long, but I've never done it before.) Should I post the patch here or somewhere else?
Some questions:
- what's the space & time impact for a dict with no watchers?
I think it's almost negligible.

Space: There are four bytes extra on every dict for a pointer to the observer list. It may actually be zero or eight or more depending on alignment and malloc block size - I haven't looked.

Time: On dicts with no observers, dealloc, delitem, pop, popitem, clear, and resize pass through an "if (mp->ma_entryobs_list != NULL)". PyDict_New sets mp->ma_entryobs_list to NULL. Nothing else is affected.
- does this do anything for builtins?
Right now it works well enough to fetch them quickly, but setting or deleting them elsewhere won't show up yet in the frame. And it doesn't handle the case where __builtins__ is replaced. That'll take a little doing - but only mentally; it shouldn't affect performance much. Anyway, that part will work properly when I'm done.
- could this be made to work for instance variables?
If my brain were thinking in straight lines, maybe I'd come up with something. :) I've got this fuzzy idea that it just might work. The hard part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR applied to something else. Hmm...

Something to digest while I'm digesting the Real Other White Meat. :)
- what about exec(src, ns) where ns is a mapping but not a dict?
Good question - I don't know, but I think it should work, at least as well as it did before. If there's no observer attached to a frame, it'll default to its previous behavior. Another hmm...

Thanks for the prompt reply.

Neil

P.S. By the way, I'm very pleased with how clean and workable the codebase is. I actually cheered at the lack of warnings. My wife probably thinks I'm nuts. :D
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
[Quick] The best way to post a patch is to put it in the bug tracker at bugs.python.org, and post a link to the issue here. The best way to create a patch is svn diff, assuming you started out with an anonymous svn checkout (see python.org/dev/) and not just with a distro download.

Looking forward to it!

--Guido

On Nov 22, 2007 8:42 AM, Neil Toronto <ntoronto@cs.byu.edu> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
![](https://secure.gravatar.com/avatar/a051fe145baf8b841107a9246efbc2dc.jpg?s=120&d=mm&r=g)
Guido van Rossum wrote:
[Quick] The best way to post a patch is to put it in the bug tracker at bugs.python.org, and post a link to the issue here. The best way to create a patch is svn diff, assuming you started out with an anonymous svn checkout (see python.org/dev/) and not just with a distro download.
Figures. I couldn't get it through svn because my university has a transparent proxy that doesn't like REPORT requests. Is there any chance of getting https enabled at svn.python.org sometime so I don't have to stick to snapshots? Anyway, I'll get to this sometime after I get to bed.
Looking forward to it!
Yes sir! :) Neil
![](https://secure.gravatar.com/avatar/50be2a603b688a28ec6a16710a9a1e9b.jpg?s=120&d=mm&r=g)
2007/11/22, Neil Toronto <ntoronto@cs.byu.edu>:
Figures. I couldn't get it through svn because my university has a transparent proxy that doesn't like REPORT requests. Is there any chance of getting https enabled at svn.python.org sometime so I don't have to stick to snapshots?
I have the same problem in the virtualized Ubuntu at work, but...

1. I create a dynamic tunnel with SSH to the machine at home.
2. I execute svn with tsocks, actually sending the SVN traffic through the tunnel (which acts like a SOCKS proxy).

Here is a detailed how-to, but in Spanish: http://www.taniquetil.com.ar/plog/post/1/303

If you can access another machine with SSH, feel free to contact me directly if you need help setting up something like this.

Regards,

--
.   Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
![](https://secure.gravatar.com/avatar/da3a20a76a69532fa83e790e89cb4c6c.jpg?s=120&d=mm&r=g)
Hey, I had a very similar idea and implementation back in June (one that also passed all regression tests): http://mail.python.org/pipermail/python-ideas/2007-June/000902.html

When I read Neil's mail I almost thought it was my old mail :-)

Unfortunately, when I posted my optimization, it pretty much got ignored. Maybe I had not worded it properly.

The main difference between our implementations, if I understand Neil's explanation correctly, is that you use direct ptrs into the dict and notify the ptr holders of relocations. I used a different method, where you call a new PyDict_ExportKey method and it creates a mediating element. The mediating element has a fixed position so it can be dereferenced directly. Direct access to it is just as fast, but it may slightly affect dict performance.

I think a hybrid approach similar to Neil's, but with a mediating object to represent the access to the dict and do the observing for its user, could be nicer (hell, Neil might already be doing this).

P.S.: I also had a more ambitious plan, after eliminating globals/builtins dict lookups, to use mro caches more aggressively with this optimization and type-specialization on code objects, to also eliminate class-side dict lookups. The user can also eliminate instance-side dict lookups via __slots__ - effectively allowing the conversion of virtually _all_ namespace dict lookups in pure Python code into direct memory dereferences. Isn't that exciting? :-)

Eyal

On Nov 22, 2007 7:07 PM, Guido van Rossum <guido@python.org> wrote:
[Quick] The best way to post a patch is to put it in the bug tracker at bugs.python.org, and post a link to the issue here. The best way to create a patch is svn diff, assuming you started out with an anonymous svn checkout (see python.org/dev/) and not just with a distro download.
Looking forward to it!
--Guido
![](https://secure.gravatar.com/avatar/d6b9415353e04ffa6de5a8f3aaea0553.jpg?s=120&d=mm&r=g)
"Eyal Lotem" <eyal.lotem@gmail.com> wrote in message news:b64f365b0711221407l7564b507p7359c227866fd230@mail.gmail.com...

| Hey, I had a very similar idea and implementation back in June (that
| also passed all regression tests):
| http://mail.python.org/pipermail/python-ideas/2007-June/000902.html
|
| When I read Neil's mail I almost thought it was my old mail :-)
|
| Unfortunately, when I posted my optimization, it pretty much got
| ignored. Maybe I have not worded it properly.

Rereading your post, I think Neil's was a bit clearer, partly because it had more details. In particular, I see no mention of ...

| I used a different method, where you call a new PyDict_ExportKey
| method and it creates a mediating element. The mediating element has
| a fixed position so it can be dereferenced directly.

(which I do not quite get, actually, but that is probably just me.)

More important, I think, is timing. Last June, the focus was on defining what 3.0 would consist of. Now that that is mostly done, and the result found to be slower than 2.5, I think more attention is available for speed issues.

It will be great if the two of you can come up with a clean lookup speedup that avoids any showstopper issues. This issue has been rumbling around 'in the basement' for several years.

Terry Jan Reedy
![](https://secure.gravatar.com/avatar/a051fe145baf8b841107a9246efbc2dc.jpg?s=120&d=mm&r=g)
Eyal Lotem wrote:
Hey, I had a very similar idea and implementation back in June (that also passed all regression tests): http://mail.python.org/pipermail/python-ideas/2007-June/000902.html
When I read Neil's mail I almost thought it was my old mail :-)
Unfortunately, when I posted my optimization, it pretty much got ignored. Maybe I have not worded it properly.
The main difference between our implementations, if I understand Neil's explanation correctly, is that you use direct ptrs into the dict and notify the ptr holders of relocations.
I used a different method, where you call a new PyDict_ExportKey method and it creates a mediating element. The mediating element has a fixed position so it can be dereferenced directly. Direct access to it is just as fast, but it may be slightly affecting dict performance.
Nicely done. :)
I think a hybrid approach similar to Neil's, but with a mediating object to represent the access to the dict and do the observing for its user could be nicer (hell, Neil might already be doing this).
I am, actually. I originally had the observer be the function object itself, but that presented problems with generators, which create a frame object from a function and then dump the function. I had assumed that a frame would never outlast the function object it was created from, and ended up with dangling pointers. D'oh! Anyway, it's correct now and the details are well-abstracted.

The mediating object is called PyFastGlobalsAdapter. ("Adapter" because it allows you to getitem/setitem a dict like you do a list - using an index.) It gets bunted about among functions, frames, and eval code. It basically has the following members:

    PyObject *globals;      /* PyDictObject only */
    PyObject *names;        /* From func_code->co_names */
    PyDictEntry **entries;  /* Struct pointers into globals entries */

(I've omitted the details for builtins because I haven't got them totally worked out yet. I'll probably have a PyObject *builtins and a PyDictEntry *builtins_entry pointing at the globals dict so I can detect when __builtins__ is replaced within globals.)

On init, it registers itself with the globals dict and starts keeping track of pointers to dict entries in "entries". "entries" is the same length as "names". Getting the value globals[names[i]] is done by just referencing entries[i]->me_value. It's very quick. :) There's a PyFastGlobals_GetItem(PyObject *fg, int index) that does it for you and also does the necessary bookkeeping to update the PyDictEntry pointers when there's a miss. (entries[i] == NULL; happens when a key is anticipated but not in the dict at first.)

I agree that the dict observer interface + an adapter is a good way to go. The dict part should be flexible, lean and fast (like dicts themselves), and the simple observer interface does just that. The adapter keeps it all correct and refcount-y, and provides a convenient way to get values by index.

Would it be worth it to expose dict adapters as a Python object?
Then Python code could do this kind of crazy stuff:

    d = {<some dict>}
    a = dictadapter(d, ('keys', 'i', 'want', 'fast', 'access', 'to'))
    a[0] == d['keys'] and a[1] == d['i']  # ..., etc. => True

That could make it a lot easier to experiment with fast cacheless dict lookups in other contexts. The problem is, I have no idea what those contexts might be, at least within Python code. :)
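A rough pure-Python sketch of what such a dictadapter could look like. This is illustrative only: Neil's real adapter holds C-level PyDictEntry pointers, which can't be expressed from Python, so this version simply re-resolves the key on every access:

```python
class dictadapter:
    """Index-based view onto a dict for a fixed tuple of keys:
    a[i] reads and writes d[keys[i]]. In the C version, a[i] would be
    a cached entries[i]->me_value dereference instead of a hash lookup."""

    def __init__(self, d, keys):
        self._d = d
        self._keys = tuple(keys)

    def __getitem__(self, i):
        return self._d[self._keys[i]]

    def __setitem__(self, i, value):
        self._d[self._keys[i]] = value


d = {'keys': 1, 'i': 2, 'want': 3}
a = dictadapter(d, ('keys', 'i', 'want'))
print(a[0] == d['keys'] and a[1] == d['i'])   # True
a[2] = 99
print(d['want'])                              # 99
```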
P.S: I also had a more ambitious plan, after eliminating globals/builtins dict lookups, to use mro caches more aggressively with this optimization and type-specialization on code objects, to also eliminate class-side dict lookups. The user can also eliminate instance-side dict lookupts via __slots__ - effectively allowing the conversion of virtually _all_ namespace dict lookups in pure Python code to be direct memory dereferences, isn't that exciting? :-)
Extremely! There's no reason it couldn't be done. But I'm not exactly sure what you mean (seeing the idea only from the context of my own hacks), so could you elaborate a bit? :D

When you said "mro" I immediately thought of this adapter layout:

- PyList of MRO class dicts
- PyTuple of common attribute names (or list of tuples)
- PyDictEntry * array per MRO class

But where would the name index come from? The co_names tuples are different for each method.

Neil
![](https://secure.gravatar.com/avatar/72ee673975357d43d79069ac1cd6abda.jpg?s=120&d=mm&r=g)
Neil Toronto wrote:
The hard part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR applied to something else.
Why would you *want* to distinguish that? A decent attribute lookup acceleration mechanism should work for attributes of any object, not just self. Think method calls, which are probably even more common than accesses to globals.

--
Greg
![](https://secure.gravatar.com/avatar/a051fe145baf8b841107a9246efbc2dc.jpg?s=120&d=mm&r=g)
Greg Ewing wrote:
Neil Toronto wrote:
The hard part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR applied to something else.
Why would you *want* to distinguish that? A decent attribute lookup acceleration mechanism should work for attributes of any object, not just self. Think method calls, which are probably even more common than accesses to globals.
Now that's a durned good point. My cute little hack can be used anywhere you have a mostly-static dict (or at least one that grows infrequently) and a tuple of keys for which you want to repeatedly get or set values. As long as lookups start as tuple indexes (like indexes into co_names and such), things go fast.

I'm still a bit fuzzy about how it would be used with LOAD_ATTR. Let's restrict it to just accelerating self.<attr> lookups for now. The oparg to LOAD_ATTR and STORE_ATTR is the co_names index, so co_names is again the tuple of keys. But it seems like you'd need an adapter (see previous reply to Eyal for terminology) for each pair of (self, method). Is there a better way?

Neil
![](https://secure.gravatar.com/avatar/72ee673975357d43d79069ac1cd6abda.jpg?s=120&d=mm&r=g)
Neil Toronto wrote:
But it seems like you'd need an adapter (see previous reply to Eyal for terminology) for each pair of (self, method). Is there a better way?
I started writing down some ideas for this, but then I realised that it doesn't really extend to attribute lookup in general. The reason is that only some kinds of attribute have their values stored in dict entries - mainly just instance variables of user-defined class instances. Bound methods, attributes of built-in objects, etc., would be left out.

I think the way to approach this is to have a global cache which is essentially a dictionary mapping (obj, name) pairs to some object that knows how to set or get the attribute value as directly as possible. While this wouldn't eliminate dict lookups entirely, in the case of a cache hit it would just be a single lookup instead of potentially many.

Some of the ideas behind your adapter might be carried over, such as the idea of callbacks triggered by changes to the underlying objects to help keep the cache up to date. But there would probably have to be a variety of such callback mechanisms for use by different kinds of objects.

--
Greg
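A minimal sketch of the cache Greg describes - keyed here by (type, name) rather than (obj, name) for simplicity, which only handles class-level attributes, and with all names invented for illustration:

```python
attr_cache = {}   # (type, name) -> value; stand-in for the global cache

def cached_getattr(obj, name):
    """On a hit, one dict lookup replaces a full MRO walk."""
    key = (type(obj), name)
    try:
        return attr_cache[key]
    except KeyError:
        value = getattr(obj, name)   # miss: full lookup, then cache
        attr_cache[key] = value
        return value

def invalidate(cls, name):
    """The callback a class would fire when one of its attributes changes,
    keeping the cache up to date."""
    attr_cache.pop((cls, name), None)

class A:
    x = 10

print(cached_getattr(A(), 'x'))   # 10 (miss, then cached)
A.x = 20
invalidate(A, 'x')                # without this, the cache would be stale
print(cached_getattr(A(), 'x'))   # 20
```

Caching values like this breaks for descriptors and per-instance attributes; the "object that knows how to set or get the attribute value" in Greg's proposal is precisely what would handle those cases.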
![](https://secure.gravatar.com/avatar/d91ce240d2445584e295b5406d12df70.jpg?s=120&d=mm&r=g)
On 11/22/07, Neil Toronto <ntoronto@cs.byu.edu> wrote:
... What if a frame could maintain an array of pointers right into a dictionary's entry table?
The dict notifies its observers on delitem, pop, popitem, resize and clear. Nothing else is necessary - nothing else will change the address of or invalidate an entry.
I think this isn't quite true, because of DUMMY entries.

Insert key1. Insert key2 that wants the same slot. Register an observer that cares about key2 but not key1. Delete key1. The key1 entry is replaced with DUMMY, but the entry for key2 is not affected.

Look up key2 (by some other code which hasn't already taken this shortcut) and the lookdict function (as a side effect) moves key2 to the better location that key1 no longer occupies. As described, I think this breaks your cache.

Of course you can get around this by just not moving things without a resize, but that is likely to be horrible for the (non-namespace?) dictionaries that do see frequent deletions. Another way around it is to also notify the observers whenever lookdict moves an entry; I'm not sure how that would affect normal lookup performance.

A more radical change is to stop exposing the internal structure at all. For example, a typical namespace might instead be represented as an array of values, plus a dict mapping names to indices. The cost would be an extra pointer for each key ever in the dictionary (since you wouldn't reuse positional slots), and the savings would be that most lookups could just grab namespace[i] without having to even check that they got the right key, let alone following a trail of collision resolutions.
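Jim's collision scenario can be played out with a toy open-addressing table - a much-simplified stand-in for CPython's lookdict, with all names invented here - showing that tombstone deletion by itself leaves the other entry in place:

```python
DUMMY = object()   # tombstone left behind by deletion
EMPTY = None

class ToyTable:
    """Tiny open-addressing hash table with linear probing, to
    illustrate entry stability under deletion."""

    def __init__(self, size=8):
        self.slots = [EMPTY] * size   # each slot: EMPTY, DUMMY, or (key, value)

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def insert(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DUMMY:
                self.slots[i] = (key, value)
                return i

    def lookup(self, key):
        """Returns the slot index, skipping DUMMY slots. Crucially, it
        never relocates entries, so a cached index for another key
        stays valid (Neil's observation about current lookdict)."""
        for i in self._probe(key):
            entry = self.slots[i]
            if entry is EMPTY:
                return -1
            if entry is not DUMMY and entry[0] == key:
                return i

    def delete(self, key):
        i = self.lookup(key)
        if i >= 0:
            self.slots[i] = DUMMY   # tombstone; later entries still reachable

# Force a collision: small ints hash to themselves, so 0 and 8 both
# want slot 0 in a size-8 table.
t = ToyTable(size=8)
i1 = t.insert(0, 'key1')    # slot 0
i2 = t.insert(8, 'key2')    # collides, probes to slot 1
t.delete(0)                 # slot 0 becomes DUMMY
print(t.lookup(8) == i2)    # True: key2 did not move, a cached slot holds
```

The relocation Jim describes would be an extra step in `lookup`: on a hit found past a tombstone, swap the entry into the tombstone's slot. That step is exactly what would invalidate a cached index for the moved key.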
To speed up globals access, an auxiliary object to functions and frames registers itself as an observer to func_globals and __builtins__.
Note that func_globals probably *will* be updated again in the future, if only to register this very function with its module. You could wait to "seal" a namespace until you think all its names are known, or you could adapt the timestamp solution suggested in http://bugs.python.org/issue1616125 -jJ
![](https://secure.gravatar.com/avatar/a051fe145baf8b841107a9246efbc2dc.jpg?s=120&d=mm&r=g)
Jim Jewett wrote:
On 11/22/07, Neil Toronto <ntoronto@cs.byu.edu> wrote:
... What if a frame could maintain an array of pointers right into a dictionary's entry table?
The dict notifies its observers on delitem, pop, popitem, resize and clear. Nothing else is necessary - nothing else will change the address of or invalidate an entry.
I think this isn't quite true, because of DUMMY entries.
Insert key1. Insert key2 that wants the same slot. Register an observer that cares about key2 but not key1.
Delete key1. The key1 entry is replaced with DUMMY, but the entry for key2 is not affected.
Look up key2 (by some other code which hasn't already taken this shortcut) and the lookdict function (as a side effect) moves key2 to the better location that key1 no longer occupies. As described, I think this breaks your cache.
Good grief old chap, you freaked me out.

Turns out it all still works. Whether the lookdict functions used to move entries around I don't know, but now they don't. It's probably because deletions are so rare compared to other operations that it's not worth the extra logic in those tight little loops.

Mind if I keep rambling, just to make sure I've got it right? :)

It's the dummy entries that make lookup work at all. The lookdict functions use them as flags so that lookup knows to keep skipping around the table looking for an open entry or an entry with the right key. It's basically: "If ep->me_key != key or ep->me_key == dummy, I need to keep trying different ep's. If I reach an empty ep, return the first dummy I found, or that ep if I didn't find one. If I reach an ep with the right key, return that."

I wasn't completely satisfied by static analysis, so I traced the case you brought up through both lookdict and lookdict_string. Here it is. Assume hash(key1) == hash(key2), and (without loss of generality) that for this hash, entries are traversed in order 0, 1, 2...

Insert key1:

    0: key1
    1: NULL
    2: ...

Insert key2:

    0: key1
    1: key2
    2: ...

Delete key1:

    0: dummy
    1: key2
    2: ...

Look up key2 (trace; "freeslot" keeps track of the first dummy found on the traversal and is NULL if none is found):

    start: ep = 0, freeslot = 0  [ep->me_key == dummy]
    loop:  ep = 1, return ep     [ep->me_key == key2]

In that last bit would have been the part that goes something like this:

    if (freeslot != NULL) {
        /* refcount-neutral */
        *freeslot = *ep;
        ep->me_key = dummy;
        ep->me_value = NULL;
        return freeslot;
    }
    else
        return ep;

It might be a speed improvement if you assume that the key is very likely to be looked up again. But it's extra complexity in a speed-critical code path, and you never know whether you've lengthened the traversal for other lookups. As long as it's a wash in the end, it might as well be left alone, at least for the fast globals. :D

Neil
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Nov 24, 2007 6:18 PM, Neil Toronto <ntoronto@cs.byu.edu> wrote:
Jim Jewett wrote:
I think this isn't quite true, because of DUMMY entries.
Insert key1. Insert key2 that wants the same slot. Register an observer that cares about key2 but not key1.
Delete key1. The key1 entry is replaced with DUMMY, but the entry for key2 is not affected.
Look up key2 (by some other code which hasn't already taken this shortcut) and the lookdict function (as a side effect) moves key2 to the better location that key1 no longer occupies. As described, I think this breaks your cache.
Good grief old chap, you freaked me out.
Turns out it all still works. Whether the lookdict functions used to move entries around I don't know, but now it doesn't. It's probably because deletions are so rare compared to other operations that it's not worth the extra logic in those tight little loops.
I don't know where Jim gets his information, but I don't recall that just looking up a key has ever moved entries around. You'd have to delete and re-add it to get it moved. (Or you'd have to hit the "rehash everything to a larger hash table" of course.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (7)
-
Eyal Lotem
-
Facundo Batista
-
Greg Ewing
-
Guido van Rossum
-
Jim Jewett
-
Neil Toronto
-
Terry Reedy