[Python-ideas] Fast global cacheless lookup

Fri Nov 23 09:26:19 CET 2007

Eyal Lotem wrote:
> Hey, I had a very similar idea and implementation back in June (that
> also passed all regression tests):
> http://mail.python.org/pipermail/python-ideas/2007-June/000902.html
> 
> When I read Neil's mail I almost thought it was my old mail :-)
> 
> Unfortunately, when I posted my optimization, it pretty much got
> ignored. Maybe I have not worded it properly.
> 
> The main difference between our implementations, if I understand
> Neil's explanation correctly, is that you use direct ptrs into the
> dict and notify the ptr holders of relocations.
> 
> I used a different method, where you call a new PyDict_ExportKey
> method and it creates a mediating element.  The mediating element has
> a fixed position so it can be dereferenced directly.  Direct access to
> it is just as fast, but it may be slightly affecting dict performance.

Nicely done. :)

> I think a hybrid approach similar to Neil's, but with a mediating
> object to represent the access to the dict and do the observing for
> its user could be nicer (hell, Neil might already be doing this).

I am, actually. I originally had the observer be the function object 
itself, but that presented problems with generators, which create a 
frame object from a function and then dump the function. I had assumed 
that a frame would never outlast the function object it was created from 
and ended up with dangling pointers. D'oh!

Anyway, it's correct now and the details are well-abstracted. The 
mediating object is called PyFastGlobalsAdapter. ("Adapter" because it 
allows you to getitem/setitem a dict like you do a list - using an 
index.) It gets bunted about among functions, frames, and eval code. It 
basically has the following members:

     PyObject *globals;      /* PyDictObject only */
     PyObject *names;        /* From func_code->co_names */
     PyDictEntry **entries;  /* Struct pointers into globals entries */

(I've omitted the details for builtins because I haven't got them 
totally worked out yet. I'll probably have a PyObject *builtins and a 
PyDictEntry *builtins_entry pointing at the globals dict so I can detect 
when __builtins__ is replaced within globals.)

On init, it registers itself with the globals dict and starts keeping 
track of pointers to dict entries in "entries". "entries" is the same 
length as "names". Getting the value globals[names[i]] is done by just 
referencing entries[i]->me_value. It's very quick. :)

There's a PyFastGlobals_GetItem(PyObject *fg, int index) that does it 
for you and also does the necessary bookkeeping to update the 
PyDictEntry pointers when there's a miss. (entries[i] == NULL; happens 
when a key is anticipated but not in the dict at first.)

I agree that the dict observer interface + an adapter is a good way to 
go. The dict part should be flexible, lean and fast (like dicts 
themselves), and the simple observer interface does just that. The 
adapter keeps it all correct and refcount-y, and provides a convenient 
way to get values by index.

Would it be worth it to expose dict adapters as a Python object? Then 
Python code could do this kind of crazy stuff:

     d = {<some dict>}
     a = dictadapter(d, ('keys', 'i', 'want', 'fast', 'access', 'to'))
     a[0] == d['keys'] and a[1] == d['i']  #..., etc. => True

That could make it a lot easier to experiment with fast cacheless dict 
lookups in other contexts. The problem is, I have no idea what those 
contexts might be, at least within Python code. :)

> P.S: I also had a more ambitious plan, after eliminating
> globals/builtins dict lookups, to use mro caches more aggressively
> with this optimization and type-specialization on code objects, to
> also eliminate class-side dict lookups. The user can also eliminate
> instance-side dict lookupts via __slots__ - effectively allowing the
> conversion of virtually _all_ namespace dict lookups in pure Python
> code to be direct memory dereferences, isn't that exciting? :-)

Extremely! There's no reason it couldn't be done. But I'm not exactly 
sure what you mean (seeing the idea only from the context of my own 
hacks), so could you elaborate a bit? :D  When you said "mro" I 
immediately thought of this adapter layout:

     PyList of MRO class dicts
     PyTuple of common attribute names (or list of tuples)
     PyDictEntry * array per MRO class

But where would the name index come from? The co_names tuples are 
different for each method.

Neil