[Python-Dev] Accessing globals without dict lookup

Tim Peters tim.one@comcast.net
Fri, 08 Feb 2002 18:04:49 -0500


I'm not looking for point-by-point answers here, I'm just pointing out
things that were hard to follow so that they may get addressed in a
revision.

[Guido]
> Inspired by talks by Jeremy and Skip on DevDay, here's a different
> idea for speeding up access to globals.  It retain semantics but (like
> Jeremy's proposal) changes the type of a module's __dict__.
>
> - Let a cell be a really simple PyObject, containing a PyObject
>   pointer and a cell pointer.

Meaning a pointer to a cell, I bet.  Note that in the pseduo-code at the
end, the cellptr member of cell objects is never referenced, so it's hard to
be sure.

>   Both pointers may be NULL.  (It may have to be called PyGlobalCell
>   since I believe there's already a PyCell object.)

There is a PyCellObject already.

>   (Maybe it doesn't even have to be an object -- it could just be a tiny
>    struct.)

Would probably make it much harder to use the existing dict code (which maps
PyObjects to PyObjects).

> - Let a celldict be a mapping that is implemented using a dict of
>   cells.

Presumably this is a mapping *to* cells, and from ...?  String objects?

>   When you use its getitem method, the PyObject * in the cell is
>   dereferenced, and if a NULL is found, getitem raises KeyError
>   even if the cell exists.

Had a hard time with this:

1. Letting p be "the PyObject* in the cell", are you saying p==NULL or
   *p==NULL is the KeyError trigger?  "dereference" suggests the latter,
   but the former seems to make more sense.

2. Presumably the first "the cell" in this sentence refers to a
   different cell than the second "the cell" intends.

>   Using setitem to add a new value creates a new cell and stores the
>   value there;

Presumably in the PyObject* member of the new cell.  To what is the cellptr
member of the new cell set?  I think NULL.

>   using setitem to change the value for an existing key stores the
>   value in the existing cell for that key.  There's a separate API to
>   access the cells.

delitem is missing, but presumably straightforward.

> - We change the module implementation to use a celldict for its
>   __dict__.  The module's getattr and setattr operations now map to
>   getitem and setitem on the celldict.  I think the type of
>   <module>.__dict__ and globals() is the only backwards
>   incompatibility.
>
> - When a module is initialized, it gets its __builtins__ from the
>   __builtin__ module, which is itself a celldict.

Surely the __builtin__ module isn't a celldict, but rather has a __dict__
that is a celldict.

>   For each cell in __builtins__, the new module's __dict__ adds a cell
>   with a NULL  PyObject pointer, whose cell pointer points to the
>   corresponding cell of __builtins__.
>
> - The compiler generates LOAD_GLOBAL_CELL <i> (and STORE_GLOBAL_CELL
>   <i> etc.) opcodes for references to globals, where <i> is a small
>   index with meaning only within one code object like the const index
>   in LOAD_CONST.  The code object has a new tuple, co_globals, giving
>   the names of the globals referenced by the code indexed by <i>.  I
>   think no new analysis is required to be able to do this.

Me too.

> - When a function object is created from a code object and a celldict,
>   the function object creates an array of cell pointers by asking the
>   celldict for cells corresponding to the names in the code object's
>   co_globals.  If the celldict doesn't already have a cell for a
>   particular name, it creates and an empty one.  This array of cell
>   pointers is stored on the function object as func_cells.

I expect that the more we use these guys (cells), the more valuable to make
them PyObjects in their own right (for uniformity, ease of introspection,
etc).

>   When a function object is created from a regular dict instead of a
>   celldict, func_cells is a NULL pointer.

This part is regrettable, since it's Yet Another NULL check at the *top* of
code using this stuff (meaning it slows the normal case, assuming that it's
unusual not to get a celldict).  I'm not clear on how code ends up getting
created from a regular dict instead of a celldict -- is this because of
stuff like "exec whatever in mydict"?

> - When the VM executes a LOAD_GLOBAL_CELL <i> instruction, it gets
>   cell number <i> from func_cells.  It then looks in the cell's
>   PyObject pointer, and if not NULL, that's the global value.  If it
>   is NULL, it follows the cell's cell pointer to the next cell, if it
>   is not NULL, and looks in the PyObject pointer in that cell.  If
>   that's also NULL, or if there is no second cell, NameError is
>   raised.  (It could follow the chain of cell pointers until a NULL
>   cell pointer is found; but I have no use for this.)  Similar for
>   STORE_GLOBAL_CELL <i>, except it doesn't follow the cell pointer
>   chain -- it always stores in the first cell.

If I'm reading this right, then in the normal case of resolving "len" in

def mylen(s):
    return len(s)

1. We test func_cells for NULL and find out it isn't.
2. A pointer to a cell object is read out of func_cells at a fixed (wrt
   this function) offset.  This points to len's cell object in the
   module's celldict.
3. The cell object's PyObject* pointer is tested and found to be NULL.
4. The cell object's cellptr pointer is tested and found not to be NULL.
   This points to len's cell object in __builtin__'s celldict.
5. The cell object's cellptr's PyObject* is tested and found not to be
   NULL.
6. The cell object's cellptr's PyObject* is returned.

> - There are fallbacks in the VM for the case where the function's
>   globals aren't a celldict, and hence func_cells is NULL.  In that
>   case, the code object's co_globals is indexed with <i> to find the
>   name of the corresponding global and this name is used to index the
>   function's globals dict.

Which may not succeed, so we also need another level to back off to the
builtins.  I'd like to pursue getting rid of the func_cells==NULL special
case, even if it means constructing a celldict out of a regular dict for the
duration, and feeding mutations back in to the regular dict afterwards.

> I believe that's it.  I think this faithfully implements the current
> semantics (where a global can shadow a builtin), but without the need
> for any dict lookups when accessing globals, except in cases where an
> explicit dict is passed to exec or eval().

I think <wink> I agree.

Note that a chain of 4 test+branches against NULL in "the usual case" for
builtins may not be faster on average than inlining the first few useful
lines of lookdict_string twice (the expected path in this routine became
fat-free for 2.2):

	i = hash;
	ep = &ep0[i];
	if (ep->me_key == NULL || ep->me_key == key)
		return ep;

Win or lose, that's usually the end of a dict lookup.  That is, I'm certain
we're paying significantly more for layers of C-level function call overhead
today than for what the dict implementation actually does now (in the usual
cases).

> Compare this to Jeremy's scheme using dlicts:
>
> http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobals
>
> - My approach doesn't require global agreement on the numbering of the
>   globals; each code object has its own numbering.  This avoids the
>   need for more global analysis,

Don't really care about that.

>   and allows adding code to a module using exec that introduces new
>   globals without having to fall back on a less efficient scheme.

That is indeed lovely.

> - Jeremy's approach might be a teensy bit faster because it may have
>   to do less work in the LOAD_GLOBAL; but I'm not convinced.

LOAD_GLOBAL is executed much more often than STORE_GLOBAL, so whichever
scheme wins for LOAD_GLOBAL will enjoy a multiplier effect when measuring
overall performance.

[and skipping the code]