[Python-Dev] Accessing globals without dict lookup
Fri, 08 Feb 2002 18:04:49 -0500
I'm not looking for point-by-point answers here, I'm just pointing out
things that were hard to follow so that they may get addressed in a
> Inspired by talks by Jeremy and Skip on DevDay, here's a different
> idea for speeding up access to globals. It retain semantics but (like
> Jeremy's proposal) changes the type of a module's __dict__.
> - Let a cell be a really simple PyObject, containing a PyObject
> pointer and a cell pointer.
Meaning a pointer to a cell, I bet. Note that in the pseduo-code at the
end, the cellptr member of cell objects is never referenced, so it's hard to
> Both pointers may be NULL. (It may have to be called PyGlobalCell
> since I believe there's already a PyCell object.)
There is a PyCellObject already.
> (Maybe it doesn't even have to be an object -- it could just be a tiny
Would probably make it much harder to use the existing dict code (which maps
PyObjects to PyObjects).
> - Let a celldict be a mapping that is implemented using a dict of
Presumably this is a mapping *to* cells, and from ...? String objects?
> When you use its getitem method, the PyObject * in the cell is
> dereferenced, and if a NULL is found, getitem raises KeyError
> even if the cell exists.
Had a hard time with this:
1. Letting p be "the PyObject* in the cell", are you saying p==NULL or
*p==NULL is the KeyError trigger? "dereference" suggests the latter,
but the former seems to make more sense.
2. Presumably the first "the cell" in this sentence refers to a
different cell than the second "the cell" intends.
> Using setitem to add a new value creates a new cell and stores the
> value there;
Presumably in the PyObject* member of the new cell. To what is the cellptr
member of the new cell set? I think NULL.
> using setitem to change the value for an existing key stores the
> value in the existing cell for that key. There's a separate API to
> access the cells.
delitem is missing, but presumably straightforward.
> - We change the module implementation to use a celldict for its
> __dict__. The module's getattr and setattr operations now map to
> getitem and setitem on the celldict. I think the type of
> <module>.__dict__ and globals() is the only backwards
> - When a module is initialized, it gets its __builtins__ from the
> __builtin__ module, which is itself a celldict.
Surely the __builtin__ module isn't a celldict, but rather has a __dict__
that is a celldict.
> For each cell in __builtins__, the new module's __dict__ adds a cell
> with a NULL PyObject pointer, whose cell pointer points to the
> corresponding cell of __builtins__.
> - The compiler generates LOAD_GLOBAL_CELL <i> (and STORE_GLOBAL_CELL
> <i> etc.) opcodes for references to globals, where <i> is a small
> index with meaning only within one code object like the const index
> in LOAD_CONST. The code object has a new tuple, co_globals, giving
> the names of the globals referenced by the code indexed by <i>. I
> think no new analysis is required to be able to do this.
> - When a function object is created from a code object and a celldict,
> the function object creates an array of cell pointers by asking the
> celldict for cells corresponding to the names in the code object's
> co_globals. If the celldict doesn't already have a cell for a
> particular name, it creates and an empty one. This array of cell
> pointers is stored on the function object as func_cells.
I expect that the more we use these guys (cells), the more valuable to make
them PyObjects in their own right (for uniformity, ease of introspection,
> When a function object is created from a regular dict instead of a
> celldict, func_cells is a NULL pointer.
This part is regrettable, since it's Yet Another NULL check at the *top* of
code using this stuff (meaning it slows the normal case, assuming that it's
unusual not to get a celldict). I'm not clear on how code ends up getting
created from a regular dict instead of a celldict -- is this because of
stuff like "exec whatever in mydict"?
> - When the VM executes a LOAD_GLOBAL_CELL <i> instruction, it gets
> cell number <i> from func_cells. It then looks in the cell's
> PyObject pointer, and if not NULL, that's the global value. If it
> is NULL, it follows the cell's cell pointer to the next cell, if it
> is not NULL, and looks in the PyObject pointer in that cell. If
> that's also NULL, or if there is no second cell, NameError is
> raised. (It could follow the chain of cell pointers until a NULL
> cell pointer is found; but I have no use for this.) Similar for
> STORE_GLOBAL_CELL <i>, except it doesn't follow the cell pointer
> chain -- it always stores in the first cell.
If I'm reading this right, then in the normal case of resolving "len" in
1. We test func_cells for NULL and find out it isn't.
2. A pointer to a cell object is read out of func_cells at a fixed (wrt
this function) offset. This points to len's cell object in the
3. The cell object's PyObject* pointer is tested and found to be NULL.
4. The cell object's cellptr pointer is tested and found not to be NULL.
This points to len's cell object in __builtin__'s celldict.
5. The cell object's cellptr's PyObject* is tested and found not to be
6. The cell object's cellptr's PyObject* is returned.
> - There are fallbacks in the VM for the case where the function's
> globals aren't a celldict, and hence func_cells is NULL. In that
> case, the code object's co_globals is indexed with <i> to find the
> name of the corresponding global and this name is used to index the
> function's globals dict.
Which may not succeed, so we also need another level to back off to the
builtins. I'd like to pursue getting rid of the func_cells==NULL special
case, even if it means constructing a celldict out of a regular dict for the
duration, and feeding mutations back in to the regular dict afterwards.
> I believe that's it. I think this faithfully implements the current
> semantics (where a global can shadow a builtin), but without the need
> for any dict lookups when accessing globals, except in cases where an
> explicit dict is passed to exec or eval().
I think <wink> I agree.
Note that a chain of 4 test+branches against NULL in "the usual case" for
builtins may not be faster on average than inlining the first few useful
lines of lookdict_string twice (the expected path in this routine became
fat-free for 2.2):
i = hash;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
Win or lose, that's usually the end of a dict lookup. That is, I'm certain
we're paying significantly more for layers of C-level function call overhead
today than for what the dict implementation actually does now (in the usual
> Compare this to Jeremy's scheme using dlicts:
> - My approach doesn't require global agreement on the numbering of the
> globals; each code object has its own numbering. This avoids the
> need for more global analysis,
Don't really care about that.
> and allows adding code to a module using exec that introduces new
> globals without having to fall back on a less efficient scheme.
That is indeed lovely.
> - Jeremy's approach might be a teensy bit faster because it may have
> to do less work in the LOAD_GLOBAL; but I'm not convinced.
LOAD_GLOBAL is executed much more often than STORE_GLOBAL, so whichever
scheme wins for LOAD_GLOBAL will enjoy a multiplier effect when measuring
[and skipping the code]