Accessing globals without dict lookup
Inspired by talks by Jeremy and Skip on DevDay, here's a different idea for speeding up access to globals. It retain semantics but (like Jeremy's proposal) changes the type of a module's __dict__. - Let a cell be a really simple PyObject, containing a PyObject pointer and a cell pointer. Both pointers may be NULL. (It may have to be called PyGlobalCell since I believe there's already a PyCell object.) (Maybe it doesn't even have to be an object -- it could just be a tiny struct.) - Let a celldict be a mapping that is implemented using a dict of cells. When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists. Using setitem to add a new value creates a new cell and stores the value there; using setitem to change the value for an existing key stores the value in the existing cell for that key. There's a separate API to access the cells. - We change the module implementation to use a celldict for its __dict__. The module's getattr and setattr operations now map to getitem and setitem on the celldict. I think the type of <module>.__dict__ and globals() is the only backwards incompatibility. - When a module is initialized, it gets its __builtins__ from the __builtin__ module, which is itself a celldict. For each cell in __builtins__, the new module's __dict__ adds a cell with a NULL PyObject pointer, whose cell pointer points to the corresponding cell of __builtins__. - The compiler generates LOAD_GLOBAL_CELL <i> (and STORE_GLOBAL_CELL <i> etc.) opcodes for references to globals, where <i> is a small index with meaning only within one code object like the const index in LOAD_CONST. The code object has a new tuple, co_globals, giving the names of the globals referenced by the code indexed by <i>. I think no new analysis is required to be able to do this. - When a function object is created from a code object and a celldict, the function object creates an array of cell pointers by asking the celldict for cells corresponding to the names in the code object's co_globals. If the celldict doesn't already have a cell for a particular name, it creates and an empty one. This array of cell pointers is stored on the function object as func_cells. When a function object is created from a regular dict instead of a celldict, func_cells is a NULL pointer. - When the VM executes a LOAD_GLOBAL_CELL <i> instruction, it gets cell number <i> from func_cells. It then looks in the cell's PyObject pointer, and if not NULL, that's the global value. If it is NULL, it follows the cell's cell pointer to the next cell, if it is not NULL, and looks in the PyObject pointer in that cell. If that's also NULL, or if there is no second cell, NameError is raised. (It could follow the chain of cell pointers until a NULL cell pointer is found; but I have no use for this.) Similar for STORE_GLOBAL_CELL <i>, except it doesn't follow the cell pointer chain -- it always stores in the first cell. - There are fallbacks in the VM for the case where the function's globals aren't a celldict, and hence func_cells is NULL. In that case, the code object's co_globals is indexed with <i> to find the name of the corresponding global and this name is used to index the function's globals dict. I believe that's it. I think this faithfully implements the current semantics (where a global can shadow a builtin), but without the need for any dict lookups when accessing globals, except in cases where an explicit dict is passed to exec or eval(). Compare this to Jeremy's scheme using dlicts: http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobals - My approach doesn't require global agreement on the numbering of the globals; each code object has its own numbering. This avoids the need for more global analysis, and allows adding code to a module using exec that introduces new globals without having to fall back on a less efficient scheme. - Jeremy's approach might be a teensy bit faster because it may have to do less work in the LOAD_GLOBAL; but I'm not convinced. Here's a implementation sketch for cell and celldict. Note in particular that keys() only returns the keys for which the cell's objptr is not NULL. NULL = object() # used as a token class cell(object): def __init__(self): self.objptr = NULL self.cellptr = NULL class celldict(object): def __init__(self): self.__dict = {} # dict of cells def getcell(self, key): c = self.__dict.get(key) if c is None: c = cell() self.__dict[key] = c return c def __getitem__(self, key): c = self.__dict.get(key) if c is None: raise KeyError, key value = c.objptr if value is NULL: raise KeyError, key else: return value def __setitem__(self, key, value): c = self.__dict.get(key) if c is None: c = cell() self.__dict[key] = c c.objptr = value def __delitem__(self, key): c = self.__dict.get(key) if c is None or c.objptr is NULL: raise KeyError, key c.objptr = NULL def keys(self): return [c.objptr for c in self.__dict.keys() if c.objptr is not NULL] def clear(self): for c in self.__dict.values(): c.objptr = NULL # Etc. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum writes:
def keys(self): return [c.objptr for c in self.__dict.keys() if c.objptr is not NULL]
I presume you meant values() here rather than keys()? The keys() method could simply delegate to self.__dict. I imagine most of us can fill in any additional dictionary methods, though. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
Guido van Rossum writes:
def keys(self): return [c.objptr for c in self.__dict.keys() if c.objptr is not NULL]
I presume you meant values() here rather than keys()? The keys() method could simply delegate to self.__dict. I imagine most of us can fill in any additional dictionary methods, though.
Oops, I was indeed confused. I think I meant this: def keys(self): return [k for k, c in self.__dict.iteritems() if c.objptr is not NULL] And indeed I expected that you could extrapolate to the other methods. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum writes:
Oops, I was indeed confused. I think I meant this:
def keys(self): return [k for k, c in self.__dict.iteritems() if c.objptr is not NULL]
Was I not clear, or am I missing something entirely? keys() needs *no* special treatment, but items() and values() do: class celldict(object): ... def keys(self): return self.__dict.keys() def items(self): return [k, c.objptr for k, c in self.__dict.iteritems() if c.objptr is not NULL] def values(self): return [c.objptr for c in self.__dict.itervalues() if c.objptr is not NULL] -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
Guido van Rossum writes:
Oops, I was indeed confused. I think I meant this:
def keys(self): return [k for k, c in self.__dict.iteritems() if c.objptr is not NULL]
Was I not clear, or am I missing something entirely?
I'm guessing both. ;-)
keys() needs *no* special treatment, but items() and values() do:
class celldict(object): ...
def keys(self): return self.__dict.keys()
Wrong. keys() *does* need special treatment. If c.objptr is NULL, the cell exists, but keys() should not return the corresponding key. This is so that len(x.keys()) == len(x.values()), amongst other reasons!
def items(self): return [k, c.objptr for k, c in self.__dict.iteritems() if c.objptr is not NULL]
def values(self): return [c.objptr for c in self.__dict.itervalues() if c.objptr is not NULL]
Yes, these are correct. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum writes:
I'm guessing both. ;-) ... Wrong. keys() *does* need special treatment. If c.objptr is NULL, the cell exists, but keys() should not return the corresponding key. This is so that len(x.keys()) == len(x.values()), amongst other reasons!
Ow! Bad Fred! I should know better than to speak up on a Friday! -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
I'm not looking for point-by-point answers here, I'm just pointing out things that were hard to follow so that they may get addressed in a revision. [Guido]
Inspired by talks by Jeremy and Skip on DevDay, here's a different idea for speeding up access to globals. It retain semantics but (like Jeremy's proposal) changes the type of a module's __dict__.
- Let a cell be a really simple PyObject, containing a PyObject pointer and a cell pointer.
Meaning a pointer to a cell, I bet. Note that in the pseduo-code at the end, the cellptr member of cell objects is never referenced, so it's hard to be sure.
Both pointers may be NULL. (It may have to be called PyGlobalCell since I believe there's already a PyCell object.)
There is a PyCellObject already.
(Maybe it doesn't even have to be an object -- it could just be a tiny struct.)
Would probably make it much harder to use the existing dict code (which maps PyObjects to PyObjects).
- Let a celldict be a mapping that is implemented using a dict of cells.
Presumably this is a mapping *to* cells, and from ...? String objects?
When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists.
Had a hard time with this: 1. Letting p be "the PyObject* in the cell", are you saying p==NULL or *p==NULL is the KeyError trigger? "dereference" suggests the latter, but the former seems to make more sense. 2. Presumably the first "the cell" in this sentence refers to a different cell than the second "the cell" intends.
Using setitem to add a new value creates a new cell and stores the value there;
Presumably in the PyObject* member of the new cell. To what is the cellptr member of the new cell set? I think NULL.
using setitem to change the value for an existing key stores the value in the existing cell for that key. There's a separate API to access the cells.
delitem is missing, but presumably straightforward.
- We change the module implementation to use a celldict for its __dict__. The module's getattr and setattr operations now map to getitem and setitem on the celldict. I think the type of <module>.__dict__ and globals() is the only backwards incompatibility.
- When a module is initialized, it gets its __builtins__ from the __builtin__ module, which is itself a celldict.
Surely the __builtin__ module isn't a celldict, but rather has a __dict__ that is a celldict.
For each cell in __builtins__, the new module's __dict__ adds a cell with a NULL PyObject pointer, whose cell pointer points to the corresponding cell of __builtins__.
- The compiler generates LOAD_GLOBAL_CELL <i> (and STORE_GLOBAL_CELL <i> etc.) opcodes for references to globals, where <i> is a small index with meaning only within one code object like the const index in LOAD_CONST. The code object has a new tuple, co_globals, giving the names of the globals referenced by the code indexed by <i>. I think no new analysis is required to be able to do this.
Me too.
- When a function object is created from a code object and a celldict, the function object creates an array of cell pointers by asking the celldict for cells corresponding to the names in the code object's co_globals. If the celldict doesn't already have a cell for a particular name, it creates and an empty one. This array of cell pointers is stored on the function object as func_cells.
I expect that the more we use these guys (cells), the more valuable to make them PyObjects in their own right (for uniformity, ease of introspection, etc).
When a function object is created from a regular dict instead of a celldict, func_cells is a NULL pointer.
This part is regrettable, since it's Yet Another NULL check at the *top* of code using this stuff (meaning it slows the normal case, assuming that it's unusual not to get a celldict). I'm not clear on how code ends up getting created from a regular dict instead of a celldict -- is this because of stuff like "exec whatever in mydict"?
- When the VM executes a LOAD_GLOBAL_CELL <i> instruction, it gets cell number <i> from func_cells. It then looks in the cell's PyObject pointer, and if not NULL, that's the global value. If it is NULL, it follows the cell's cell pointer to the next cell, if it is not NULL, and looks in the PyObject pointer in that cell. If that's also NULL, or if there is no second cell, NameError is raised. (It could follow the chain of cell pointers until a NULL cell pointer is found; but I have no use for this.) Similar for STORE_GLOBAL_CELL <i>, except it doesn't follow the cell pointer chain -- it always stores in the first cell.
If I'm reading this right, then in the normal case of resolving "len" in def mylen(s): return len(s) 1. We test func_cells for NULL and find out it isn't. 2. A pointer to a cell object is read out of func_cells at a fixed (wrt this function) offset. This points to len's cell object in the module's celldict. 3. The cell object's PyObject* pointer is tested and found to be NULL. 4. The cell object's cellptr pointer is tested and found not to be NULL. This points to len's cell object in __builtin__'s celldict. 5. The cell object's cellptr's PyObject* is tested and found not to be NULL. 6. The cell object's cellptr's PyObject* is returned.
- There are fallbacks in the VM for the case where the function's globals aren't a celldict, and hence func_cells is NULL. In that case, the code object's co_globals is indexed with <i> to find the name of the corresponding global and this name is used to index the function's globals dict.
Which may not succeed, so we also need another level to back off to the builtins. I'd like to pursue getting rid of the func_cells==NULL special case, even if it means constructing a celldict out of a regular dict for the duration, and feeding mutations back in to the regular dict afterwards.
I believe that's it. I think this faithfully implements the current semantics (where a global can shadow a builtin), but without the need for any dict lookups when accessing globals, except in cases where an explicit dict is passed to exec or eval().
I think <wink> I agree. Note that a chain of 4 test+branches against NULL in "the usual case" for builtins may not be faster on average than inlining the first few useful lines of lookdict_string twice (the expected path in this routine became fat-free for 2.2): i = hash; ep = &ep0[i]; if (ep->me_key == NULL || ep->me_key == key) return ep; Win or lose, that's usually the end of a dict lookup. That is, I'm certain we're paying significantly more for layers of C-level function call overhead today than for what the dict implementation actually does now (in the usual cases).
Compare this to Jeremy's scheme using dlicts:
http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobals
- My approach doesn't require global agreement on the numbering of the globals; each code object has its own numbering. This avoids the need for more global analysis,
Don't really care about that.
and allows adding code to a module using exec that introduces new globals without having to fall back on a less efficient scheme.
That is indeed lovely.
- Jeremy's approach might be a teensy bit faster because it may have to do less work in the LOAD_GLOBAL; but I'm not convinced.
LOAD_GLOBAL is executed much more often than STORE_GLOBAL, so whichever scheme wins for LOAD_GLOBAL will enjoy a multiplier effect when measuring overall performance. [and skipping the code]
I'm not looking for point-by-point answers here, I'm just pointing out things that were hard to follow so that they may get addressed in a revision.
Do you think it's PEP time yet?
When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists.
Had a hard time with this:
[...]
2. Presumably the first "the cell" in this sentence refers to a different cell than the second "the cell" intends.
No, they are the same. See __getitem__ pseudo code.
delitem is missing, but presumably straightforward.
I left it out intentionally because it adds nothing new. Maybe that was wrong -- it's important that deleting a global stores NULL in its cell.objptr but does not delete the cell from the celldict.
When a function object is created from a regular dict instead of a celldict, func_cells is a NULL pointer.
This part is regrettable, since it's Yet Another NULL check at the *top* of code using this stuff (meaning it slows the normal case, assuming that it's unusual not to get a celldict). I'm not clear on how code ends up getting created from a regular dict instead of a celldict -- is this because of stuff like "exec whatever in mydict"?
Yes, I don't want to break such code because that's been the politically correct way for ages. We do have to deprecate it to encourage people to use celldicts here. To avoid the NULL check at the top, we could stuff func_cells with empty cells and do the special-case check at the end (just before we would raise NameError). Then there still needs to be a check for STORE and DELETE, because we don't want to store into the dummy cells. Sound like a hack to assess separately later. (Another hack probably not worth it right now is to make the module's cell.cellptr point to itself if it's not shadowing a builtin cell -- then the first NULL check for cell.cellptr can be avoided in the case of finding a builtin name successful.)
- There are fallbacks in the VM for the case where the function's globals aren't a celldict, and hence func_cells is NULL. In that case, the code object's co_globals is indexed with <i> to find the name of the corresponding global and this name is used to index the function's globals dict.
Which may not succeed, so we also need another level to back off to the builtins. I'd like to pursue getting rid of the func_cells==NULL special case, even if it means constructing a celldict out of a regular dict for the duration, and feeding mutations back in to the regular dict afterwards.
The problem is that *during* the execution accessing the dict doesn't give the right results. I don't care about this case being fast (after all it's exec and if people want it faster they can switch to using a celldict). I do care about not changing corners of the semantics.
Note that a chain of 4 test+branches against NULL in "the usual case" for builtins may not be faster on average than inlining the first few useful lines of lookdict_string twice (the expected path in this routine became fat-free for 2.2):
i = hash; ep = &ep0[i]; if (ep->me_key == NULL || ep->me_key == key) return ep;
Win or lose, that's usually the end of a dict lookup. That is, I'm certain we're paying significantly more for layers of C-level function call overhead today than for what the dict implementation actually does now (in the usual cases).
This should be tried!!!
Compare this to Jeremy's scheme using dlicts:
http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobals
- My approach doesn't require global agreement on the numbering of the globals; each code object has its own numbering. This avoids the need for more global analysis,
Don't really care about that.
I do. The C code in compiler.c is already at a level of complexity that nobody understands it in its entirety! (I don't understand what Jeremy added, and Jeremy has to ask me about the original code. :-( ) Switching to the compiler.py package is unrealistic for 2.3; there's a bootstrap problem, plus it's much slower. I know that we cache the bytecode, but there are enough situations where we can't and the slowdown would kill us (imagine starting Zope for the first time from a fresh CVS checkout).
and allows adding code to a module using exec that introduces new globals without having to fall back on a less efficient scheme.
That is indeed lovely.
Forgot a <wink> there? It seems a pretty minor advantage to me. I would like to be able to compare the two schemes more before committing to any implementation. Unfortunately there's no description of Jeremy's scheme that we can compare easily (though I'm glad to see he put up his slides on the web: http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html). I guess there's so much handwaving in Jeremy's proposal about how to deal with exceptional cases that I'm uncomfortable with it. But that could be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
Do you think it's PEP time yet?
If the ideas aren't written down in an editable form while they're fresh on at least one person's mind, I'm afraid they'll just get lost. If it's a PEP, at least there will be a nagging reminder that someone once had a promising idea <wink>.
2. Presumably the first "the cell" in this sentence refers to a different cell than the second "the cell" intends.
No, they are the same. See __getitem__ pseudo code.
I persist in my delusion. Original text: When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists. Since we're doing something with "the PyObject* in the cell", surely "the cell" *must* exist. So what is the "even if the cell exists" trying to say? I believe it means to say even if the cell's cellptr is not NULL and "the cell's cellptr is not NULL" is quite different from "the cell exists".
... To avoid the NULL check at the top, we could stuff func_cells with empty cells and do the special-case check at the end (just before we would raise NameError).
That would be better -- getting cycles out of the most-frequent paths is my only goal here.
Then there still needs to be a check for STORE and DELETE, because we don't want to store into the dummy cells. Sound like a hack to assess separately later.
Another idea: a celldict could contain a "real dict" pointer, normally NULL, and pointing to a plain dict when a real dict is given. The celldict constructor would populate the cells from the realdict's contents when not NULL. Then getitem wouldn't have to do anything special (realdict==NULL and realdict!=NULL would be the same to it). setitem and delitem would propagate mutations immediately into the realdict too when non-NULL. Since mutations are almost certainly much rarer than accesses, this makes the rarer operations pay. The eval loop would always see a celldict.
(Another hack probably not worth it right now is to make the module's cell.cellptr point to itself if it's not shadowing a builtin cell -- then the first NULL check for cell.cellptr can be avoided in the case of finding a builtin name successful.)
I don't think I followed this. If, e.g., a module's "len" cell is normally {NULL, pointer to __builtin__'s "len" cell} under the original scheme, how would that change? {NULL, pointer to this very cell} wouldn't make sense. {builtin len, pointer to this very cell} would make sense, but then the pointer to self is useless -- except as a hint that we copied the value up from the builtins? But then a change to __builtin__.len wouldn't be visible to the module.
... The problem is that *during* the execution accessing the dict doesn't give the right results. I don't care about this case being fast (after all it's exec and if people want it faster they can switch to using a celldict). I do care about not changing corners of the semantics.
I expect that a write-through realdict (see above) attached to a celldict in such cases would address this, keeping the referencing code uniform and fast, and moving the special-casing into the celldict implementation for mutating operations.
i = hash; ep = &ep0[i]; if (ep->me_key == NULL || ep->me_key == key) return ep;
Win or lose, that's usually the end of a dict lookup. That is, I'm certain we're paying significantly more for layers of C-level function call overhead today than for what the dict implementation actually does now in the usual cases).
This should be tried!!!
It's less promising after more thought. The chirf snag is that "usually the end" relies on that we're usually looking for things that are there. But when looking for a builtin, it's usually not in the module's dict, where we look first. In that case, about half the time we'll find an occupied irrelevant slot in the module's dict, and then we need the rest of lookdict_string to do a (usually brief, but there's no getting away from the loop because we can't know how brief in advance) futile chase down the collision chain.
This avoids the need for more global analysis,
Don't really care about that.
I do. The C code in compiler.c is already at a level of complexity that nobody understands it in its entirety! (I don't understand what Jeremy added, and Jeremy has to ask me about the original code. :-( )
I don't care because I care about something else <wink>: it would add to the pressure to refactor this code mercilessly, and that would be a Good Thing over the long term. The current complexity isn't inherent, it's an artifact of outgrowing the original concrete-syntax-tree direct-to bytecode one-pass design. Now we've got multiple passes crawling over a now- inappropriate program representation, glued together more by "reliable accidents" <wink> than sensible design. That's all curable, and the pressures *to* cure it will continue to multiply over time (e.g., it would take a certain insanity to even think about folding pychecker-like checks into the current architecture).
Switching to the compiler.py package is unrealistic for 2.3; there's a bootstrap problem, plus it's much slower. I know that we cache the bytecode, but there are enough situations where we can't and the slowdown would kill us (imagine starting Zope for the first time from a fresh CVS checkout).
I'm a fan of fast compilation. Heck, I was upset in 1982 when Cray's compiler dropped below 100,000 lines/minute for the first time <wink>.
and allows adding code to a module using exec that introduces new globals without having to fall back on a less efficient scheme.
That is indeed lovely.
Forgot a <wink> there? It seems a pretty minor advantage to me.
No, it's lovely, not major. It's simply a good sign when the worst semantic nightmares "just work". It's also a lovely sign. Most flowers aren't terribly important either, but they are lovely.
I would like to be able to compare the two schemes more before committing to any implementation. Unfortunately there's no description of Jeremy's scheme that we can compare easily (though I'm glad to see he put up his slides on the web: http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html).
I guess there's so much handwaving in Jeremy's proposal about how to deal with exceptional cases that I'm uncomfortable with it. But that could be fixed.
I agree it needs more detail, but at the start I'm more interested in the normal cases. I'll reattach my no-holds-barred description of resolving normal-case "len" in this scheme. Perhaps Jeremy could do the same for his. Jeremy is also aiming at speeding things like math.pi (global.attribute) as a whole (not just speeding the "math" part of it). Regurgitatia: """ If I'm reading this right, then in the normal case of resolving "len" in def mylen(s): return len(s) 1. We test func_cells for NULL and find out it isn't. 2. A pointer to a cell object is read out of func_cells at a fixed (wrt this function) offset. This points to len's cell object in the module's celldict. 3. The cell object's PyObject* pointer is tested and found to be NULL. 4. The cell object's cellptr pointer is tested and found not to be NULL. This points to len's cell object in __builtin__'s celldict. 5. The cell object's cellptr's PyObject* is tested and found not to be NULL. 6. The cell object's cellptr's PyObject* is returned. """ For a module global, the same description applies, but the outcome of #3 is not-NULL and it ends there then. For global.attr, step #3 yields the global, and then attr lookup is the same as today. Jeremy, can you do the same level of detail for your scheme? Skip?
Here's a brief review of the example function. def mylen(s): return len(s) LOAD_BUILTIN 0 (len) LOAD_FAST 0 (s) CALL_FUNCTION 1 RETURN_VALUE The interpreter has a dlict for all the builtins. The details don't matter here. Let's say that len is at index 4. The function mylen has an array: func_builtin_index = [4] # an index for each builtin used in mylen The entry at index 0 of func_builtin_index is the index of len in the interpreter's builtin dlict. It is either initialized when the function is created or on first use of len. (It doesn't matter for the mechanism and there's no need to decide which is better yet.) The module has an md_globals_dirty flag. If it is true, then a global was introduced dynamically, i.e. a name binding op occurred that the compiler did not detect statically. The code object has a co_builtin_names that is like co_names except that it only contains the names of builtins used by LOAD_BUILTIN. It's there to get the correct behavior when shadowing of a builtin by a local occurs at runtime. The frame grows a bunch of pointers -- f_module from the function (which stores it instead of func_globals) f_builtin_names from the code object f_builtins from the interpreter The implementation of LOAD_BUILTIN 0 is straightforward -- in pidgin C: case LOAD_BUILTIN: if (f->f_module->md_globals_dirty) { PyObject *w = PyTuple_GET_ITEM(f->f_builtin_names); ... /* rest is just like current LOAD_GLOBAL except that is used PyDLict_GetItem() */ } else { int builtin_index = f->f_builtin_index[oparg]; PyObject *x = f->f_builtins[builtin_index]; if (x == NULL) raise NameError Py_INCREF(x); PUSH(x); } The LOAD_GLOBAL opcode ends up looking basically the same, except that it doesn't need to check md_globals_dirty. case LOAD_GLOBAL: int global_index = f->f_global_index[oparg]; PyObject *x = f->f_module->md_globals[global_index]; if (x == NULL) { check for dynamically introduced builtin } Py_INCREF(x); PUSH(x); In the x == NULL case above, we need to take extra care for a builtin that the compiler didn't expect. It's an odd case. There is a global for the module named spam that hasn't yet been assigned to in the module and there's also a builtin named spam that will be hidden once spam is bound in the module. Jeremy
[Jeremy Hylton]
Here's a brief review of the example function.
def mylen(s): return len(s)
LOAD_BUILTIN 0 (len) LOAD_FAST 0 (s) CALL_FUNCTION 1 RETURN_VALUE
The interpreter has a dlict for all the builtins. The details don't matter here.
Actually, the details are everything here <wink>.
Let's say that len is at index 4.
The function mylen has an array: func_builtin_index = [4] # an index for each builtin used in mylen
The entry at index 0 of func_builtin_index is the index of len in the interpreter's builtin dlict. It is either initialized when the function is created or on first use of len.
All clear except for the referent of "It" (the subject of the preceding sentence is "The entry at index 0", but that doesn't seem to make much sense as a referent).
(It doesn't matter for the mechanism and there's no need to decide which is better yet.)
The module has an md_globals_dirty flag. If it is true, then a global was introduced dynamically, i.e. a name binding op occurred that the compiler did not detect statically.
Once it becomes true, can md_globals_dirty ever become false again?
The code object has a co_builtin_names that is like co_names except that it only contains the names of builtins used by LOAD_BUILTIN. It's there to get the correct behavior when shadowing of a builtin by a local occurs at runtime. ^^^^^
Can that happen? Or did you mean when shadowing of a builtin by a global occurs at runtime? The LOAD_BUILTIN code below seems most consistent with the "global" rewording.
The frame grows a bunch of pointers --
f_module from the function (which stores it instead of func_globals) f_builtin_names from the code object f_builtins from the interpreter
The implementation of LOAD_BUILTIN 0 is straightforward -- in pidgin C:
case LOAD_BUILTIN: if (f->f_module->md_globals_dirty) { PyObject *w = PyTuple_GET_ITEM(f->f_builtin_names);
Presumably this is missing an ", oparg" argument.
... /* rest is just like current LOAD_GLOBAL except that is used PyDLict_GetItem() */ } else { int builtin_index = f->f_builtin_index[oparg]; PyObject *x = f->f_builtins[builtin_index]; if (x == NULL) raise NameError Py_INCREF(x); PUSH(x); }
OK, that's the gritty detail I was looking for. When it comes time to code, note that it's better to negate the test and swap the "if" branches (a not-taken branch is usually quicker than a taken branch, and you want to favor the expected case). Question: couldn't the LOAD_BUILTIN opcode use builtin_index directly as its argument (and so skip one level of indirection)? We know which builtins the interpreter supplies, and the compiler could be taught a fixed correspondence between builtin names and little integers. There are only <snort> 114 keys in __builtin__.__dict__ today, so there's plenty of room in an instruction to hold the index. A tuple of std builtin names could also be a C extern shared by everyone, eliminating the need for f_builtin_names.
The LOAD_GLOBAL opcode ends up looking basically the same, except that it doesn't need to check md_globals_dirty.
case LOAD_GLOBAL: int global_index = f->f_global_index[oparg]; PyObject *x = f->f_module->md_globals[global_index]; if (x == NULL) { check for dynamically introduced builtin } Py_INCREF(x); PUSH(x);
f_global_index wasn't mentioned before its appearance in this code block. I can guess what it is. Again I wonder whether it's possible to snip a layer of indirection (for a fixed function and fixed oparg, can f->f_global_index[oparg] change across invocations of LOAD_GLOBAL? I'm guessing "no", in which case a third of the normal-case code is burning cycles without real need).
In the x == NULL case above, we need to take extra care for a builtin that the compiler didn't expect. It's an odd case. There is a global for the module named spam
The module is named spam, or the global is named spam? I think the latter was intended.
that hasn't yet been assigned to in the module and there's also a builtin named spam that will be hidden once spam is bound in the module.
And can also be revealed again if someone reaches into the module and del's spam again, right? This looks fast, provided it works <wink>, and is along the lines of what I had in mind when I first tortured Guido with the idea of dlicts way back when. One major correction: you pronounce it "dee-likt". That's a travesty. I picked the name dlict because it's unpronounceable in any human language -- as befits an unthinkable idea <wink>.
Just a few quick questions before go back into lurcking mode: Will it still be possible to: a) install new builtins in the __builtin__ namespace and have them available in all already loaded modules right away ? b) override builtins (e.g. open()) with my own copies (e.g. to increase security) in a way that makes these new copies override the previous ones in all modules ? Also, how does the new scheme get along with the restricted execution model ? (I have a feeling that this model needs some auditing since so many new ways of accessing variables and attributes were introduced since the days of 1.5.2) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/
Just a few quick questions before go back into lurcking mode:
Note that I've moved my design to a new PEP, PEP 280. Tim has added his approach there too. Please read it!!!
Will it still be possible to: a) install new builtins in the __builtin__ namespace and have them available in all already loaded modules right away ? b) override builtins (e.g. open()) with my own copies (e.g. to increase security) in a way that makes these new copies override the previous ones in all modules ?
Yes, this is the whole point of this design. In the original approach, when LOAD_GLOBAL_CELL finds a NULL in the second cell, it should go back to see if the __builtins__ dict has been modified (the pseudo code doesn't have this yet). Tim's alternative also takes care of this.
Also, how does the new scheme get along with the restricted execution model ?
Yes, again.
(I have a feeling that this model needs some auditing since so many new ways of accessing variables and attributes were introduced since the days of 1.5.2)
You may be right. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Just a few quick questions before go back into lurking mode:
Note that I've moved my design to a new PEP, PEP 280. Tim has added his approach there too. Please read it!!!
Thanks for the answers; looks like I can safely go back into lurking mode :-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/
Let's try an attribute of a module. import math def mysin(x): return math.sin(x) There are two variants of support for this that differ in the way they handle math being rebound. Say another function is: def yikes(): global math import string as math We can either check on each use of math.attr to see if math is rebound, or we can require that STORE_GLOBAL marks all the math.attr entries as invalid. I'm not sure which is better, so I'll try to describe both. Case #1: Binding operation responsible for invalidating cache. The module has a dlict for globals that contains three entries: [math, mysin, yikes]. Each is a PyObject *. The module also has a global attrs cache, where each entry is struct { int ce_initialized; /* just a flag */ PyObject **ce_ref; } cache_entry; In the case we're considering, ce_module points to math and ce_module_index is math's index in the globals dlict. It's assigned to when the module object is created and never changes. There is one entry in the global attrs cache, for math.sin. There's only one entry because the compiler only found one attribute access of a global bound by an import statement. The function mysin(x) uses LOAD_GLOBAL_ATTR 0 (math.sin). case LOAD_GLOBAL_ATTR: cache_entry *e = f->f_module->md_cache[oparg]; if (!e->ce_initialized) { /* lookup module and find it's sin attr. store pointer to module dlict entry in ce_ref. NB: cache shared by all functions. if the thing we expected to be a module isn't actually a module, handle that case here and leave initalized set to false. */ } if (*e->ce_ref == NULL) { /* raise NameError if global module isn't bound yet. raise AttributeError if module is bound, but doesn't have attr. */ } Py_INCREF(*e->ce_ref); PUSH(*e->ce_ref); To support invalidation of cache entries, we need to arrange the cache entries in a particular order and add an auxiliary data structure that maps from module globals to cache entries it must invalidation. For example, say a module use math.sin, math.cos, and math.tan. The three cache entries for the math module should be stored contiguously in the cache. cache_entry *cache[] = { math.sin entry, math.cos entry, math.tan entry, } struct { int index; /* first attr of this module in cache */ int length; /* number of attrs for this module in cache */ } invalidation_info; There is one invalidation_info for each module that has cached attributes. (And only for things that the compiler determines to be modules.) The invalidation_info for math would be {0, 3}. If a STORE_GLOBAL rebinds math, it must walk through the cache and set ce_initialized to false for each cache entry. This isn't exactly the scheme I described in the slides, where I suggested that the LOAD_GLOBAL_ATTR would check if the module binding was still valid on each use. A question from Ping pushed me back in favor of the approach that I just described. No time this weekend to describe that check-on-each-use scheme. Jeremy
"JH" == Jeremy Hylton <jeremy@alum.mit.edu> writes:
JH> Case #1: Binding operation responsible for invalidating cache. JH> The module has a dlict for globals that contains three entries: JH> [math, mysin, yikes]. Each is a PyObject *. JH> The module also has a global attrs cache, where each entry is JH> struct { JH> int ce_initialized; /* just a flag */ PyObject **ce_ref; JH> } cache_entry; JH> In the case we're considering, ce_module points to math and JH> ce_module_index is math's index in the globals dlict. It's JH> assigned to when the module object is created and never changes. Just pretend I didn't write this paragraph :-(. I was going to describe the other case first, then changed my mind. The previous paragraph describes Case #2. The text before and after this paragraph looks clear to me. Does anyone else agree? I didn't think I had done any hand waving on globals and module attributes in the slides; so I expect that I'm not a good judge of what is hand waving and what is high-level description. Jeremy
I persist in my delusion. Original text:
When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists.
Since we're doing something with "the PyObject* in the cell", surely "the cell" *must* exist. So what is the "even if the cell exists" trying to say?
It is trying to say "despite the cell's existence". See the sample code.
I believe it means to say
even if the cell's cellptr is not NULL
and "the cell's cellptr is not NULL" is quite different from "the cell exists".
No, it doesn't try to say that. But you're right that it's useful to add that the cell's cellptr is irrelevant to getitem.
Another idea: a celldict could contain a "real dict" pointer, normally NULL, and pointing to a plain dict when a real dict is given. The celldict constructor would populate the cells from the realdict's contents when not NULL. Then getitem wouldn't have to do anything special (realdict==NULL and realdict!=NULL would be the same to it). setitem and delitem would propagate mutations immediately into the realdict too when non-NULL. Since mutations are almost certainly much rarer than accesses, this makes the rarer operations pay. The eval loop would always see a celldict.
This works for propagating changes from the celldict to the real dict, but not the other way around. Example: d = {'x': 10} def set_x(x): d['x'] = x exec "...some code that calls set_x()..." in d
(Another hack probably not worth it right now is to make the module's cell.cellptr point to itself if it's not shadowing a builtin cell -- then the first NULL check for cell.cellptr can be avoided in the case of finding a builtin name successful.)
I don't think I followed this. If, e.g., a module's "len" cell is normally
{NULL, pointer to __builtin__'s "len" cell}
under the original scheme, how would that change?
{NULL, pointer to this very cell}
wouldn't make sense.
{builtin len, pointer to this very cell}
would make sense, but then the pointer to self is useless -- except as a hint that we copied the value up from the builtins? But then a change to __builtin__.len wouldn't be visible to the module.
I meant that for "len" it would not change, i.e. it would be {NULL, pointer to __builtin__'s "len" cell} but for a global "foo" it would change to {value of foo or NULL if foo is undefined, pointer to this very cell} Then if foo is defined, the code would find the value of foo in the first cell it tries, and if foo is undefined, it would find a NULL in the cell and in the cell it points to.
I do. The C code in compiler.c is already at a level of complexity that nobody understands it in its entirety! (I don't understand what Jeremy added, and Jeremy has to ask me about the original code. :-( )
I don't care because I care about something else <wink>: it would add to the pressure to refactor this code mercilessly, and that would be a Good Thing over the long term. The current complexity isn't inherent, it's an artifact of outgrowing the original concrete-syntax-tree direct-to bytecode one-pass design. Now we've got multiple passes crawling over a now- inappropriate program representation, glued together more by "reliable accidents" <wink> than sensible design. That's all curable, and the pressures *to* cure it will continue to multiply over time (e.g., it would take a certain insanity to even think about folding pychecker-like checks into the current architecture).
Actually, the concrete syntax tree was never a very good representation; it was convenient for the parser to generate that, and it was "okay" (or "good enough") to generate code from and to do anything else from. I agree that it's a good idea to start thinking about changing the parse tree representation to a proper abstract syntax tree. Maybe the normalization that the compiler.py package uses would be a good start? Except that I've never quite grasped the visitor architecture there. :-(
I agree it needs more detail, but at the start I'm more interested in the normal cases. I'll reattach my no-holds-barred description of resolving normal-case "len" in this scheme. Perhaps Jeremy could do the same for his. Jeremy is also aiming at speeding things like math.pi (global.attribute) as a whole (not just speeding the "math" part of it).
One problem with that is that it's hard to know when <global> in <global>.<attribute> is a module, and when it's something else. I guess global analysis could help -- if it's imported ("import math") it's likely a module, if it's assigned from an expression ("L = []") or a locally defined function or class, it's likely not a module. But "from X import Y" creates a mystery -- X could be a package containing a module Y, or it could be a module containing a function or class Y.
Regurgitatia:
""" If I'm reading this right, then in the normal case of resolving "len" in
def mylen(s): return len(s)
1. We test func_cells for NULL and find out it isn't.
This step could be avoided using my trick of an array of dummy cells or using your trick of a celldict containing an optional reference to a real dict, so let's skip it.
2. A pointer to a cell object is read out of func_cells at a fixed (wrt this function) offset. This points to len's cell object in the module's celldict. 3. The cell object's PyObject* pointer is tested and found to be NULL. 4. The cell object's cellptr pointer is tested and found not to be NULL.
This NULL test shouldn't be needed given my trick of linking cells that do not shadow globals to themselves.
This points to len's cell object in __builtin__'s celldict. 5. The cell object's cellptr's PyObject* is tested and found not to be NULL. 6. The cell object's cellptr's PyObject* is returned. """
For a module global, the same description applies, but the outcome of #3 is not-NULL and it ends there then.
For global.attr, step #3 yields the global, and then attr lookup is the same as today.
Jeremy, can you do the same level of detail for your scheme? Skip?
Jeremy is probably still recovering with his family from the conference. I know I got sick there and am now stuck with a horrible cold (the umpteenth one this season). --Guido van Rossum (home page: http://www.python.org/~guido/)
When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists.
Since we're doing something with "the PyObject* in the cell", surely "the cell" *must* exist. So what is the "even if the cell exists" trying to say?
It is trying to say "despite the cell's existence".
Then s/even if/even though/, and that's what it will say <wink>.
I believe it means to say
even if the cell's cellptr is not NULL
and "the cell's cellptr is not NULL" is quite different from "the cell exists".
No, it doesn't try to say that. But you're right that it's useful to add that the cell's cellptr is irrelevant to getitem.
So even though the cell exists, and even if its cellptr is non-NULL. [on a write-through dict]
This works for propagating changes from the celldict to the real dict, but not the other way around.
Fudge, that's right. Make it a 2-way write-through dict pair <wink>.
Example:
d = {'x': 10} def set_x(x): d['x'] = x exec "...some code that calls set_x()..." in d
Understood. What about this? import cheater def f(): cheater.cheat() return pachinko() print f() where cheater.py contains: def cheat(): import __builtin__ __builtin__.pachinko = lambda: 666 That works fine today, and prints 666. Under the proposed scheme, I believe: 1. The main module's globals don't get a cell for pachinko at module creation time, because pachinko isn't in the builtins at that point. 2. When the function object for f() is created, the main module's globals grow a {NULL, NULL} cell for pachinko. 3. When cheat() is called, __builtin__'s celldict grows a {lambda: 666, NULL} slot for pachinko. 4. But the main module's globals neither sees that nor has a pointer to the new cell. 5. The reference to pachinko in f's return finds {NULL, NULL} in the globals, and so raises NameError. I'm thinking more radically now, about using module dicts mapping to pairs PyObject* (the actual value) a "I got this PyObject* from the builtins" flag (the "shadow flag") Invariants: 1. value==NULL implies flag is false. 2. flag is true implies value is the value in the builtin dict Suppose that, at module creation time, the module's globals still got an entry for every then-existing builtin, but rather than point to the builtin's cell, copied the ultimate PyObject* into one of these pairs (and set the flag). Most of the other machinery in the proposal stays the same. Accessing a global (builtin or not) via LOAD_GLOBAL_CELL<i> is then very simple: the pair does or doesn't contain NULL, and that's the end of it either way. This makes the most frequent operation as fast as I can imagine it becoming (short of plugging ultimate PyObject* values directly into func_cells -- still thinking about that). How can this get out of synch? 1. Mutations of the builtin dict (creation of a new builtin, rebinding an existing builtin, del'ing an existing builtin). This has got to be exceedingly rare, and never happens in most programs: it doesn't matter how expensive this is. Each dict serving as a builtin dict could, e.g., maintain a list of weakrefs to the modules that were initialized from it. Then mutations of the dict could reach into the relevant module dicts and adjust them accordingly. The shadow flags in the module dicts' pairs let the fixup code know whether a given entry really refers to the builtin. This makes mutations of the builtins very expensive, but I'd be surprised to find a real program where it's a measurable expense. Note: It may be helpful to view this as akin to propagating changes in new-style base classes down to their descendants. 2. del'ing a module global may uncover a builtin of the same name. While not as exceedingly rare as mutations of the builtins, it's still a rare thing. Seems like it would be reasonably cheap anyway: Module delitem: raise exception if no such key # key exists raise exception if it came from the builtins (flag is set) # key exists and is a module global, not a builtin; flag is false set value to NULL if the builtins have this name as a key: copy the current builtin value set the flag 3. Module setitem: if key exists: overwrite the value and clear the flag else: add new {value, false} pair 4. Module getitem: if name isn't a key or flag is set or value is NULL: raise exception else: return value That's for non-builtin module timdicts. I expect the same code would work for the builtin module's timdict too provided it were given an empty dict as the source from which to initialize itself (then the flag component of all its pairs would start out, and remain, false, and the code above would "do the right thing" by magic, reading "the builtins" as "the timdict from which I got initialized").
... I meant that for "len" it would not change, i.e. it would be
{NULL, pointer to __builtin__'s "len" cell}
but for a global "foo" it would change to
{value of foo or NULL if foo is undefined, pointer to this very cell}
Then if foo is defined, the code would find the value of foo in the first cell it tries, and if foo is undefined, it would find a NULL in the cell and in the cell it points to.
Whereas the original scheme stored {value of foo or NULL if foo is undefined, NULL} in this case. So that's just as quick if foo is defined, but if it isn't defined as a module global, has to do an extra NULL check on the cellptr. Gotcha. OTOH, if "foo" later *becomes* defined in the builtins too, the module dict won't know that it must change its foo's cellptr. [on the compiler's architecture]
Actually, the concrete syntax tree was never a very good representation; it was convenient for the parser to generate that, and it was "okay" (or "good enough") to generate code from and to do anything else from.
I think it worked great until the local-variable optimization got added. Unfortunately, that happened shortly after the dawn of time.
I agree that it's a good idea to start thinking about changing the parse tree representation to a proper abstract syntax tree. Maybe the normalization that the compiler.py package uses would be a good start? Except that I've never quite grasped the visitor architecture there. :-(
This msg is too long already <0.7 wink>. ... [on Jeremy's global.attr scheme]
One problem with that is that it's hard to know when <global> in <global>.<attribute> is a module, and when it's something else. I guess global analysis could help -- if it's imported ("import math") it's likely a module, if it's assigned from an expression ("L = []") or a locally defined function or class, it's likely not a module. But "from X import Y" creates a mystery -- X could be a package containing a module Y, or it could be a module containing a function or class Y.
Jeremy is aware of all this (I've heard him ponder these specific points), but I don't think he has a fully fleshed out approach to all of it yet. [a rework of the "len resolution" example, incorporating Guido's comments] """ If I'm reading this right, then in the normal case of resolving "len" in def mylen(s): return len(s) 1. A pointer to a cell object is read out of func_cells at a fixed (wrt this function) offset. This points to len's cell object in the module's celldict. 2. The cell object's PyObject* pointer is tested and found to be NULL. [3'. The cell object's cellptr pointer is tested and found not to be NULL. This points to len's cell object in __builtin__'s celldict.
This NULL test shouldn't be needed given my trick of linking cells that do not shadow globals to themselves.
As above, in the presence of mutations to builtins, a global that didn't shadow a builtin at first may end up shadowing one later. Perhaps you want to punt on preserving current behavior in such cases. The variant I sketched above is intended to preserve all current behavior, while running faster in non-pathological cases. ] 3. The cell object's cellptr's PyObject* is tested and found not to be NULL. 4. The cell object's cellptr's PyObject* is returned. """ In the {PyObject*, flag} variant: 1. A pointer to a pair is read out of func_cells at a fixed (wrt this function) offset. This points to len's pair in the module's timdict. 2. The pair's PyObject* pointer is tested and found to be non-NULL. 3. The pair's PyObject* is returned.
... I know I got sick there and am now stuck with a horrible cold (the umpteenth one this season).
In empathy, I'll refrain from painting a word-picture of my neck <wink>.
Tim> If I'm reading this right, then in the normal case of resolving Tim> "len" in Tim> def mylen(s): Tim> return len(s) ... Tim> Jeremy, can you do the same level of detail for your scheme? Skip? Yeah, it's TRACK_GLOBAL 'len' LOAD_FAST <len> LOAD_FAST <s> CALL_FUNCTION 1 UNTRACK_GLOBAL 'len' RETURN_VALUE or something similar. (Stuff in <...> represent array indexes.) My scheme makes update of my local copy of __builtins__.len the responsibility of the guy who changes the global copy. Most of the time this never changes, so as the number of accesses to len increase, the average time per lookup approaches that of a simple LOAD_FAST. Skip
[Skip Montanaro, on def mylen(s): return len(s) ]
Yeah, it's
TRACK_GLOBAL 'len' LOAD_FAST <len> LOAD_FAST <s> CALL_FUNCTION 1 UNTRACK_GLOBAL 'len' RETURN_VALUE
or something similar. (Stuff in <...> represent array indexes.)
My scheme makes update of my local copy of __builtins__.len
Who is the "me" in "my"? That is, is "my local copy" attached to the frame, or to the function object, or to the module globals, or ...? Since it's accessed via LOAD_FAST, I'm assuming it's attached to the frame object.
the responsibility of the guy who changes the global copy.
Also in my variant of Guido's proposal (and the value of len is cached in the module dict there, which tracks all changes to the builtins as they occur).
Most of the time this never changes,
Right.
so as the number of accesses to len increase, the average time per lookup approaches that of a simple LOAD_FAST.
You mean number of accesses to len per function call, I think. If I do for i in xrange(1000000): print mylen("abc") I'm going to do a TRACK_GLOBAL and UNTRACK_GLOBAL thingie too for each LOAD_FAST of len, and then the average time per len lookup really has to count the average time for those guys too.
Tim> [Skip Montanaro, on Tim> def mylen(s): Tim> return len(s) Tim> ] >> Yeah, it's >> >> TRACK_GLOBAL 'len' >> LOAD_FAST <len> >> LOAD_FAST <s> >> CALL_FUNCTION 1 >> UNTRACK_GLOBAL 'len' >> RETURN_VALUE >> >> or something similar. (Stuff in <...> represent array indexes.) >> >> My scheme makes update of my local copy of __builtins__.len Tim> Who is the "me" in "my"? Sorry, should have been "the" instead of "my". TRACK_GLOBAL is responsible for making the original copy. I should have added another argument to it: TRACK_GLOBAL 'len', <len> LOAD_FAST <len> LOAD_FAST <s> CALL_FUNCTION 1 UNTRACK_GLOBAL 'len', <len> RETURN_VALUE Tim> You mean number of accesses to len per function call, I think. Yes. Tim> If I do Tim> for i in xrange(1000000): Tim> print mylen("abc") Tim> I'm going to do a TRACK_GLOBAL and UNTRACK_GLOBAL thingie too for Tim> each LOAD_FAST of len, and then the average time per len lookup Tim> really has to count the average time for those guys too. Actually, no. I originally meant to say "Ignoring the fact that my optimizer would leave this example untouched...", but deleted it while editing the message as more detail than you were asking for. Your example: def mylen(s): return len(s) doesn't access len in a loop, so it would be ignored. On the other hand: for i in xrange(1000000): print mylen("abc") would track mylen (but not xrange). Skip
>> When a function object is created from a regular dict instead of a >> celldict, func_cells is a NULL pointer. Tim> This part is regrettable, since it's Yet Another NULL check at the Tim> *top* of code using this stuff (meaning it slows the normal case, Tim> assuming that it's unusual not to get a celldict). I'm not clear Tim> on how code ends up getting created from a regular dict instead of Tim> a celldict -- is this because of stuff like "exec whatever in Tim> mydict"? I'm still working my way through this thread, so forgive me if this has been hashed out already. It seems to me that the correct thing to do is to convert plain dicts to celldicts when creating functions. Besides, where are functions going to get created that are outside of your (PyhonLabs) control? Skip
[Skip Montanaro]
I'm still working my way through this thread, so forgive me if this has been hashed out already. It seems to me that the correct thing to do is to convert plain dicts to celldicts when creating functions.
There's the problem of object identity: it's possible for exec'ed code to mutate the original dict while the exec'ed code is running, and Guido gave an example where that can matter. I had originally suggested building a celldict that *contained* the original dict, reflecting mutations from the former to the latter as they happened. Mutations in the other direction go unnoticed, though. If the binary layouts are compatible enough, it may suffice to replace the dict's type pointer for the duration. Even then, the exec'ed code may get tripped up via testing (directly or indirectly) the type of the original dict (I suppose it could lie about its type ...).
Besides, where are functions going to get created that are outside of your (PyhonLabs) control?
They aren't, but eval and exec and execfile allow users to pass in plain dicts to be used for locals and/or globals.
Tim> There's the problem of object identity: it's possible for exec'ed Tim> code to mutate the original dict while the exec'ed code is running, Tim> and Guido gave an example where that can matter. In the face of exec statements or calls to execfile can't the compiler just generate the usual LOAD_NAME fallback instead of the new-fangled LOAD_GLOBAL opcode? Skip
[Skip]
In the face of exec statements or calls to execfile can't the compiler just generate the usual LOAD_NAME fallback instead of the new- fangled LOAD_GLOBAL opcode?
Note that exec doesn't have to be passed a string: you can pass it a compiled code object just as well. The compiler can't guess how a code object will be used at the time it's compiled. In theory there would be nothing to stop exec from rewriting the bytecode in a compiled code object passed to it, but I doubt we could get Guido to buy that trick until he first buys rewriting bytecode to set debugger breakpoints <wink>.
[Skip]
In the face of exec statements or calls to execfile can't the compiler just generate the usual LOAD_NAME fallback instead of the new- fangled LOAD_GLOBAL opcode?
[Tim]
Note that exec doesn't have to be passed a string: you can pass it a compiled code object just as well. The compiler can't guess how a code object will be used at the time it's compiled. In theory there would be nothing to stop exec from rewriting the bytecode in a compiled code object passed to it, but I doubt we could get Guido to buy that trick until he first buys rewriting bytecode to set debugger breakpoints <wink>.
Arg. So much for that idea. (Although I think the mutable bytecode idea *is* the right idea for setting breakpoints after all.) --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
[Skip]
In the face of exec statements or calls to execfile can't the compiler just generate the usual LOAD_NAME fallback instead of the new- fangled LOAD_GLOBAL opcode?
[Tim]
Note that exec doesn't have to be passed a string: you can pass it a compiled code object just as well. The compiler can't guess how a code object will be used at the time it's compiled. In theory there would be nothing to stop exec from rewriting the bytecode in a compiled code object passed to it, but I doubt we could get Guido to buy that trick until he first buys rewriting bytecode to set debugger breakpoints <wink>.
Arg. So much for that idea. (Although I think the mutable bytecode idea *is* the right idea for setting breakpoints after all.)
Let me play stupid for a sec: how does a compiled code object get created? Is Tim saying that one can pass foo.bar to exec, where bar() is a function in module foo? If not, why can't we force compile() to generate the slower code? Alternatively, can we change the semantics of exec to require the use of compile() to generate code objects? (compile() on an existing code object would do an explicit rewrite of the bytecode.) -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista We must not let the evil of a few trample the freedoms of the many.
[Aahz]
Let me play stupid for a sec: how does a compiled code object get created?
Explicitly via passing strings to compile/exec/eval or via execfile, or implicitly due to the normal operation of class, def and lambda statements, and the interactive prompt.
Is Tim saying that one can pass foo.bar to exec, where bar() is a function in module foo?
No, foo.bar is a function object, meaning basically that it's a code object bound to a specific name and a specifc bag of globals, and whose default argument values (if any) have been computed and frozen based on those globals. You can pass foo.bar.func_code to exec, though (that's the raw code object). Note that marshal can't handle function objects, but can handle code objects, and some people make extremely heavy use of extracting code objects for marshaling, then later unmarshaling and exec'ing them. When I'm tempted to exec, I'm more likely to use compile() in a separate step (to get better control over errors) and exec the resulting code object.
... Alternatively, can we change the semantics of exec to require the use of compile() to generate code objects? (compile() on an existing code object would do an explicit rewrite of the bytecode.)
I didn't follow this, but am not sure it would help if I did <wink>.
[Skip]
In the face of exec statements or calls to execfile can't the compiler just generate the usual LOAD_NAME fallback instead of the new-fangled LOAD_GLOBAL opcode?
Very good! --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
- Let a cell be a really simple PyObject, containing a PyObject pointer and a cell pointer. Both pointers may be NULL. (It may have to be called PyGlobalCell since I believe there's already a PyCell object.) (Maybe it doesn't even have to be an object -- it could just be a tiny struct.)
- Let a celldict be a mapping that is implemented using a dict of cells. When you use its getitem method, the PyObject * in the cell is dereferenced, and if a NULL is found, getitem raises KeyError even if the cell exists. Using setitem to add a new value creates a new cell and stores the value there; using setitem to change the value for an existing key stores the value in the existing cell for that key. There's a separate API to access the cells.
The following is totally unimportant, but I feel compelled to share: I implemented this once, long ago, for Python 1.5-ish, I believe. I got it to the point where it was only 15% slower than ordinary Python, then abandoned it. ;) In my implementation, "cells" were real first-class objects, and "celldict" was a copy-and-hack version of dictionary. I forget how the rest worked. Anyway, this is all very exciting to me. :) ## Jason Orendorff http://www.jorendorff.com/
[Jason Orendorff]
The following is totally unimportant, but I feel compelled to share:
I implemented this once, long ago, for Python 1.5-ish, I believe. I got it to the point where it was only 15% slower than ordinary Python, then abandoned it. ;) In my implementation, "cells" were real first-class objects,
That shouldn't matter to speed via any first-order effect, unless you also used accessor functions instead of direct reference to get at the data members.
and "celldict" was a copy-and-hack version of dictionary.
Hmm.
I forget how the rest worked.
Anyway, this is all very exciting to me. :)
Don't worry -- it will run much faster if Guido codes it. One key difference is that Guido will run each cell in its own thread <wink>.
Guido> Inspired by talks by Jeremy and Skip on DevDay, here's a Guido> different idea for speeding up access to globals. It retain Guido> semantics but (like Jeremy's proposal) changes the type of a Guido> module's __dict__. Just to see if I have a correct mental model of what Guido proposed, I drew a picture: http://manatee.mojam.com/~skip/python/celldict.png The cells are the small blank boxes. I guess the celldict would be the stuff I labelled "module dict". The "func cells" would be an array like fastlocals, but would refer to cells in the module's dict. I'm not clear where/how builtins are accessed though. Is that what the extra indirection is, or are builtins incorporated into the module dict somehow? If anyone wants to correct my picture, the Dia diagram is at http://manatee.mojam.com/~skip/python/celldict Skip
[Skip Montanaro]
Just to see if I have a correct mental model of what Guido proposed, I drew a picture:
http://manatee.mojam.com/~skip/python/celldict.png
The cells are the small blank boxes. I guess the celldict would be the stuff I labelled "module dict". The "func cells" would be an array like fastlocals, but would refer to cells in the module's dict.
Yup, except that fastlocals are part of a frame, not part of a function object. Guido didn't make a big deal about this, but it's key to efficiency: the expense of setting up func_cells is *not* incurred on a per-call basis, it's done once when a function object is created (MAKE_FUNCTION), then reused across all calls to that function object.
I'm not clear where/how builtins are accessed though.
__builtin__ is just another module, and also has a celldict for a __dict__. The empty squares in your diagram (the "bottom half" of your cells) sometimes point to cells in __builtin__'s celldict. They remain empty (NULL) in __builtin__'s celldict, though.
Is that what the extra indirection is, or are builtins incorporated into the module dict somehow?
In Guido's proposal, module celldicts sometimes point to builtin's cells. It's set up so that *all* names of builtins get an entry in the module's dict, even names that aren't referenced in the module (this avoids global analysis). Their initial entries look like: "len": {NULL, pointer to the "len" cell in the builtins} Setting "len" as a module global (if you ever do that) overwrites the NULL. Then later del'ing "len" again (if you ever do that) restores the NULL. For *most* purposes, a cell with a NULL first pointer acts as if it didn't exist. It's only the eval loop that understands the "deep structure". In the variant I sketched today, there are no cross-dict pointers, and the initial entries look like "len": {the actual value of "len" from builtins, true} instead. Then mutating the builtins requires reaching back into modules and updating their timdicts. In return, access code is simpler+faster, and there aren't semantic changes (compared to today) if the builtins mutate *after* a module's dict is initially populated (Guido's scheme appears vulnerable here in at least two ways).
All right -- i have attempted to diagram a slightly more interesting example, using my interpretation of Guido's scheme. http://lfw.org/repo/cells.gif http://lfw.org/repo/cells-big.gif for a bigger image http://lfw.org/repo/cells.ai for the source file The diagram is supposed to represent the state of things after "import spam", where spam.py contains import eggs i = -2 max = 3 def foo(n): y = abs(i) + max return eggs.ham(y + n) How does it look? Guido, is it anything like what you have in mind? A couple of observations so far: 1. There are going to be lots of global-cell objects. Perhaps they should get their own allocator and free list. 2. Maybe we don't have to change the module dict type. We could just use regular dictionaries, with the special case that if retrieving the value yields a cell object, we then do the objptr/cellptr dance to find the value. (The cell objects have to live outside the dictionaries anyway, since we don't want to lose them on a rehashing.) 3. Could we change the name, please? It would really suck to have two kinds of things called "cell objects" in the Python core. 4. I recall Tim asked something about the cellptr-points-to-itself trick. Here's what i make of it -- it saves a branch: instead of PyObject* cell_get(PyGlobalCell* c) { if (c->cell_objptr) return c->cell_objptr; if (c->cell_cellptr) return c->cell_cellptr->cell_objptr; } it's PyObject* cell_get(PyGlobalCell* c) { if (c->cell_objptr) return c->cell_objptr; return c->cell_cellptr->cell_objptr; } This makes no difference when c->cell_objptr is filled, but it saves one check when c->cell_objptr is NULL in a non-shadowed variable (e.g. after "del x"). I believe that's the only case in which it matters, and it seems fairly rare to me that a module function will attempt to access a variable that's been deleted from the module. Because the module can't know what new variables might be introduced into __builtin__ after the module has been loaded, a failed lookup must finally fall back to a lookup in __builtin__. Given that, it seems like a good idea to set c->cell_cellptr = c when c->cell_objptr is set (for both shadowed and non-shadowed variables). In my picture, this would change the cell that spam.max points to, so that it points to itself instead of __builtin__.max's cell. That is: PyObject* cell_set(PyGlobalCell* c, PyObject* v) { c->cell_objptr = v; c->cell_cellptr = c; } This simplifies things further: PyObject* cell_get(PyGlobalCell* c) { return c->cell_cellptr->cell_objptr; } This buys us no branches, which might be a really good thing on today's speculative execution styles. I know i'm a few messages behind on the discussion -- i'll do some reading to catch up before i say any more. But i hope the diagram is somewhat helpful, anyway. -- ?!ng
On Mon, 11 Feb 2002, Ka-Ping Yee wrote:
This simplifies things further:
PyObject* cell_get(PyGlobalCell* c) { return c->cell_cellptr->cell_objptr; }
I forgot to mention that this would also add loopback cellptrs for the two cells pointed to by __builtin__.abs and __builtin__.max. But hey... in that case the cellptr is always two steps away from the object. So why not just use PyObject**s instead of cells? dict -> ptr -> ptr -> object (Or, if we want to maintain backward compatibility with existing dictionaries, let a cell be an object, so we can check its type, and have it contain just one pointer instead of two?) Am i out to lunch? -- ?!ng
Ping> But hey... in that case the cellptr is always two steps away from Ping> the object. So why not just use PyObject**s instead of cells? I think it's because they aren't objects. You need to make the indirection explicit so that when some code does the equivalent of module.abs it realizes it needs to follow the chain. Thanks for the great diagram, btw. I knew if I did something feeble it would get rewritten correctly. Skip
On Mon, 11 Feb 2002, Ka-Ping Yee wrote:
This simplifies things further:
PyObject* cell_get(PyGlobalCell* c) { return c->cell_cellptr->cell_objptr; }
I forgot to mention that this would also add loopback cellptrs for the two cells pointed to by __builtin__.abs and __builtin__.max.
But hey... in that case the cellptr is always two steps away from the object. So why not just use PyObject**s instead of cells?
dict -> ptr -> ptr -> object
(Or, if we want to maintain backward compatibility with existing dictionaries, let a cell be an object, so we can check its type, and have it contain just one pointer instead of two?)
Am i out to lunch?
I think so. Think of max in the example used for your diagram (thanks for that BTW!). The first cell for it contains 3; the second cell for it contains the built-in function 'max'. A double dereference would get the wrong value. Or did I misread your suggestion? --Guido van Rossum (home page: http://www.python.org/~guido/)
Ping> All right -- i have attempted to diagram a slightly more Ping> interesting example, using my interpretation of Guido's scheme. Very nice. One case I would like to see covered is that of a global that is deleted. Something like: import eggs i = -2 max = 3 j = 4 def foo(n): y = abs(i) + max return eggs.ham(y + n) del j I presume there would still be an entry in spam's module dict with a NULL objptr. The whole think makes sense to me if it avoids the possible two PyDict_GetItem calls in the LOAD_GLOBAL opcode. As I understand it, if accessed inside a function, LOAD_GLOBAL could be implemented something like this: case LOAD_GLOBAL: cell = func_cells[oparg]; if (cell.objptr) x = cell->objptr; else x = cell->cellptr->objptr; if (x == NULL) { ... error recovery ... break; } Py_INCREF(x); continue; This looks a lot better to me (no complex function calls). What happens in the module's top-level code where there is presumably no func_cells array? Do we simply have two different opcodes, one for use at the global level and one for use in functions? Skip
Very nice. One case I would like to see covered is that of a global that is deleted. Something like:
import eggs
i = -2 max = 3 j = 4
def foo(n): y = abs(i) + max return eggs.ham(y + n)
del j
I presume there would still be an entry in spam's module dict with a NULL objptr.
Yes.
The whole think makes sense to me if it avoids the possible two PyDict_GetItem calls in the LOAD_GLOBAL opcode. As I understand it, if accessed inside a function, LOAD_GLOBAL could be implemented something like this:
case LOAD_GLOBAL:
Surely you meant LOAD_GLOBAL_CELL.
cell = func_cells[oparg]; if (cell.objptr) x = cell->objptr; else x = cell->cellptr->objptr; if (x == NULL) { ... error recovery ... break; } Py_INCREF(x); continue;
This looks a lot better to me (no complex function calls).
Here's my version: case LOAD_GLOBAL_CELL: cell = func_cells[oparg]; x = cell->objptr; if (x == NULL) { x = cell->cellptr->objptr; if (x == NULL) { ... error recovery ... break; } } Py_INCREF(x); continue;
What happens in the module's top-level code where there is presumably no func_cells array? Do we simply have two different opcodes, one for use at the global level and one for use in functions?
It could use LOAD_GLOBAL which should use PyMapping_GetItem on the globals dict. Or maybe even LOAD_NAME which should do the same. But we could also somehow create a func_cells array (hm, it would have to be called differently then I suppose). (I've added these to the FAQs in PEP 280 too.) --Guido van Rossum (home page: http://www.python.org/~guido/)
[Skip Montanaro]
... The whole think makes sense to me if it avoids the possible two PyDict_GetItem calls in the LOAD_GLOBAL opcode. As I understand it, if accessed inside a function, LOAD_GLOBAL could be implemented something like this:
case LOAD_GLOBAL: cell = func_cells[oparg]; if (cell.objptr) x = cell->objptr; else x = cell->cellptr->objptr; if (x == NULL) { ... error recovery ... break; } Py_INCREF(x); continue;
This looks a lot better to me (no complex function calls).
Something much like that. Guido added code to the PEP (280). My suggested modifications reduce it to: case LOAD_GLOBAL_CELL: cell = func_cells[oparg]; x = cell->objptr; if (x != NULL) { Py_INCREF(x); continue; } ... error recovery ... break; Another difference is hiding in the "... error recovery ..." elisions. In Guido's scheme, this must also include code to deal with the possibility that a global went away and thereby uncovered a builtin that popped into existence after the module globals were initialized. Then it's still a non-error case, but the cell->cellptr has gotten out of synch with reality. In the variation, the caches are never allowed to get out of synch, so "... error recovery .." there should really be "... error reporting ...": you can't there in the variant unless NameError is certain. Hmm: We *all* seem to be missing a PUSH(x), so all of our schemes are dead wrong <wink>. Speaking of which, why does LOAD_FAST waste time checking against NULL twice?! case LOAD_FAST: x = GETLOCAL(oparg); if (x == NULL) { format_exc_check_arg( PyExc_UnboundLocalError, UNBOUNDLOCAL_ERROR_MSG, PyTuple_GetItem(co->co_varnames, oparg) ); break; } Py_INCREF(x); PUSH(x); if (x != NULL) continue; break; I'll fix that ...
[Tim]
Speaking of which, why does LOAD_FAST waste time checking against NULL twice?!
[Neil Schemenauer]
If you would have approved my patch it would be fixed already.
Heh. If you had entered the patch at priority 9, I might have gotten to it by this summer. At priority 3, we're talking years <wink/sigh>. I boosted it to 6. Note that the tiny patch I checked in also rearranged the code so that the mormal case became the fall-through case: if (normal) do normal stuff else do exceptional stuff Most dumb compilers on platforms that care use a "forward branches probably aren't taken, backward branches probably are" heuristic for setting branch-prediction hints in the machine code; and on platforms that don't care it's usually faster to fall through than to change the program counter anyway.
one-small-banana-left-ly y'rs Neil
is-that-an-american-or-canadian-banana?-ly y'rs - tim
Tim Peters wrote:
if (normal) do normal stuff else do exceptional stuff
Most dumb compilers on platforms that care use a "forward branches probably aren't taken, backward branches probably are" heuristic for setting branch-prediction hints in the machine code; and on platforms that don't care it's usually faster to fall through than to change the program counter anyway.
I seem to remember someone saying that GCC generated better code for: if (exceptional) { do exceptional things break / return / goto } do normal things Is GCC in the dumb category? Also, the Linux is starting to use this set of macros more often: /* Somewhere in the middle of the GCC 2.96 development cycle, we * implemented a mechanism by which the user can annotate likely * branch directions and expect the blocks to be reordered * appropriately. Define __builtin_expect to nothing for earlier * compilers. */ #if __GNUC__ == 2 && __GNUC_MINOR__ < 96 #define __builtin_expect(x, expected_value) (x) #endif #define likely(x) __builtin_expect((x),1) #define unlikely(x) __builtin_expect((x),0) For example: if (likely(normal)) do normal stuff else do exceptional stuff I don't have GCC >= 2.96 otherwise I would have tried sprinkling some of those macros in ceval and testing the effect. Neil
[Neil Schemenauer]
I seem to remember someone saying that GCC generated better code for:
if (exceptional) { do exceptional things break / return / goto } do normal things
Is GCC in the dumb category?
Yes, any compiler that doesn't do branch prediction based on *semantic* analysis is dirt dumb. A simple example of semantic prediction is "comparing a pointer to NULL is probably going to yield false". Ditto comparing a number for equality with 0. I'd like to see a reference for the pattern above; it goes against the very common "forward branches usually aren't taken" heuristic. Note that Vladimir applied that gimmick to an extreme in obmalloc.c's malloc() function.
Also, the Linux is starting to use this set of macros more often: ["likely" and "unlikely"]
They're late to the party. Cray had a "percent true" directive 20 years ago, allowing for 48 bits of precision in specifying how (un)likely <wink>.
... I don't have GCC >= 2.96 otherwise I would have tried sprinkling some of those macros in ceval and testing the effect.
Maybe more interesting: One of the folks at Zope Corp reported getting significant speedups by using some gcc option that can feed real-life branch histories back into the compiler. Less work and less error-prone than guessing annotations.
Tim Peters wrote:
Maybe more interesting: One of the folks at Zope Corp reported getting significant speedups by using some gcc option that can feed real-life branch histories back into the compiler. Less work and less error-prone than guessing annotations.
Slightly OT: Has anyone tried compiling Python w/ the Intel C++ compiler? --david
Tim> Maybe more interesting: One of the folks at Zope Corp reported Tim> getting significant speedups by using some gcc option that can feed Tim> real-life branch histories back into the compiler. Less work and Tim> less error-prone than guessing annotations. That would be -fprofile-args and -fbranch-probabilities: -fprofile-arcs also makes it possible to estimate branch probabilities, and to calculate basic block execution counts. In general, basic block execution counts do not give enough information to estimate all branch probabilities. When the compiled program exits, it saves the arc execution counts to a file called sourcename.da. Use the compiler option -fbranch-probabilities when recompiling, to optimize using estimated branch probabilities. I fiddled with a bunch of gcc options a few months ago. I finally settled on -O3 -minline-all-stringops -fomit-frame-pointer Skip
All right -- i have attempted to diagram a slightly more interesting example, using my interpretation of Guido's scheme. [...] How does it look? Guido, is it anything like what you have in mind?
Yes, exactly. I've added pointers to your images to PEP 280. Maybe you can also create a diagram for Tim's "more aggressive" scheme?
A couple of observations so far:
1. There are going to be lots of global-cell objects. Perhaps they should get their own allocator and free list.
Yes.
2. Maybe we don't have to change the module dict type. We could just use regular dictionaries, with the special case that if retrieving the value yields a cell object, we then do the objptr/cellptr dance to find the value. (The cell objects have to live outside the dictionaries anyway, since we don't want to lose them on a rehashing.)
And who would do the special dance? If PyDict_GetItem, it would add an extra test to code whose speed is critical in lots of other cases (plus it would be impossible to create a dictionary containing cells without having unwanted special magic). If in a wrapper, then <module>.__dict__[<key>] would return a surprise cell instead of a value.
3. Could we change the name, please? It would really suck to have two kinds of things called "cell objects" in the Python core.
Agreed. Or we could add a cellptr to the existing cell objects; or maybe a scheme could be devised that wouldn't need a cell to have a cellptr, and then we could use the existing cell objects unchanged.
4. I recall Tim asked something about the cellptr-points-to-itself trick. Here's what i make of it -- it saves a branch: instead of
PyObject* cell_get(PyGlobalCell* c) { if (c->cell_objptr) return c->cell_objptr; if (c->cell_cellptr) return c->cell_cellptr->cell_objptr; }
it's
PyObject* cell_get(PyGlobalCell* c) { if (c->cell_objptr) return c->cell_objptr; return c->cell_cellptr->cell_objptr; }
That's what my second "additional idea" in PEP 280 proposes: | - Make c.cellptr equal to c when a cell is created, so that | LOAD_GLOBAL_CELL can always dereference c.cellptr without a NULL | check.
This makes no difference when c->cell_objptr is filled, but it saves one check when c->cell_objptr is NULL in a non-shadowed variable (e.g. after "del x"). I believe that's the only case in which it matters, and it seems fairly rare to me that a module function will attempt to access a variable that's been deleted from the module.
Agreed. When x is not defined, it doesn't matter how much extra code we execute as long as we don't dereference NULL. :-)
Because the module can't know what new variables might be introduced into __builtin__ after the module has been loaded, a failed lookup must finally fall back to a lookup in __builtin__. Given that, it seems like a good idea to set c->cell_cellptr = c when c->cell_objptr is set (for both shadowed and non-shadowed variables). In my picture, this would change the cell that spam.max points to, so that it points to itself instead of __builtin__.max's cell. That is:
PyObject* cell_set(PyGlobalCell* c, PyObject* v) { c->cell_objptr = v; c->cell_cellptr = c; }
But now you'd have to work harder when you delete the global again (i.e. in cell_delete()); the shadowed built-in must be restored.
This simplifies things further:
PyObject* cell_get(PyGlobalCell* c) { return c->cell_cellptr->cell_objptr; }
This buys us no branches, which might be a really good thing on today's speculative execution styles.
Good idea! (And before I *did* misread your followup, because I hadn't fully digested this msg. I think you're right that we might be able to use just a PyObject **; but I haven't fully digested Tim's more aggressive idea.) --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
... I think you're right that we might be able to use just a PyObject **; but I haven't fully digested Tim's more aggressive idea.)
The overwhelming thrust of Tim's variant is to reduce the (by far) most frequent namespace operation to this: case LOAD_GLOBAL_CELL: cell = func_cells[oparg]; x = cell->objptr; /* note: not two levels, just one */ if (x != NULL) { Py_INCREF(x); continue; } ... error recovery ... break; *Everything* else follows from that; it's really a quite minor variant of the original proposal, consisting mostly of changes to spelling details due to having a different kind of cell. The sole big change is requiring that mutations to builtins propagate at once to their cached values in module celldicts. I believe Jeremy's scheme *could* do better than this for builtins, but not the way it's currently set up (I don't see why we can't define a fixed bijection between the standard builtin names and a range of contiguous little integers, and use that fixed bijection everywhere; the compiler could exploit this global (in the cross-module sense) bijection directly for LOAD_BUILTIN offsets, eliminating all the indirection for standard builtins, and eliminating the code-object-specific vectors of referenced builtin names; note that I don't care about speeding access to builtins with non-standard names -- fine by me if they're handled via LOAD_GLOBAL instead, and fall into its "oops! it's not a module global after all" case).
[Ping, suggesting to always dereference the cell twice]
But hey... in that case the cellptr is always two steps away from the object. So why not just use PyObject**s instead of cells?
[Tim, sketching a scheme that always dereferences the cell once]
The sole big change is requiring that mutations to builtins propagate at once to their cached values in module celldicts.
Let's combine these ideas. Suppose there's a vector of pointers to pointers to objects, indexed by some index calculated by the compiler. Then the fast track in LOAD_GLOBAL_CELL could look like this: case LOAD_GLOBAL_CELL: x = *globals_vector[oparg]; if (x != NULL) { Py_INCREF(x); continue; } ... handle uncommon cases and errors here ... Here, globals_vector[i] is usually the address of the me_value slot of a PyDictEntry in either the globals dict or the builtins dict. These are subclasses of dict that trap assignment, deletion, and rehashing. There's a special C-global variable PyObject *unfilled = NULL; whose contents is always NULL; when globals_vector is initalized, every element of it is set to &unfilled. The code to handle uncommon cases and errors does a "lookdict" operation on the globals dict using the name of the global variable, which it gets from (e.g.) globals_names[oparg]. This requires some opening up of the dict implementation; lookdict is an internal routine that returns a PyDictEntry *, call it e. If e->me_value != NULL, we set globals_vector[oparg] to &e->me_value, and we're done. Otherwise, we do another lookdict on the builtins dict, and again if e->me_value != NULL, set globals_vector[oparg] to &e->me_value. Otherwise, we raise a NameError. Now we need to take care of a number of additional special cases that could invalidate the pointers we're collecting in globals_vector. The following things may invalidate those pointers: - globals_vector[i] points to a builtin, and a global with the same name is created - globals_vector[i] points to either a builtin or a global, and the dict into whose hashtable it points is rehashed (as the result of adding an item) - globals_vector[i] points to a builtin or global, which is deleted - globals_vector[i] points to a builtin, and the special global named __builtins__ is assigned to (switching to a different builtins dict) To deal with all these cases, our dict subclass keeps a list of weak references. The builtins dict has weak references pointing to all globals dicts that shadow this builtins dict (because of rexec there can be multiple builtins dicts); each globals dict has weak references pointing to all the globals_vector structures that reference it. On a rehash, all entries in each affected globals_vector are reset to &unfilled. The uncommon case handling code will gradually populate them again. On assignment or deletion it might pay off to be a little more careful and only invalidate the entry in the globals_vector corrsponding to the affected name. (In particular, assignment to a global that's already set, and deletion of a global that doesn't shadow a built-in, should probably be handled somewhat efficiently.) The globals_vector structure should contain a pointer to the corresponding globals_names array, and it should also contain a reference to the globals dict into whose hashtable it may point, to keep it alive. So it should probably be an object that contains a vector of pointers to pointers in addition to some other stuff. The globals_vector may be shared by all code objects compiled together; this makes it similar to the dlict. But the overflow handling is quite different, and by pointing directly into the hash table it is possible to handle all globals and builtins uniformly. I expect that the implementation won't be particularly hard; the lookdict operation already exists and we can easily subclass dict to trap the setitem and delitem operations. We will have to be careful not to use PyDict_SetItem() and PyDict_DelItem() on these subclasses, but that should be easy: I think that the only offenders here are STORE_NAME and friends, and these are exactly the operations that we're going to change anyway. (STORE_NAME or STORE_GLOBAL will become a bit slower, because of the check whether it needs to update any globals_vector structures; but that's OK since we're speeding up the corresponding LOAD operation quite a bit.) (I'd add this to PEP 280 but I'll wait for Tim to shoot holes in it first. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
... (I'd add this to PEP 280 but I'll wait for Tim to shoot holes in it first. :-)
At first sight this scheme made sense and was very appealing. Alas, I've been staring at the computer non-stop for 11 hours, my eyes are losing focus, and I just remembered I forgot to eat today. IOW, you'll have to wait for Tuesday to get shot <wink>. For now, I really liked that the indirection vector fills in adaptively, to speed accesses that are actually getting made. I don't know that that's an objective advantage, but I loved the image.
[Ping]
1. There are going to be lots of global-cell objects. Perhaps they should get their own allocator and free list.
[Guido]
Yes.
No no no no no, and 5*-1 beats a +1 even from the BDFL <wink>. Vanilla pymalloc is perfect for this: many small objects. A custom free list for cells is a waste of code, because a cell never goes away until the module does: cells will not "churn". We'll get a lot of them, but most of them will stay alive until the program ends, so the tiny performance gain you may be able to get from a thoroughly specialized free list "in theory" will never be realized in practice.
Tim Peters writes:
Vanilla pymalloc is perfect for this: many small objects. A custom free list for cells is a waste of code, because a cell never goes away until the module does: cells will not "churn". We'll get a lot of them, but most of them will stay alive until the program ends, so the tiny performance gain you may be able to get from a thoroughly specialized free list "in theory" will never be realized in practice.
Have we become convinced that these cells need to be Python objects? I must have missed that. As long as we can keep them simple structures, we should be able to avoid individual allocations for them. It seems we have a fixed number of cells for both module objects and function objects (regardless of whether they are part of the new celldict or the containing function or module), so they can be allocated as an array rather than individually. So, I must be missing something. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
[Fred]
Have we become convinced that these cells need to be Python objects?
No, it's just easier that way. The existing dict code maps PyObject* to PyObject*, so we'd have to copy and fiddle *all* of the dict code if a celldict wants to map to anything other than a PyObject*.
I must have missed that. As long as we can keep them simple structures, we should be able to avoid individual allocations for them. It seems we have a fixed number of cells for both module objects and function objects (regardless of whether they are part of the new celldict or the containing function or module), so they can be allocated as an array rather than individually.
So, I must be missing something.
cells don't live in function objects; a function object only has a vector of pointers *to* cells, and that is indeed a contiguous, fixed-size array. cells are the values in celldicts, and that's the only place they appear, and celldicts can grow dynamically (import fred; fred.brandnew = 1).
participants (12)
-
aahz@rahul.net -
David Ascher -
Fred L. Drake, Jr. -
Guido van Rossum -
Jason Orendorff -
Jeremy Hylton -
Jeremy Hylton -
Ka-Ping Yee -
M.-A. Lemburg -
Neil Schemenauer -
Skip Montanaro -
Tim Peters