I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08. On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
Steven Elliott wrote:
One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to.
As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code.
What I have in mind may be close to what you are suggesting above. My thought now is that builtins are a set of tokens that typically, but don't necessarily, point to the same objects in all modules. Such tokens, which I'll refer to as "global tokens", can be roughly broken into two sets: 1) Global tokens that typically point to the same object in all modules. 2) Global tokens that that are likely to point to the different objects (or be undefined) in different modules. Set 1) is pretty much the the builtins. "True" and "len" are likely to point to the same objects in all modules, but not necessarily. Set 2) might be things like "os" and "sys" which are often defined (imported) in modules, but not necessarily. Access to the globals of a module, including the current module, is done with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME). For each of these opcodes the following snippet of code from ceval.c (for LOAD_GLOBAL) is relevant to this discussion: /* This is the un-inlined version of the code above */ x = PyDict_GetItem(f->f_globals, w); if (x == NULL) { x = PyDict_GetItem(f->f_builtins, w); if (x == NULL) { load_global_error: format_exc_check_arg( PyExc_NameError, GLOBAL_NAME_ERROR_MSG, w); break; } } So, to avoid the hash table lookups above maybe the global tokens could be assigned an index value that is fixed for any given version of the interpreter and that is the same for all modules (that "True" is always index 7, "len" is always index 3, etc.) Once a set of indexes have been determined a new opcode, that I'll call "LOAD_GTOKEN", could be created that avoids the hash table lookup by functioning in a way that is similar to LOAD_FAST (pull a local variable value out of an array). For example, static references to "True" could always be compiled to LOAD_GTOKEN 7 (True) As to set 1) and set 2) that I mentioned above - there is only a need to distinguish between the two sets if a copy-on-write mechanism is used. That way global tokens that are likely to have their value changed (group 2) ) can all be together in one group so that only that group needs to be copied when one of the global tokens is written to. For example code such as: True = 1 print True would be compiled into something like: 1 LOAD_CONST 1 (1) STORE_GTOKEN1 7 (True) 2 LOAD_GTOKEN1 7 (True) PRINT_ITEM PRINT_NEWLINE Note that "1" has been appended to "STORE_GTOKEN" to indicate that group 1) is being worked with. The store command will copy the array of pointers once, the first time it is called. Just as a new opcode is needed for LOAD_GLOBAL one would be needed for LOAD_ATTR. Perhaps "LOAD_ATOKEN" would work. For example: amodule.len = my_len print amodule.len would be compiled into something like: 1 LOAD_GLOBAL 0 (my_len) LOAD_GLOBAL 1 (amodule) STORE_ATOKEN1 3 (len) 2 LOAD_GLOBAL 1 (amodule) LOAD_ATOKEN1 3 (len) PRINT_ITEM PRINT_NEWLINE LOAD_CONST 0 (None) RETURN_VALUE Note that it looks almost identical to the code that is currently generated, but the oparg "3" shown for the "LOAD_ATOKEN1" above indexes into an array (like LOAD_FAST) to get at the attribute directly whereas the oparg that would be shown for LOAD_ATTR is an index into an array of constants/strings which is then used to retrieve the attribute from the module's global hash table.
That's great, but I'm curious if additional gains can be made be focusing just on builtins.
As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module.
I now think that it best not to think of builtins as being a special case. What really matters is that if all modules agree on a way of indexing global tokens (possible since they are all compiled with the same compiler) then it should be possible to avoid any hash table lookups when those global tokens are read or written to. So while True: can be just as efficient as while 1: Is it reasonable to require the compiler to be aware of the set of builtins and their indexes at the time the bytecode is produced? That is one cost to consider - a less clean separation between the compiler and the meaning behind the code it compiles. One cost to consider is the memory consumed by unsused slots in the array of global tokens for a module if that module uses a small number of builtins, but also writes to one of them so that the array needs to be copied. But hopefully that won't be too bad. -- ----------------------------------------------------------------------- | Steven Elliott | selliott4@austin.rr.com | -----------------------------------------------------------------------