Thanks for forwarding this. It took me a while to catch on to the thread being moved here from python-dev.
On Thu, 2007-02-22 at 14:25 +1300, Greg Ewing wrote:
Steven Elliott wrote:
What I have in mind may be close to what you are suggesting above.
My idea is somewhat more uniform and general than that.
For the module dict, you use a special mapping type that allows selected items to be accessed by an index as well as a name. The set of such names is determined when the module's code is compiled -- it's simply the names used in that module to refer to globals or builtins.
That sounds like an interesting idea. How does it differ from PEP 280?: http://www.python.org/dev/peps/pep-0280 (assuming PEP 280 isn't what you are describing). If there is some place I can read more about this idea, that would be great.
The first time a given builtin is referenced in the module, it will be unbound in the module dict, so it is looked up in the usual way and then written into the module dict, so it can subsequently be retrieved by index.
The advantages of this scheme over yours are that it speeds access to module-level names as well as builtins, and it doesn't require the compiler to have knowledge of a predefined set of names.
How does your scheme speed access to module-level names? Are they referred to by index? With your idea would module level names only be referred to by index internally in the module as global variables (LOAD_GLOBAL)? It seems like referring to attributes inside the module from outside the module (LOAD_ATTR) would require something like a visioning scheme where the compiler knows in advance knows what version the module is so that it can get the index right.
Again, if your idea is PEP 280 then my questions in the above paragraph are answered.
It does entail a slight semantic change, as changes made to a builtin won't be seen by a module that has already used that builtin for the first time. But from what Guido has said before, it seems he is willing to accept a change like that if it will help.
I think slight semantic changes like that are worth it if it buys greater performance, but I understand the importance of reverse compatibility as well.
After exploring some of the different ideas for making globals/bulitins/attributes more efficient it seems to me that there is an overall tradeoff - How much complexity is justified to avoid hash table lookups or extra levels of indirection?
For example, PEP 280 avoids a hash table lookup but adds a level of indirection (the cells point to the value), which is a net performance gain. My idea (my last big email) avoids a level of indirection, but only for a predefined set of names. And it requires new opcodes. And the compiler has to be aware of the predefined set of names.
I have a question about PEP 280, but maybe I'll ask it here since it seems relevant.
One of the elegant things about the way Python compiles code is that, for the most part, each function can be compiled independently without concern for context. For example, the co_names in the code object for a function has only the names for that function. So the co_names for a given function does not depend on anything outside of that function.
What if PEP 280's proposed co_globals, which is currently only has globals referenced by that function (similar to co_names), was instead a pointer to a shared array of globals that was common for the entire module (one to one with the module's dict)?
I think doing so could avoid a level of indirection, but at the cost of forcing the compiler to keep track of the indexes of all globals so that it could get the index right (the index being the "<i>" in the proposed LOAD_GLOBAL_CELL). The celldict might then have indexes into co_globals instead of cells. So the cost would be making the compiling of functions less independent.