
On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
Steven Elliott wrote:
One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to.
As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code.
The important difference between builtins and globals is that with builtins both the compiler and the runtime can enumerate all references to builtins in a single consistent way. That is "True" can always be builtin #3 and "len" can always be builtin #17, or whatever. This isn't true of globals in that a pyc file referencing a global in a module may have been compiled with a different version of that module (that is "some_module.some_global" can't compiled to single fixed index since stuff may shift around in "some_module"). With globals you have the same kind of problem that you have with operating systems that use ordinals to refer to symbols in shared libraries. So in the case of a static reference to a builtin ("while True", or whatever) the compiler would generate something that refers to it with that builtin's index (such as a new "BUILTIN_OP" opcode, as Philip suggested). Ordinary globals (non-builtins) would continue to be generated as the same code (the "LOAD_GLOBAL" opcode (I'll only refer to the loading opcodes in this email)). In the case of a dynamic reference to a builtin ("eval('True = 7')" or "from foo import *" or whatever) would generate the opcode that indicates that the runtime needs to figure out what do to (the same "LOAD_NAME" opcode). The second part of the the "LOAD_NAME" opcode is similar to the current "LOAD_GLOBAL" opcode - it first checks the hash tables of globals and then checks the hash table of builtins. However, the second part of the "LOAD_NAME" opcode could be implemented such that it first checks against a list of default builtins (which could be a hash table that returns the index of that builtin) and then indexes into the array of builtins if it is found, or retrieves it from the single hash table of globals otherwise. So the "LOAD_NAME" opcode (or similar attempts to dynamically get a name) would almost be as efficient as it currently it.
That's great, but I'm curious if additional gains can be made be focusing just on builtins.
As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module.
One way of looking at is rather than having a clear distinction between builtins and globals (as there currently is) there would be a single global name space that internally in Python is implemented in two data structures. An array for frequently used names and a hash table for infrequently used names. And the division between the two wouldn't even have two be between globals and builtins like we've been talking about so far. What distinguishes the builtins is you get them for free (initialized on startup). So, it would be possible to insert infrequently used builtins into the hash table of infrequently used names only when the module refers to it. Conversely, names that aren't builtins, but that are used frequently in many different modules, such as "sys" or "os", could have indexes set aside for for them in the array of frequently used names. Later, when when it gets a value (because "sys" is imported, or whatever) it just gets stuck into the predetermined slot in the array of frequently used names. Since builtins can be shadowed, as you point out, there would have to be one array of frequently used names per module. But often it would be the same as other modules. So internally, as a matter of efficiency, the interpreter could use a copy on write strategy where a global array of frequently used names is used by the module until it assigns to "True", or something like that, at which point it gets its own copy. -- ----------------------------------------------------------------------- | Steven Elliott | selliott4@austin.rr.com | -----------------------------------------------------------------------