Re: [Python-Dev] Making builtins more efficient

Feb. 20, 2007

      I'm finally getting back into this.  I'd like to take one more shot at
it with a revised version of what I proposed before.  

For those of you that did not see the original thread it was about ways
that accessing builtins could be more efficient.  It's a bit much to
summarize again now, but you should be able to find it in the archive
with this subject and a date of 2006-03-08.  

On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
...
Steven Elliott wrote:
...
One way of handling it is to
alter STORE_ATTR (op code for assigning to mod.str) to always check to
see if the key being assigned is one of the default builtins.  If it is,
then the module's indexed array of builtins is assigned to.
As long as you're going to all that trouble, it
doesn't seem like it would be much harder to treat
all global names that way, instead of just a predefined
set. The compiler already knows all of the names that
are used as globals in the module's code.
What I have in mind may be close to what you are suggesting above.  My
thought now is that builtins are a set of tokens that typically, but
don't necessarily, point to the same objects in all modules.  Such
tokens, which I'll refer to as "global tokens", can be roughly broken
into two sets:
    1) Global tokens that typically point to the same object in all
modules.
    2) Global tokens that that are likely to point to the different
objects (or be undefined) in different modules.
Set 1) is pretty much the the builtins.  "True" and "len" are likely to
point to the same objects in all modules, but not necessarily.  Set 2)
might be things like "os" and "sys" which are often defined (imported)
in modules, but not necessarily.

Access to the globals of a module, including the current module, is done
with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME).  For
each of these opcodes the following snippet of code from ceval.c (for
LOAD_GLOBAL) is relevant to this discussion:
    /* This is the un-inlined version of the code above */
    x = PyDict_GetItem(f->f_globals, w);
    if (x == NULL) {
            x = PyDict_GetItem(f->f_builtins, w);
            if (x == NULL) {
              load_global_error:
                    format_exc_check_arg(
                                PyExc_NameError,
                                GLOBAL_NAME_ERROR_MSG, w);
                    break;
            }
    }

So, to avoid the hash table lookups above maybe the global tokens could
be assigned an index value that is fixed for any given version of the
interpreter and that is the same for all modules (that "True" is always
index 7, "len" is always index 3, etc.)

Once a set of indexes have been determined a new opcode, that I'll call
"LOAD_GTOKEN", could be created that avoids the hash table lookup by
functioning in a way that is similar to LOAD_FAST (pull a local variable
value out of an array).  For example, static references to "True" could
always be compiled to
      LOAD_GTOKEN     7 (True)

As to set 1) and set 2) that I mentioned above - there is only a need to
distinguish between the two sets if a copy-on-write mechanism is used.
That way global tokens that are likely to have their value changed
(group 2) ) can all be together in one group so that only that group
needs to be copied when one of the global tokens is written to.  For
example code such as:
    True = 1
    print True
would be compiled into something like:
  1   LOAD_CONST      1 (1)
      STORE_GTOKEN1   7 (True)
  2   LOAD_GTOKEN1    7 (True)
      PRINT_ITEM
      PRINT_NEWLINE
Note that "1" has been appended to "STORE_GTOKEN" to indicate that group
1) is being worked with.  The store command will copy the array of
pointers once, the first time it is called.

Just as a new opcode is needed for LOAD_GLOBAL one would be needed for
LOAD_ATTR.  Perhaps "LOAD_ATOKEN" would work.  For example:
    amodule.len = my_len
    print amodule.len
would be compiled into something like:
  1   LOAD_GLOBAL     0 (my_len)
      LOAD_GLOBAL     1 (amodule)
      STORE_ATOKEN1   3 (len)

  2   LOAD_GLOBAL     1 (amodule)
      LOAD_ATOKEN1    3 (len)
      PRINT_ITEM
      PRINT_NEWLINE
      LOAD_CONST      0 (None)
      RETURN_VALUE

Note that it looks almost identical to the code that is currently
generated, but the oparg "3" shown for the "LOAD_ATOKEN1" above indexes
into an array (like LOAD_FAST) to get at the attribute directly whereas
the oparg that would be shown for LOAD_ATTR is an index into an array of
constants/strings which is then used to retrieve the attribute from the
module's global hash table.
...
...
That's great, but I'm curious if additional gains can be
made be focusing just on builtins.
As long as builtins can be shadowed, I can't see how
to make any extra use of the fact that it's a builtin.
A semantic change would be needed, such as forbidding
shadowing of builtins, or at least forbidding this
from outside the module.
I now think that it best not to think of builtins as being a special
case.  What really matters is that if all modules agree on a way of
indexing global tokens (possible since they are all compiled with the
same compiler) then it should be possible to avoid any hash table
lookups when those global tokens are read or written to.  So
    while True:
can be just as efficient as 
    while 1:

Is it reasonable to require the compiler to be aware of the set of
builtins and their indexes at the time the bytecode is produced?  That
is one cost to consider - a less clean separation between the compiler
and the meaning behind the code it compiles.

One cost to consider is the memory consumed by unsused slots in the
array of global tokens for a module if that module uses a small number
of builtins, but also writes to one of them so that the array needs to
be copied.  But hopefully that won't be too bad.

-- 
-----------------------------------------------------------------------
|          Steven Elliott          |      selliott4@austin.rr.com     |
-----------------------------------------------------------------------