[Python-Dev] Making builtins more efficient
Guido van Rossum
guido at python.org
Tue Feb 20 16:48:21 CET 2007
If this is not a replay of an old message, please move the discussion
to python-ideas.
On 2/20/07, Steven Elliott <selliott4 at austin.rr.com> wrote:
> I'm finally getting back into this. I'd like to take one more shot at
> it with a revised version of what I proposed before.
>
> For those of you that did not see the original thread it was about ways
> that accessing builtins could be more efficient. It's a bit much to
> summarize again now, but you should be able to find it in the archive
> with this subject and a date of 2006-03-08.
>
> On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
> > Steven Elliott wrote:
> > > One way of handling it is to
> > > alter STORE_ATTR (op code for assigning to mod.str) to always check to
> > > see if the key being assigned is one of the default builtins. If it is,
> > > then the module's indexed array of builtins is assigned to.
> >
> > As long as you're going to all that trouble, it
> > doesn't seem like it would be much harder to treat
> > all global names that way, instead of just a predefined
> > set. The compiler already knows all of the names that
> > are used as globals in the module's code.
>
> What I have in mind may be close to what you are suggesting above. My
> thought now is that builtins are a set of tokens that typically, but
> don't necessarily, point to the same objects in all modules. Such
> tokens, which I'll refer to as "global tokens", can be roughly broken
> into two sets:
> 1) Global tokens that typically point to the same object in all
> modules.
> 2) Global tokens that that are likely to point to the different
> objects (or be undefined) in different modules.
> Set 1) is pretty much the the builtins. "True" and "len" are likely to
> point to the same objects in all modules, but not necessarily. Set 2)
> might be things like "os" and "sys" which are often defined (imported)
> in modules, but not necessarily.
>
> Access to the globals of a module, including the current module, is done
> with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME). For
> each of these opcodes the following snippet of code from ceval.c (for
> LOAD_GLOBAL) is relevant to this discussion:
> /* This is the un-inlined version of the code above */
> x = PyDict_GetItem(f->f_globals, w);
> if (x == NULL) {
> x = PyDict_GetItem(f->f_builtins, w);
> if (x == NULL) {
> load_global_error:
> format_exc_check_arg(
> PyExc_NameError,
> GLOBAL_NAME_ERROR_MSG, w);
> break;
> }
> }
>
> So, to avoid the hash table lookups above maybe the global tokens could
> be assigned an index value that is fixed for any given version of the
> interpreter and that is the same for all modules (that "True" is always
> index 7, "len" is always index 3, etc.)
>
> Once a set of indexes have been determined a new opcode, that I'll call
> "LOAD_GTOKEN", could be created that avoids the hash table lookup by
> functioning in a way that is similar to LOAD_FAST (pull a local variable
> value out of an array). For example, static references to "True" could
> always be compiled to
> LOAD_GTOKEN 7 (True)
>
> As to set 1) and set 2) that I mentioned above - there is only a need to
> distinguish between the two sets if a copy-on-write mechanism is used.
> That way global tokens that are likely to have their value changed
> (group 2) ) can all be together in one group so that only that group
> needs to be copied when one of the global tokens is written to. For
> example code such as:
> True = 1
> print True
> would be compiled into something like:
> 1 LOAD_CONST 1 (1)
> STORE_GTOKEN1 7 (True)
> 2 LOAD_GTOKEN1 7 (True)
> PRINT_ITEM
> PRINT_NEWLINE
> Note that "1" has been appended to "STORE_GTOKEN" to indicate that group
> 1) is being worked with. The store command will copy the array of
> pointers once, the first time it is called.
>
> Just as a new opcode is needed for LOAD_GLOBAL one would be needed for
> LOAD_ATTR. Perhaps "LOAD_ATOKEN" would work. For example:
> amodule.len = my_len
> print amodule.len
> would be compiled into something like:
> 1 LOAD_GLOBAL 0 (my_len)
> LOAD_GLOBAL 1 (amodule)
> STORE_ATOKEN1 3 (len)
>
> 2 LOAD_GLOBAL 1 (amodule)
> LOAD_ATOKEN1 3 (len)
> PRINT_ITEM
> PRINT_NEWLINE
> LOAD_CONST 0 (None)
> RETURN_VALUE
>
> Note that it looks almost identical to the code that is currently
> generated, but the oparg "3" shown for the "LOAD_ATOKEN1" above indexes
> into an array (like LOAD_FAST) to get at the attribute directly whereas
> the oparg that would be shown for LOAD_ATTR is an index into an array of
> constants/strings which is then used to retrieve the attribute from the
> module's global hash table.
>
> > > That's great, but I'm curious if additional gains can be
> > > made be focusing just on builtins.
> >
> > As long as builtins can be shadowed, I can't see how
> > to make any extra use of the fact that it's a builtin.
> > A semantic change would be needed, such as forbidding
> > shadowing of builtins, or at least forbidding this
> > from outside the module.
>
> I now think that it best not to think of builtins as being a special
> case. What really matters is that if all modules agree on a way of
> indexing global tokens (possible since they are all compiled with the
> same compiler) then it should be possible to avoid any hash table
> lookups when those global tokens are read or written to. So
> while True:
> can be just as efficient as
> while 1:
>
> Is it reasonable to require the compiler to be aware of the set of
> builtins and their indexes at the time the bytecode is produced? That
> is one cost to consider - a less clean separation between the compiler
> and the meaning behind the code it compiles.
>
> One cost to consider is the memory consumed by unsused slots in the
> array of global tokens for a module if that module uses a small number
> of builtins, but also writes to one of them so that the array needs to
> be copied. But hopefully that won't be too bad.
>
> --
> -----------------------------------------------------------------------
> | Steven Elliott | selliott4 at austin.rr.com |
> -----------------------------------------------------------------------
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
More information about the Python-Dev
mailing list