On Sun, Jan 2, 2011 at 9:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 1/2/2011 10:18 PM, Guido van Rossum wrote:
My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.
I believe this amounts to saying
1) Python code executes in three scopes (rather than two): global builtin, modular (misleadingly call global), and local. This much is a possible viewpoint today.
In fact it is the specification today.
2) A name that is not an assignment target anywhere -- and that matches a builtin name -- is treated as a builtin. This is the new part, and it amounts to a rule for entire modules that is much like the current rule for separating local and global names within a function. The difference from the global/local rule would be that unassigned non-builtin names would be left to runtime resolution in globals.
It would seem that this new rule would simplify the lookup of module ('global') names since if xxx in not in globals, there is no need to look in builtins. This is assuming that following 'len=len' with 'del len' cannot 'unmodularize' the name.
Actually I would leave the lookup mechanism for names that don't get special treatment the same -- the only difference would be for builtins in contexts where the compiler can generate better code (typically involving a new opcode) based on all the conditions being met.
For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass.
We actually already do such a pass.
So it could not work interactively.
That's fine. We could also disable it automatically in when eval() or exec() is the source of the code.
Should batch mode main modules work the same as when imported?
Yes.
Interactive mode could work as it does at present or with slight modification, which would be that builtin names within functions, if not yet overridden, also get resolved when the function is compiled.
Interactive mode would just work as it does today. I would also make a rule saying that 'open' is not treated this way. It is the only one where I can think of legitimate reasons for changing the semantics dynamically in a way that is not detectable by the compiler, assuming it only sees the source code for one module at a time. Some things that could be optimized this way: len(x), isinstance(x, (int, float)), range(10), issubclass(x, str), bool(x), int(x), hash(x), etc... in general, the less the function does the better a target for this optimization it is. One more thing: to avoid heisenbugs, I propose that, for any particular builtin, if this optimization is used anywhere in a module, it is should be used everywhere in that module (except in scopes where the name has a different meaning). This means that we can tell users about it and they can observe it without too much of a worry that a slight change to their program might disable it. (I've seen this with optimizations in gcc, and it makes performance work tricky.) Still, it's all academic until someone implements some of the optimizations. (There rest of the work is all in the docs and in the users' minds.) -- --Guido van Rossum (python.org/~guido)