[Python-ideas] __builtins__ behavior and... the FUTURE!

Neil Toronto ntoronto at cs.byu.edu
Sat Nov 24 13:41:56 CET 2007


I'd post this on Python-dev, but it has more to do with the future of 
Python, and it directly impacts the fairly-well-received Python-idea I'm 
working on right now.

The current behavior has persisted since revision 9877, nine years ago:

http://svn.python.org/view?rev=9877&view=rev

"Vladimir Marangozov' performance hack: copy f_builtins from ancestor
if the globals are the same."

A variant of the behavior has persisted since the age of the dinosaurs, 
as far as I can tell - or at least ever since Python had stack frames.

Here's how the globals/builtins lookup is usually described:

     1. If 'name' is in globals, return globals['name']
     2. Return globals['__builtins__']['name']
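
A rough model of that two-step description, for reference. The name
`lookup` and the plain dicts standing in for real namespaces are made up
for illustration, and the sketch is written for Python 3 so it runs on a
current interpreter:

```python
def lookup(name, globals_dict):
    """Two-step lookup as it is usually described (illustrative only)."""
    if name in globals_dict:
        return globals_dict[name]
    # In this model, __builtins__ is re-read from globals on every lookup.
    return globals_dict['__builtins__'][name]


module_globals = {'__builtins__': {'len': len}, 'x': 42}
from_globals = lookup('x', module_globals)
from_builtins = lookup('len', module_globals)([1, 2, 3])

# Under this description, a reassignment would take effect immediately.
module_globals['__builtins__'] = {'len': lambda x: 1}
after_reassign = lookup('len', module_globals)([1, 2, 3])
print(from_globals, from_builtins, after_reassign)
```

If lookup really worked this way, replacing __builtins__ would be visible
to the very next name lookup, whichever frame performed it.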

Glossing over a lot of details, here's how it *actually* worked before 
the performance hack:

     0. A code object gets executed, which creates a stack frame. It
        sets frame.builtins = globals['__builtins__'].
     While executing the code:
     1. If 'name' is in globals, return globals['name'].
     2. Otherwise return frame.builtins['name'].
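
The same steps as a toy model, assuming a made-up `Frame` class that has
nothing to do with CPython's real frame object (Python 3 syntax, so it
runs on a current interpreter):

```python
class Frame:
    """Toy stand-in for a stack frame (not CPython's real frame object)."""

    def __init__(self, globals_dict):
        self.globals = globals_dict
        # Step 0: read __builtins__ once, when the frame is created.
        self.builtins = globals_dict['__builtins__']

    def lookup(self, name):
        # Step 1: globals win.
        if name in self.globals:
            return self.globals[name]
        # Step 2: fall back to the value cached in step 0.
        return self.builtins[name]


g = {'__builtins__': {'len': len}}
frame = Frame(g)
g['__builtins__'] = {'len': lambda x: 1}   # reassigned after frame creation
cached = frame.lookup('len')([1, 2, 3])    # the frame still uses its cache
print(cached)
```

The only difference from the two-step description is where step 2 reads
from: the frame's cached reference rather than globals.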

A problem example, which is still a problem today:

     __builtins__ = {'len': lambda x: 1}
     print len([1, 2, 3])
     # prints:
     #   '3' when run as a script
     #   '1' in interactive mode

If running as a script or as part of an import, the module's frame caches 
builtins when it is created, so reassigning __builtins__ afterwards has no 
effect within that frame. When 'len' is looked up for the print statement, 
it's looked up in the cached version. But in interactive mode, each 
statement is executed in its own frame, so the reassignment is picked up 
by the next statement.
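
To make the script/interactive difference concrete, here is a small
simulation. Everything in it (`Frame`, `run_as_script`,
`run_interactively`, and the two "statements") is invented for
illustration; it models the caching rule described above rather than the
interpreter itself, and is written in Python 3:

```python
class Frame:
    """Toy frame: caches builtins when it is created."""

    def __init__(self, globals_dict):
        self.globals = globals_dict
        self.builtins = globals_dict['__builtins__']

    def lookup(self, name):
        if name in self.globals:
            return self.globals[name]
        return self.builtins[name]


def reassign_builtins(frame):
    # Models the statement: __builtins__ = {'len': lambda x: 1}
    frame.globals['__builtins__'] = {'len': lambda x: 1}


def call_len(frame):
    # Models the expression: len([1, 2, 3])
    return frame.lookup('len')([1, 2, 3])


def run_as_script(statements):
    # One frame executes every statement in the module.
    frame = Frame({'__builtins__': {'len': len}})
    return [stmt(frame) for stmt in statements]


def run_interactively(statements):
    # One new frame per statement, all sharing the same globals.
    globals_dict = {'__builtins__': {'len': len}}
    return [stmt(Frame(globals_dict)) for stmt in statements]


program = [reassign_builtins, call_len]
script_result = run_as_script(program)[-1]
interactive_result = run_interactively(program)[-1]
print(script_result, interactive_result)
```

The two runners execute the same "program"; only the number of frames
differs, and that alone changes which 'len' gets called.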

Well, at least module *functions* will run in their own frames, so 
they'll see the new builtins, right? But here's how it works now, after 
the performance hack:

     0. A code object gets executed, which creates a stack frame.
        a. If the stack frame has a parent (think "call site") and
           the parent has the same globals, it sets
           frame.builtins = parent.builtins.
        b. Otherwise it sets frame.builtins = globals['__builtins__'].
     While executing the code:
     1. If 'name' is in globals, return globals['name'].
     2. Otherwise return frame.builtins['name'].
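
A toy model of the post-hack rule, again with an invented `Frame` class
and plain dicts in place of the real interpreter structures (Python 3
syntax). The names `mod`, `other`, `inside` and `outside` are just labels
for this sketch:

```python
class Frame:
    """Toy frame modeling the post-hack rule (not CPython's real object)."""

    def __init__(self, globals_dict, parent=None):
        self.globals = globals_dict
        if parent is not None and parent.globals is globals_dict:
            # Step 0a: same globals as the call site, so inherit its cache.
            self.builtins = parent.builtins
        else:
            # Step 0b: otherwise read __builtins__ from globals.
            self.builtins = globals_dict['__builtins__']

    def lookup(self, name):
        if name in self.globals:
            return self.globals[name]
        return self.builtins[name]


mod = {'__builtins__': {'len': len}}
module_frame = Frame(mod)                    # the module starts executing
mod['__builtins__'] = {'len': lambda x: 1}   # reassigned mid-module

other = {'__builtins__': {'len': len}}       # a different module's globals
inside = Frame(mod, parent=module_frame)     # function called from the module
outside = Frame(mod, parent=Frame(other))    # same function, called elsewhere

print(inside.lookup('len')([1, 2, 3]), outside.lookup('len')([1, 2, 3]))
```

In this model the same function body sees different builtins depending
purely on who called it.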

A problem example:

     __builtins__ = {'len': lambda x: 1}
     def f(): print len([1, 2, 3])
     f()
     # prints:
     #   '3' when run as a script
     #   '1' in interactive mode


At the call site "f()", frame.builtins is the original, cached builtins. 
Before the hack, f()'s frame would have recalculated and re-cached it. 
After the hack, f()'s frame inherits the cached version. But this only 
happens in a script, which runs its code in a single frame. If you try 
this in interactive mode, the "f()" statement gets a fresh frame that 
re-reads __builtins__, so you'll get correct behavior.

If function calls stay within a module, builtins is effectively frozen 
at the value it had when the module started execution. But if outside 
modules call those same functions, builtins will have its new value! 
That could be bad:

     import my_extra_special_builtins as __builtins__

     <define extra-special library functions that use new builtins>

     def run_tests_on_extra_special_functions():
         <tests, etc.>

     if __name__ == '__main__':
         run_tests_on_extra_special_functions()

The special library functions work, but the tests don't. The special 
builtins module only shows up when functions are called from outside 
modules (where the call sites have different globals) and the functions' 
frames are forced to recalculate builtins rather than inheriting it. 
Here are some ways around the problem:

     1. Put all the tests in a different module.
     2. Use a unit testing framework, which will call the module
        functions from outside the module.
     3. Call functions using exec with custom globals.
     4. Replace functions using types.FunctionType with custom globals.
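
For the curious, a sketch of what #4 might look like on a current
Python 3 interpreter. Frame and builtins handling has changed since this
was written, so this only illustrates the mechanism; `count_items` and
`custom_globals` are hypothetical names:

```python
import types


def count_items():
    # 'len' is not a module global here, so it is resolved through builtins.
    return len([1, 2, 3])


# Hypothetical replacement builtins; only 'len' is provided.
custom_globals = {'__builtins__': {'len': lambda x: 1}}

# Rebuild the function around the same code object but different globals.
patched = types.FunctionType(count_items.__code__, custom_globals,
                             count_items.__name__)

print(count_items(), patched())
```

The rebuilt function can only see names that exist in `custom_globals`
or its replacement builtins, which is part of why this route is so
unattractive in practice.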

#3 and #4 are decidedly unlikely. :) #1 is generally discouraged (AFAIK), 
or at least annoying, and #2 is encouraged.

In the last thread on __builtins__ vs. __builtin__, back in March, it 
seemed that Guido was open to new ideas for Python 3.0 on the subject. 
Well, keeping in mind this strange behavior and the length of time it's 
gone on, here's my recommendation:

     Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL
     look in "builtins" (currently "__builtin__") for names after it
     checks globals. If modules want to hack at builtins, they can
     import it. But they hack it globally or not at all.
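
As a model of that recommendation (not of how LOAD_GLOBAL is actually
implemented), the lookup would ignore any per-module __builtins__ entry
and go straight to the shared module. The `lookup` helper and the
injected name `extra_special` are invented for this sketch, which uses
the Python 3 module name `builtins`:

```python
import builtins   # called "__builtin__" in Python 2


def lookup(name, globals_dict):
    """Model of the proposed rule: globals first, then the builtins module."""
    if name in globals_dict:
        return globals_dict[name]
    # No per-module __builtins__ entry is consulted at all.
    return getattr(builtins, name)


# A stray __builtins__ key in globals has no special meaning in this model.
module_globals = {'__builtins__': {'len': lambda x: 1}}
result = lookup('len', module_globals)([1, 2, 3])

# Hacking builtins means changing the shared module, for everyone.
builtins.extra_special = lambda: 'hello'   # hypothetical injected name
injected = lookup('extra_special', {})()
del builtins.extra_special                 # undo the global change
print(result, injected)
```

With only one place to look, there is nothing for a frame to cache and
nothing that can go stale.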

I honestly can't think of a use case that replacing a module's 
__builtins__ handles and that can't be handled some other way. If there 
is one, nobody actually does it, because we would have heard them 
screaming in agony and banging their heads against the walls from 
thousands of miles away by now. You just haven't been able to do it 
reliably since February 1998.

The regression test suite doesn't even touch things like this. It only 
goes as far as injecting stuff into __builtin__.

Finally, on to my practical problem.

I'm working on the fast globals stuff, which is how I got onto this 
subject in the first place. Here are a few of my options:

     1. I can make __builtins__ work like it was always supposed to, at
        the cost of decreased performance and extra complexity. It would
        still be much faster than it is now, though.
     2. Status quo: I can make __builtins__ work like it does now. I
        think I can do this, anyway. It's actually more complex than #1,
        and very likely slower. I would rather not take this route.
     3. For a given function, I can freeze __builtins__ at the value it
        was at when the function was defined.
     4. I can make it work like I suggested for Python 3.0, but make
        __builtin__ automatically available to modules as __builtins__.
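
To illustrate what option 3 would mean for user code, here is a toy
model. It is a guess at the observable behavior only, not the patch
itself; the `Function` class, `body`, `early` and `late` are all
hypothetical names (Python 3 syntax):

```python
class Function:
    """Toy function object that freezes builtins when it is defined."""

    def __init__(self, body, globals_dict):
        self.body = body
        self.globals = globals_dict
        # Option 3: capture __builtins__ once, at definition time.
        self.builtins = globals_dict['__builtins__']

    def lookup(self, name):
        if name in self.globals:
            return self.globals[name]
        return self.builtins[name]

    def __call__(self):
        return self.body(self.lookup)


def body(lookup):
    # Models a function body of: return len([1, 2, 3])
    return lookup('len')([1, 2, 3])


mod = {'__builtins__': {'len': len}}
early = Function(body, mod)                  # defined before the reassignment
mod['__builtins__'] = {'len': lambda x: 1}
late = Function(body, mod)                   # defined after it

print(early(), late())
```

Under this rule the result depends only on when a function was defined,
never on who calls it.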

With or without it, I should be posting my patch for fast globals soon. 
No, don't look at me like that. I'm serious!

Wondering-what-to-do-ly,
Neil


