__builtins__ behavior and... the FUTURE!

I'd post this on Python-dev, but it has more to do with the future of Python, and it directly impacts the fairly-well-received Python-idea I'm working on right now.

The current behavior has persisted since revision 9877, nine years ago:

    http://svn.python.org/view?rev=9877&view=rev
    "Vladimir Marangozov' performance hack: copy f_builtins from ancestor if the globals are the same."

A variant of the behavior has persisted since the age of the dinosaurs, as far as I can tell - or at least ever since Python had stack frames.

Here's how the globals/builtins lookup is currently presented as working:

1. If 'name' is in globals, return globals['name']
2. Return globals['__builtins__']['name']

Glossing over a lot of details, here's how it *actually* worked before the performance hack:

0. A code object gets executed, which creates a stack frame. It sets frame.builtins = globals['__builtins__'].

While executing the code:

1. If 'name' is in globals, return globals['name'].
2. Otherwise return frame.builtins['name'].

A problem example, which is still a problem today:

    __builtins__ = {'len': lambda x: 1}
    print len([1, 2, 3])

    # prints:
    #   '3' when run as a script
    #   '1' in interactive mode

If running as a script or part of an import, the module's frame caches builtins, so it doesn't matter that it gets reassigned. When 'len' is looked up for the print statement, it's looked up in the cached version. But in interactive mode, each statement is executed in its own frame, so it doesn't have this problem.

Well, at least module *functions* will run in their own frames, so they'll see the new builtins, right? But here's how it works now, after the performance hack:

0. A code object gets executed, which creates a stack frame.
   a. If the stack frame has a parent (think "call site") and the parent has the same globals, it sets frame.builtins = parent.builtins.
   b. Otherwise it sets frame.builtins = globals['__builtins__'].

While executing the code:

1. If 'name' is in globals, return globals['name'].
2. Otherwise return frame.builtins['name'].

A problem example:

    __builtins__ = {'len': lambda x: 1}

    def f():
        print len([1, 2, 3])

    f()

    # prints:
    #   '3' when run as a script
    #   '1' in interactive mode

At the call site "f()", frame.builtins is the original, cached builtins. Before the hack, f()'s frame would have recalculated and re-cached it. After the hack, f()'s frame inherits the cached version. But this only happens in a script, which runs its code in a single frame. If you try this in interactive mode, you'll get correct behavior.

If function calls stay within a module, builtins is effectively frozen at the value it had when the module started execution. But if outside modules call those same functions, builtins will have its new value! That could be bad:

    import my_extra_special_builtins as __builtins__

    <define extra-special library functions that use new builtins>

    def run_tests_on_extra_special_functions():
        <tests, etc.>

    if __name__ == '__main__':
        run_tests_on_extra_special_functions()

The special library functions work, but the tests don't. The special builtins module only shows up when functions are called from outside modules (where the call sites have different globals) and the functions' frames are forced to recalculate builtins rather than inheriting it.

Here are some ways around the problem:

1. Put all the tests in a different module.
2. Use a unit testing framework, which will call the module functions from outside the module.
3. Call functions using exec with custom globals.
4. Replace functions using types.FunctionType with custom globals.

#3 and #4 are decidedly unlikely. :) #1 is generally discouraged (AFAIK) if not annoying, and #2 is encouraged.

In the last thread on __builtins__ vs. __builtin__, back in March, it seemed that Guido was open to new ideas for Python 3.0 on the subject. Well, keeping in mind this strange behavior and the length of time it's gone on, here's my recommendation:

Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL look in "builtins" (currently "__builtin__") for names after it checks globals. If modules want to hack at builtins, they can import it. But they hack it globally or not at all.

I honestly can't think of a use case you can handle by replacing a module's __builtins__ that can't be handled without. If there is one, nobody actually does it, because we would have heard them screaming in agony and banging their heads against the walls from thousands of miles away by now. You just can't do it reliably as of February 1998.

The regression test suite doesn't even touch things like this. It only goes as far as injecting stuff into __builtin__.

Finally, on to my practical problem. I'm working on the fast globals stuff, which is how I got onto this subject in the first place. Here are a few of my options:

1. I can make __builtins__ work like it was always supposed to, at the cost of decreased performance and extra complexity. It would still be much faster than it is now, though.
2. Status quo: I can make __builtins__ work like it does now. I think I can do this, anyway. It's actually more complex than #1, and very likely slower. I would rather not take this route.
3. For a given function, I can freeze __builtins__ at the value it was at when the function was defined.
4. I can make it work like I suggested for Python 3.0, but make __builtin__ automatically available to modules as __builtins__.

With or without it, I should be posting my patch for fast globals soon. No, don't look at me like that. I'm serious!

Wondering-what-to-do-ly,
Neil
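The two frame-creation rules described in this post can be mimicked with a toy model. The sketch below is only an illustration under stated assumptions, not CPython's implementation: `Frame`, its `inherit` flag, and `call_f_after_reassignment` are hypothetical names, and plain dicts stand in for module globals and for the builtins namespace. It is written for Python 3 so it can be run today.

```python
# Toy model only: plain dicts and a tiny Frame class, not CPython internals.

class Frame:
    """Hypothetical stand-in for a stack frame that caches its builtins."""

    def __init__(self, globals_, parent=None, inherit=True):
        self.globals = globals_
        # Post-hack rule (a): reuse the caller's cached builtins when the
        # caller shares the very same globals dict.
        if inherit and parent is not None and parent.globals is globals_:
            self.builtins = parent.builtins
        else:
            # Rule (b), which was the only rule before the hack.
            self.builtins = globals_["__builtins__"]

    def lookup(self, name):
        # Step 1: globals win. Step 2: fall back to the cached builtins.
        if name in self.globals:
            return self.globals[name]
        return self.builtins[name]


def call_f_after_reassignment(inherit):
    """Model the script case: the module frame reassigns __builtins__, then calls f()."""
    module_globals = {"__builtins__": {"len": len}}
    module_frame = Frame(module_globals)                    # caches the original builtins
    module_globals["__builtins__"] = {"len": lambda x: 1}   # the reassignment
    f_frame = Frame(module_globals, parent=module_frame, inherit=inherit)
    return f_frame.lookup("len")([1, 2, 3])


pre_hack = call_f_after_reassignment(inherit=False)   # f()'s frame re-reads __builtins__
post_hack = call_f_after_reassignment(inherit=True)   # f()'s frame inherits the cache
print(pre_hack, post_hack)
```

In this model the pre-hack call sees the replacement `len` and the post-hack call sees the original one, which matches the script-mode behavior described for the second problem example.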

On 11/24/07, Neil Toronto <ntoronto@cs.byu.edu> wrote:
[I'm summarizing and paraphrasing]
If a name isn't in globals, python looks in globals['__builtins__']['name']
Unfortunately, it may use a stale cached value for globals['__builtins__']
...
> Well, keeping in mind this strange behavior and the length of time it's gone on, here's my recommendation:
As Greg pointed out, this isn't so good for sandboxes. But as long as you're changing dicts to be better namespaces, why not go a step farther? Instead of using a magic key name (some spelling variant of builtin), make the fallback part of the dict itself.
For example: Use a defaultdict and set the __missing__ method to the builtin's __getitem__. Then neither python nor the frame need to worry about tracking the builtin namespace, but the fallback can be reset (even on a per-function basis) by simply replacing the fallback method.
-jJ
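A rough sketch of the fallback-in-the-dict idea, using a plain dict subclass with `__missing__` rather than `defaultdict`. The class name `FallbackNamespace` and its `fallback` attribute are hypothetical, chosen only for illustration. The sketch exercises ordinary subscript access; whether the interpreter's own global-name lookup would go through `__missing__` is a separate question that this thread does not settle.

```python
import builtins


class FallbackNamespace(dict):
    """Hypothetical namespace dict that defers missing names to a fallback mapping."""

    def __init__(self, fallback, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.fallback = fallback

    def __missing__(self, name):
        # dict.__getitem__ calls this only when `name` is not a key.
        return self.fallback[name]


# Start with the real builtins as the fallback.
ns = FallbackNamespace(vars(builtins), x=10)
default_len = ns["len"]([1, 2, 3])       # not a key, so it falls through to builtins

# Reset the fallback for this one namespace; nothing global is touched.
ns.fallback = {"len": lambda seq: 1}
custom_len = ns["len"]([1, 2, 3])

own_x = ns["x"]                          # ordinary keys never reach the fallback
print(default_len, custom_len, own_x)
```

The fallback lives on the namespace object itself, so swapping it affects only lookups made through that one dict.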

The semantics of __builtins__ are an implementation detail used for sandboxing, and assignment to __builtins__ is not supported.
Alas, I can't quite figure out what you're after; your post doesn't start with a clear problem statement, so I'm not even sure if this is helpful information. I just hope to discourage you from trying to change the semantics of __builtins__. In 3.0, __builtins__ may well be renamed.
--Guido
On Nov 24, 2007 4:41 AM, Neil Toronto <ntoronto@cs.byu.edu> wrote:
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Sorry - it was very early in the morning when I did my analysis, so I wasn't as clear as I could have been. I had two points:

1. A suggestion for future builtins, which is probably the wrong thing to do. Please disregard this.
2. A question about which semantics fast globals should support, and how different they can be from the current semantics and still be acceptable.

I have two problems with the current semantics:

1. They seem very wrong to me, even for an implementation detail. Python developers rely on function behavior being invariant to the call site. (As much as Python developers could be said to rely on any invariance, anyway.)
2. Implementing the current semantics with fast globals seems unnecessary. It no longer helps performance (it hurts it a tiny bit), and the code that does it reads like a pasted-on hack.

I've since discovered that it wouldn't be much slower. Here are some times for one of my "builtins get" benchmarks:

    Current builtins:                      3.11 sec
    Fast builtins, immediate semantics:    1.81 sec
    Fast builtins, current or pre-1998:    1.64 sec (+ epsilon for hack)

"Immediate" semantics (which I find most correct) are a little slower because it has to check whether __builtins__ has changed every time a globals lookup fails, before it does a builtins lookup. In "pre-1998" semantics, a change of __builtins__ is checked only with a new stack frame.

Besides those results, fast globals reduces function call overhead by 10%. I haven't measured what effect the hack has on that.

Personally, I like fast globals with pre-1998 semantics best, though there's still a difference in meaning between script and interactive mode. I can do it that way, the current way, or the immediate way. Or I could make current vs. pre-1998 selectable by macro. Do you have a preference?

I swear, though, I'm nearly ready to post a patch. :)

Neil
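To make the difference between "pre-1998" and "immediate" semantics concrete, here is a small toy model. It is a sketch under assumptions, not the fast globals patch: `CachedFrame`, `ImmediateFrame`, and `run` are hypothetical names, and plain dicts stand in for globals and builtins.

```python
class CachedFrame:
    """Pre-1998 style: __builtins__ is read once, when the frame is created."""

    def __init__(self, globals_):
        self.globals = globals_
        self.builtins = globals_["__builtins__"]

    def lookup(self, name):
        if name in self.globals:
            return self.globals[name]
        return self.builtins[name]


class ImmediateFrame(CachedFrame):
    """'Immediate' style: re-check __builtins__ whenever a globals lookup fails."""

    def lookup(self, name):
        if name in self.globals:
            return self.globals[name]
        current = self.globals["__builtins__"]
        if current is not self.builtins:   # the extra check made on every miss
            self.builtins = current
        return self.builtins[name]


def run(frame_cls):
    """Reassign __builtins__ while a frame is live, then look up 'len' in that frame."""
    g = {"__builtins__": {"len": len}}
    frame = frame_cls(g)
    g["__builtins__"] = {"len": lambda x: 1}
    return frame.lookup("len")([1, 2, 3])


cached_result = run(CachedFrame)        # the frame keeps using the original builtins
immediate_result = run(ImmediateFrame)  # the frame notices the reassignment
print(cached_result, immediate_result)
```

The identity check in `ImmediateFrame.lookup` is the kind of per-miss work that would account for the slightly slower "immediate" timing, though this model says nothing about actual magnitudes.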

On Nov 26, 2007 12:50 PM, Neil Toronto <ntoronto@cs.byu.edu> wrote:
Please assume I didn't read your initial post. "Very wrong" is a strong stance. Care to explain what's wrong and why? Without more info I'm not sure I understand what you're saying about call site invariance.
Please provide full context (I'm also behind on the fast globals thread). What exactly do you mean by "the current semantics"? And what's the problem with implementing it with fast globals?
Where's the benchmark source code?
> "Immediate" semantics (which I find most correct)
Even though I already told you not to care?
Given that *nobody* should assign to __builtins__ in their current globals, *ever*, I'm fine with pre-1998 semantics if it's fastest.
> I swear, though, I'm nearly ready to post a patch. :)
Please consider posting it before replying to this post.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
> The semantics of __builtins__ are an implementation detail used for sandboxing, and assignment to __builtins__ is not supported.
Perhaps in 3.0 there could be an additional argument to eval and exec for supplying a builtin namespace? Then sandboxing code wouldn't have to make assumptions about the implementation, and the way would be open for optimising it in any way we wanted.
--
Greg
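For context, this is roughly how code supplies a builtin namespace to `exec` without such an argument: it places a `__builtins__` key in the globals dict it passes in, which the documentation for `exec` describes as the way to control the available builtins. The sketch below uses Python 3 syntax; `restricted_builtins` and `sandbox_globals` are illustrative names, and restricting builtins this way is not a security boundary on its own.

```python
# A builtins mapping that exposes only a replacement len().
restricted_builtins = {"len": lambda seq: 1}
sandbox_globals = {"__builtins__": restricted_builtins}

# The executed code resolves len through the supplied mapping.
exec("n = len([1, 2, 3])", sandbox_globals)
n = sandbox_globals["n"]

# Names missing from the supplied mapping are simply undefined.
try:
    exec("open('some-file')", sandbox_globals)
except NameError:
    open_blocked = True
else:
    open_blocked = False

print(n, open_blocked)
```

This relies on the `__builtins__` key convention, which is exactly the implementation assumption that a dedicated argument would remove.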

On Nov 26, 2007 3:21 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Good idea. If only I hadn't made a mistake in the signature design... It's kind of awkward to have it be exec(code, globals, locals, builtins), but I'm afraid that changing it to exec(code, locals, globals, builtins) would break too much code in the transition (2to3 notwithstanding).
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

participants (4)
- Greg Ewing
- Guido van Rossum
- Jim Jewett
- Neil Toronto