variable name resolution in exec is incorrect
Hi, issue991196 was closed, being described as intentional. I've added a comment in that issue which argues that this is a serious bug (also asserted by a previous commenter, Armin Rigo), because it creates a unique, undocumented, oddly behaving scope that doesn't apply closures correctly. At the very least I think this should be acknowledged as a plain old bug (rather than a feature), followed by a discussion about whether it will be fixed or not. Appreciate your thoughts - cheers, Colin
On Wed, May 26, 2010 at 10:15 AM, Colin H <hawkett@gmail.com> wrote:
issue991196 was closed, being described as intentional. I've added a comment in that issue which argues that this is a serious bug (also asserted by a previous commenter, Armin Rigo), because it creates a unique, undocumented, oddly behaving scope that doesn't apply closures correctly. At the very least I think this should be acknowledged as a plain old bug (rather than a feature), followed by a discussion about whether it will be fixed or not.
Here's a quick recap of the issue so that people don't have to go searching through the bug archive. In Python 2.x, we get the following behaviour:
>>> code = """\
... y = 3
... def f():
...     return y
... f()
... """
>>> exec code in {}  # works fine
>>> exec code in {}, {}  # dies with a NameError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in <module>
  File "<string>", line 3, in f
NameError: global name 'y' is not defined
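For anyone following along on Python 3, where exec is a function, here is a minimal sketch of the same session. It assumes nothing beyond the built-in exec(source, globals, locals), and it also checks which dict y actually lands in:

```python
# The same four-line snippet as in the session above.
code = """\
y = 3
def f():
    return y
f()
"""

# One dict: globals and locals are the same namespace, as in a module.
module_ns = {}
exec(code, module_ns)
print(module_ns["y"])            # 3

# Two dicts: y and f are bound in locals_ns, but f looks y up in globals_ns.
globals_ns, locals_ns = {}, {}
error = None
try:
    exec(code, globals_ns, locals_ns)
except NameError as exc:
    error = exc

print(type(error).__name__)      # NameError
print("y" in locals_ns)          # True
print("y" in globals_ns)         # False
```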
The issue is whether the second example should work, given that two different dictionaries have been passed. The cause of the NameError can be seen by looking at the bytecode: y is bound using STORE_NAME, which stores y into the locals dictionary (which here is *not* the same as the globals dictionary) but the attempt to retrieve the value of y uses LOAD_GLOBAL, which only looks in the globals.
>>> co = compile(code, 'mycode', 'exec')
>>> dis.dis(co)
  1           0 LOAD_CONST               0 (3)
              3 STORE_NAME               0 (y)

  2           6 LOAD_CONST               1 (<code object f at 0xa22b40, file "mycode", line 2>)
              9 MAKE_FUNCTION            0
             12 STORE_NAME               1 (f)

  4          15 LOAD_NAME                1 (f)
             18 CALL_FUNCTION            0
             21 POP_TOP
             22 LOAD_CONST               2 (None)
             25 RETURN_VALUE
>>> dis.dis(co.co_consts[1]) # disassembly of 'f'
  3           0 LOAD_GLOBAL              0 (y)
              3 RETURN_VALUE
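The same STORE_NAME / LOAD_GLOBAL split can be checked programmatically on Python 3 with dis.get_instructions. This is a sketch: the helper name ops_touching is made up for illustration, and exact opcode sequences differ between CPython versions, but the two opnames that touch y should not:

```python
import dis
import types

code = "y = 3\ndef f():\n    return y\nf()\n"
co = compile(code, "mycode", "exec")

# The code object for f is stored among the module code object's constants.
f_code = next(c for c in co.co_consts if isinstance(c, types.CodeType))

def ops_touching(code_obj, name):
    """Opcode names of the instructions whose argument is `name` (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(code_obj) if ins.argval == name]

print(ops_touching(co, "y"))      # ['STORE_NAME']
print(ops_touching(f_code, "y"))  # ['LOAD_GLOBAL']
```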
This is a long way from my area of expertise (I'm commenting here because it was me who sent Colin here in the first place), and it's not clear to me whether this is a bug, and if it is a bug, how it could be resolved. What would the impact be of having the compiler produce 'LOAD_NAME' rather than 'LOAD_GLOBAL' here? Mark
On 26/05/10 19:48, Mark Dickinson wrote:
This is a long way from my area of expertise (I'm commenting here because it was me who sent Colin here in the first place), and it's not clear to me whether this is a bug, and if it is a bug, how it could be resolved. What would the impact be of having the compiler produce 'LOAD_NAME' rather than 'LOAD_GLOBAL' here?
exec with a single argument = module namespace
exec with two arguments = class namespace

Class namespaces are deliberately exempted from lexical scoping so that methods can't see class attributes, hence the example in the tracker issue works exactly as it would if the code were written as a class body.

class C:
    y = 3
    def execfunc():
        print y
    execfunc()

With this code, y would end up in C.__dict__ rather than the module globals (at least, it would if it weren't for the exception) and the call to execfunc fails with a NameError when attempting to find y.

I know I've closed other bug reports that were based on the same misunderstanding, and I didn't understand it myself until Guido explained it to me a few years back, so suggestions for improving the exec documentation in this area would be appreciated.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
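A Python 3 sketch of the class-body analogy. It is a slight variant of the example above: the call is moved outside the class so that the class gets created and C.__dict__ can be inspected. It assumes no module-level y exists:

```python
class C:
    y = 3                  # ends up in C.__dict__, not in the module globals
    def execfunc():
        return y           # compiled as a global lookup: the class scope is skipped

print("y" in vars(C))      # True

class_error = None
try:
    C.execfunc()           # in Python 3 this is a plain function call
except NameError as exc:
    class_error = exc

exec_error = None
try:
    # Two dicts: the same situation, reached through exec instead of a class.
    exec("y = 3\ndef execfunc():\n    return y\nexecfunc()\n", {}, {})
except NameError as exc:
    exec_error = exc

print(type(class_error).__name__, type(exec_error).__name__)   # NameError NameError
```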
On 26/05/2010 13:51, Nick Coghlan wrote:
exec with a single argument = module namespace
exec with two arguments = class namespace
Class namespaces are deliberately exempted from lexical scoping so that methods can't see class attributes, hence the example in the tracker issue works exactly as it would if the code was written as a class body.
class C:
    y = 3
    def execfunc():
        print y
    execfunc()
With this code, y would end up in C.__dict__ rather than the module globals (at least, it would if it wasn't for the exception) and the call to execfunc fails with a NameError when attempting to find y.
I know I've closed other bug reports that were based on the same misunderstanding, and I didn't understand it myself until Guido explained it to me a few years back, so suggestions for improving the exec documentation in this area would be appreciated.
Your explanation here is very clear. Is this in the documentation? Michael
Cheers, Nick.
-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
On 26/05/10 23:08, Michael Foord wrote:
Your explanation here is very clear. Is this in the documentation?
It isn't actually - I think Thomas may be right that the current exec documentation was largely written prior to the introduction of lexical scoping. So adding something along these lines would probably be a good place to start.

Unfortunately, even my explanation above isn't 100% correct - it misses a subtle distinction where lexical scoping will pass through a class definition nested inside a function definition and see the function scope, but the use of strings for exec code means that the scopes of any containing functions are ignored completely.

Cheers,
Nick.
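The subtle distinction Nick mentions can be shown side by side in Python 3. This is a sketch; the function names below are hypothetical and chosen only for the illustration:

```python
def class_inside_function():
    outer_value = "from the enclosing function"
    class C:
        def method(self):
            # C's namespace is skipped, but the enclosing function scope is not.
            return outer_value
    return C().method()

def exec_inside_function():
    outer_value = "from the enclosing function"  # never visible to the exec'd code
    ns = {}
    # The string is compiled separately, so g's outer_value compiles to a global
    # lookup in the (empty) globals dict passed here.
    exec("def g():\n    return outer_value\n", {}, ns)
    try:
        return ns["g"]()
    except NameError as exc:
        return exc

print(class_inside_function())                   # from the enclosing function
print(type(exec_inside_function()).__name__)     # NameError
```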
On Wed, May 26, 2010 at 11:48, Mark Dickinson <dickinsm@gmail.com> wrote:
This is a long way from my area of expertise (I'm commenting here because it was me who sent Colin here in the first place), and it's not clear to me whether this is a bug, and if it is a bug, how it could be resolved. What would the impact be of having the compiler produce 'LOAD_NAME' rather than 'LOAD_GLOBAL' here?
It wouldn't matter. The 'f' function only knows about its own namespace (separate from the surrounding code's local namespace) and the global namespace. LOAD_NAME differs from LOAD_GLOBAL only in that it looks in the local namespace first, but in this case the local namespace contains nothing.

Here's what happens: 'exec code in d1, d2' executes code with 'd1' as globals and 'd2' as locals. Assignment always operates on the local namespace (barring the 'global' declaration). The function definition also assigns to the local namespace, but the created function knows nothing about that local namespace -- it only cares about its own namespace and the global namespace, 'd1'.

'exec code in d1' does the same thing as 'exec code in d1, d1': it uses the same dict for the locals and the globals. The execution of the code doesn't change -- the assignment to 'y' still assigns to the locals, and the 'f' function still looks it up in globals, but now *they are the same dict*. Using the same dict for locals and globals is how modules work, as well.

The main confusion here is the fact that 'exec' doesn't generate closures. (Nobody was ever confused about this behaviour back in Python 2.0-and-earlier! :-) The reason for that is the disconnect between the compiler and the exec statement: the compiler sees no enclosing function, so it doesn't generate a closure. The exec statement, because it gets two different namespaces, then executes it like a function, with a distinct local namespace.

-- 
Thomas Wouters <thomas@python.org>
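A small Python 3 sketch of the two cases Thomas describes. It shows that a single dict makes the lookup succeed, and that with two dicts the function is stored in the locals dict while its __globals__ is the globals dict:

```python
code = "y = 3\ndef f():\n    return y\nresult = f()\n"

# exec(code, d) behaves like exec(code, d, d): one dict is both namespaces.
shared = {}
exec(code, shared, shared)
print(shared["result"])             # 3

# Two dicts: f is *stored* in the locals dict, but its globals are the globals dict.
g, l = {}, {}
try:
    exec(code, g, l)
except NameError:
    pass                            # raised by the call f() on the last line
print(l["f"].__globals__ is g)      # True
print("y" in l, "y" in g)           # True False
```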
Thanks for the details on why the observed behaviour occurs - very clear. My only query would be why this is considered correct? Why is it running as a class namespace, when it is not a class? Is there any reason why this is not considered a mistake? Slightly concerned that this is being considered not a bug because 'it is how it is'.

A really good reason why you would want to provide a separate locals dictionary is to get access to the stuff that was defined in the exec()'d code block. Unfortunately this use case is broken by the current behaviour. The only way to get the definitions from the exec()'d code block is to supply a single dictionary, and then try to weed out the definitions from amongst all the other globals, which is very difficult if you don't know in advance what was in the code block you exec()'d.

So put simply - the bug is that a class namespace is used, but it's not a class.
Mark Dickinson wrote (with interactive prompts removed):

code = """\
y = 3
def f():
    return y
f()
"""
exec code in {}  # works fine
exec code in {}, {}  # dies with a NameError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in <module>
  File "<string>", line 3, in f
NameError: global name 'y' is not defined

I verified that 3.1 (with exec adjusted to be a function) operates the same.

On 5/26/2010 8:51 AM, Nick Coghlan wrote:
exec with a single argument = module namespace
exec with two arguments = class namespace
I verified in 3.1 that indenting 'code' and prepending 'class C():\n' gives the same error and that prepending 'def f():\n' now, with nested function namespaces, does not give an error, although it would have been an error before Python 2.2, when there were no nested function namespaces.

On 5/26/2010 10:03 AM, Colin H wrote:
Thanks for the details on why the observed behaviour occurs - very clear. My only query would be why this is considered correct? Why is it running as a class namespace, when it is not a class?
You are expecting that it run as a function namespace (with post 2.2 nesting), when it is not a function. Why is that any better?
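Terry's wrapping experiment can be redone in a few lines of Python 3. This is a sketch: the outer function is called `wrapper` here (a hypothetical name, so it doesn't clash with the inner f), rather than the 'def f():' Terry prepended:

```python
import textwrap

code = "y = 3\ndef f():\n    return y\nf()\n"

# The snippet indented under a class statement...
as_class = "class C():\n" + textwrap.indent(code, "    ")
# ...or under a function, which is then called.
as_function = "def wrapper():\n" + textwrap.indent(code, "    ") + "wrapper()\n"

def outcome(source):
    """Run source in a fresh single namespace and report what happened."""
    try:
        exec(source, {})
    except NameError:
        return "NameError"
    return "ok"

print(outcome(as_class))      # NameError
print(outcome(as_function))   # ok
```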
Is there any reason why this is not considered a mistake? Slightly concerned that this is being considered not a bug because 'it is how it is'.
In original Python, the snippet would have given an error whether you thought of it as being in a class or function context, which is how anyone who knew Python then would have expected. Consistency is not a bug.

When nested function namespaces were introduced, the behavior of exec was left unchanged. Backward compatibility is not a bug.

A change could have been proposed for 3.x, but I do not remember such a discussion and expect it would have been rejected.

One can get the nested function behavior by doing what I did in the test mentioned above. One could easily write a nested_exec function to do the wrapping automatically.

-----
In http://bugs.python.org/issue8824 I suggest that

"In all cases, if the optional parts are omitted, the code is executed in the current scope. If only globals is provided, it must be a dictionary, which will be used for both the global and the local variables. If globals and locals are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object."

be followed by

"If only globals is provided or if one dict is provided as both globals and locals, the code is executed in a new top-level scope. If different objects are given as globals and locals, the code is executed as if it were in a class statement in a new top-level scope."

to make the behavior clearer.

Terry Jan Reedy
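One possible shape for the nested_exec Terry mentions, as a hedged Python 3 sketch. The name comes from his message, but the signature, the wrapper name _nested_exec_body and the "return the function's locals" design are all assumptions made for illustration. The idea is to run the source as the body of a throwaway function, so definitions close over each other, and to hand back that function's locals separately from the globals:

```python
import textwrap

def nested_exec(source, globals_dict):
    """Hypothetical helper: run `source` as the body of a throwaway function.

    Names bound by `source` become locals of that function, so functions it
    defines close over them, and those locals come back as a separate dict.
    """
    wrapper_src = (
        "def _nested_exec_body():\n"
        + textwrap.indent(source, "    ")
        + "\n    return locals()\n"
    )
    holder = {}
    # The wrapper's globals are globals_dict; defining it into `holder` keeps
    # the helper's own name out of globals_dict (exec still adds __builtins__).
    exec(wrapper_src, globals_dict, holder)
    return holder["_nested_exec_body"]()

user_code = """
EXTRA = 1.1
def func():
    return FOO * BAR * EXTRA
def fact(n):
    return 1 if n <= 1 else n * fact(n - 1)
"""
stuff = nested_exec(user_code, {"FOO": 42, "BAR": 2})
print(sorted(stuff))        # ['EXTRA', 'fact', 'func']
print(stuff["fact"](5))     # 120
```

This comes with limitations a real version would have to handle. `from module import *` is a SyntaxError inside a function. A top-level `return` or `yield` in the source would change the wrapper's meaning. Indenting the source also indents the continuation lines of multi-line string literals, which changes their contents.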
The changes to the docs will definitely help in understanding why this behaves as it does. I would like to take one last stab, though, at justifying why this behaviour isn't correct - will leave it alone if these arguments don't stack up :) Appreciate the input and discussion.

Terry Jan Reedy wrote:
You are expecting that it run as a function namespace (with post 2.2 nesting), when it is not a function. Why is that any better?
Because a class namespace (as I see it) was implemented to treat a specific situation - i.e. that functions in classes cannot see class variables. exec() is a far more generic instrument that has no such explicit requirement - i.e. it feels like hijacking an edge case to meet a requirement that doesn't exist. However, 'all locals in an enclosing scope are made available in the function namespace' is generally understood as Python's generic closure implementation, and would better match the generic nature of the exec() statement. A litmus test for this sort of thing: if you polled 100 knowledgeable Python devs who hadn't encountered this problem or this thread and asked if they would expect exec() to run as a class or function namespace, I think you'd struggle to get 1 of them to expect a class namespace. Functions are the more generic construct, and thus more appropriate for the generic nature of exec() (IMHO). It would appear that the only actual requirement not to make locals in an enclosing scope available in a nested function scope is for a class. The situation we are discussing seems to have created a similar requirement for exec(), but with no reason.
In original Python, the snippet would have given an error whether you thought of it as being in a class or function context, which is how anyone who knew Python then would have expected. Consistency is not a bug.
When nested function namespaces were introduced, the behavior of exec was left unchanged. Backward compatibility is not a bug.
Generally, most other behaviour did change - locals in enclosing scopes *did* become available in the nested function namespace, which was not backward compatible. Why is a special case made to retain consistency and backward compatibility for code run using exec()? It's all python code. Inconsistent backward compatibility might be considered a bug. Cheers, Colin
On 27/05/10 06:07, Colin H wrote:
In original Python, the snippet would have given an error whether you thought of it as being in a class or function context, which is how anyone who knew Python then would have expected. Consistency is not a bug.
When nested function namespaces were introduced, the behavior of exec was left unchanged. Backward compatibility is not a bug.
Generally, most other behaviour did change - locals in enclosing scopes *did* become available in the nested function namespace, which was not backward compatible. Why is a special case made to retain consistency and backward compatibility for code run using exec()? It's all python code. Inconsistent backward compatibility might be considered a bug.
Because strings are opaque to the compiler. The lexical scoping has *no idea* what is inside the string, and the exec operation only has 3 things available to it:
- the code object compiled from the string
- the supplied globals namespace
- the supplied locals namespace

It isn't a special case, it's the only way it can possibly work.

Consider a more complex example:

def get_exec_str():
    y = 3
    return "print(y)"

exec(get_exec_str())

Should that code work?

Or consider this one:

def get_exec_str():
    y = 3
    return "print(y)"

def run_exec_str(str_to_run):
    y = 5
    exec(str_to_run)

run_exec_str(get_exec_str())

Should that work? If yes, should it print 3 or 5?

Lexical scoping only works for code that is compiled as part of a single operation - the separation between the compilation of the individual string and the code defining that string means that the symbol table analysis needed for lexical scoping can't cross the boundary.

Cheers,
Nick.
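The "symbol table analysis can't cross the boundary" point is visible in the code objects themselves. In this sketch (inner_code is a hypothetical helper), the same function body compiles to a global lookup when it is compiled on its own, and to a closure variable when it is compiled together with an enclosing function:

```python
import types

def inner_code(source, name):
    """Compile `source` and return the (possibly nested) code object called `name`."""
    pending = [compile(source, "<string>", "exec")]
    while pending:
        code_obj = pending.pop()
        if code_obj.co_name == name:
            return code_obj
        pending.extend(c for c in code_obj.co_consts if isinstance(c, types.CodeType))
    raise LookupError(name)

# Compiled on its own, f has no enclosing function, so y is a global lookup.
alone = inner_code("def f():\n    return y\n", "f")
print(alone.co_freevars, "y" in alone.co_names)      # () True

# Compiled together with its enclosing function, y becomes a closure variable.
nested = inner_code(
    "def outer():\n    y = 3\n    def f():\n        return y\n    return f\n", "f"
)
print(nested.co_freevars)                            # ('y',)
```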
On Wed, May 26, 2010 at 5:12 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Should that work? If yes, should it print 3 or 5?
Lexical scoping only works for code that is compiled as part of a single operation - the separation between the compilation of the individual string and the code defining that string means that the symbol table analysis needed for lexical scoping can't cross the boundary.
Hi Nick,

I don't think Colin was asking for such things. His use case is clarified by this post of his:

On Wed, May 26, 2010 at 7:03 AM, Colin H <hawkett@gmail.com> wrote:
A really good reason why you would want to provide a separate locals dictionary is to get access to the stuff that was defined in the exec()'d code block. Unfortunately this use case is broken by the current behaviour. The only way to get the definitions from the exec()'d code block is to supply a single dictionary, and then try to weed out the definitions from amongst all the other globals, which is very difficult if you don't know in advance what was in the code block you exec()'d.
I think this is an example of what he's asking for:

def define_stuff(user_code):
    context = {'FOO': 42, 'BAR': 10**100}
    stuff = {}
    exec(user_code, context, stuff)
    return stuff

In some other place, define_stuff() is called like this:

user_code = """
EXTRA = 1.1
def func():
    return FOO * BAR * EXTRA
"""
stuff = define_stuff(user_code)
func = stuff['func']
print(func())

This can't find the EXTRA variable. (Another example would be defining a recursive function -- the function can't find itself.) The alternative (which Colin complains about) is for define_stuff() to use a single namespace, initialized with the context, and to return that -- but he's right that in that case FOO and BAR would be exported as part of stuff, which may not be the intention. (Another solution would be to put FOO and BAR in the builtins dict -- but that has other problems of course.)
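The recursive-function case Guido mentions fails the same way. Here is a sketch using his define_stuff with the concrete context from his example. The function works as long as it never has to look itself up:

```python
def define_stuff(user_code):
    context = {"FOO": 42, "BAR": 10**100}
    stuff = {}
    exec(user_code, context, stuff)
    return stuff

recursive_code = """
def fact(n):
    return 1 if n <= 1 else n * fact(n - 1)
"""
stuff = define_stuff(recursive_code)
print(stuff["fact"](1))    # 1: no recursive call, so no lookup of fact

lookup_error = None
try:
    stuff["fact"](3)       # fact lives in `stuff`, but the lookup goes to `context`
except NameError as exc:
    lookup_error = exc
print(type(lookup_error).__name__)   # NameError
```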
So put simply - the bug is that a class namespace is used, but it's not a class.
Well, really, the bug is that no closure is created even though there are separate globals and locals. Unfortunately, this is because the two different contexts are clearly present at the point where the code is *executed*, but not at the point where it is *compiled* -- the compiler normally uses syntactic clues to decide whether to generate code using closures, in particular, the presence of nested functions.

Note that define_stuff() could be equivalently written like this, where it is more obvious that the compiler doesn't know about the separate namespaces:

def define_stuff(user_code):
    context = {...}
    stuff = {}
    compiled_code = compile(user_code, "<string>", "exec")
    exec(compiled_code, context, stuff)
    return stuff

This is not easy to fix. The best short-term work-around is probably a hack like this:

def define_stuff(user_code):
    context = {...}
    stuff = {}
    stuff.update(context)
    exec(user_code, stuff)
    for key in context:
        if key in stuff and stuff[key] == context[key]:
            del stuff[key]
    return stuff

-- 
--Guido van Rossum (python.org/~guido)
Hi Guido, Thanks for the possible workaround - unfortunately 'stuff' will contain a whole stack of things that are not in 'context', and were not defined in 'user_code' - things that python embeds - a (very small) selection - {..., 'NameError': <type 'exceptions.NameError'>, 'BytesWarning': <type 'exceptions.BytesWarning'>, 'dict': <type 'dict'>, 'input': <function input at 0x10047a9b0>, 'oct': <built-in function oct>, 'bin': <built-in function bin>, ...} It makes sense why this happens of course, but upon return, the globals dict is very large, and finding the stuff you defined in your user_code amongst it is a very difficult task. Avoiding this problem is the 'locals' use-case for me. Cheers, Colin On Thu, May 27, 2010 at 1:38 AM, Guido van Rossum <guido@python.org> wrote:
This is not easy to fix. The best short-term work-around is probably a hack like this:
def define_stuff(user_code):
    context = {...}
    stuff = {}
    stuff.update(context)
    exec(user_code, stuff)
    for key in context:
        if key in stuff and stuff[key] == context[key]:
            del stuff[key]
    return stuff
On Wed, May 26, 2010 at 5:53 PM, Colin H <hawkett@gmail.com> wrote:
Thanks for the possible workaround - unfortunately 'stuff' will contain a whole stack of things that are not in 'context', and were not defined in 'user_code' - things that python embeds - a (very small) selection -
{..., 'NameError': <type 'exceptions.NameError'>, 'BytesWarning': <type 'exceptions.BytesWarning'>, 'dict': <type 'dict'>, 'input': <function input at 0x10047a9b0>, 'oct': <built-in function oct>, 'bin': <built-in function bin>, ...}
It makes sense why this happens of course, but upon return, the globals dict is very large, and finding the stuff you defined in your user_code amongst it is a very difficult task. Avoiding this problem is the 'locals' use-case for me. Cheers,
No, if taken literally that doesn't make sense. Those are builtins. I think you are mistaken that each of those (e.g. NameError) is in stuff -- they are in stuff['__builtins__'] which represents the built-in namespace. You should remove that key from stuff as well. --Guido
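Guido's point about __builtins__ is easy to confirm in Python 3. exec inserts a reference to the builtins under that one key when the globals dict lacks it, rather than copying each builtin into the dict:

```python
d = {"FOO": 42}
exec("EXTRA = 1.1", d)

print("__builtins__" in d)      # True: exec inserted a reference to the builtins
print("NameError" in d)         # False: individual builtins are not copied into d
print(sorted(k for k in d if k != "__builtins__"))   # ['EXTRA', 'FOO']
```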
Of course :) - I need to pay more attention. Your workaround should do the trick. It would make sense if locals could be used for this purpose, but the workaround doesn't add so much overhead in most situations. Thanks for the help, much appreciated, Colin On Thu, May 27, 2010 at 2:05 AM, Guido van Rossum <guido@python.org> wrote:
No, if taken literally that doesn't make sense. Those are builtins. I think you are mistaken that each of those (e.g. NameError) is in stuff -- they are in stuff['__builtins__'] which represents the built-in namespace. You should remove that key from stuff as well.
--Guido
I needed to make a small modification to the workaround - I wasn't able to delete from 'stuff', as the definitions in the exec()'d code won't run without it - they rely on that dict being present at runtime. In practice the overhead of doing this is quite noticeable if you run your code like this a lot, and build up a decent sized context (which I do). It will obviously depend on the usage scenario though.

def define_stuff(user_code):
    context = {...}
    stuff = {}
    stuff.update(context)
    exec(user_code, stuff)
    return_stuff = {}
    return_stuff.update(stuff)
    del return_stuff['__builtins__']
    for key in context:
        if key in return_stuff and return_stuff[key] == context[key]:
            del return_stuff[key]
    return return_stuff

On Thu, May 27, 2010 at 2:13 AM, Colin H <hawkett@gmail.com> wrote:
Of course :) - I need to pay more attention. Your workaround should do the trick. It would make sense if locals could be used for this purpose, but the workaround doesn't add so much overhead in most situations. Thanks for the help, much appreciated,
Colin
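Why the in-place deletion breaks things: a function defined by exec keeps the very dict passed as globals as its __globals__. Deleting context keys from that dict removes names the function still needs. A minimal Python 3 sketch:

```python
context = {"FOO": 42}
stuff = dict(context)
exec("def func():\n    return FOO\n", stuff)

func = stuff["func"]
print(func.__globals__ is stuff)   # True: func keeps using this very dict
print(func())                      # 42

del stuff["FOO"]                   # trimming the exec'd namespace in place...
missing = None
try:
    func()                         # ...removes a name func still depends on
except NameError as exc:
    missing = exc
print(type(missing).__name__)      # NameError
```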
On Thu, May 27, 2010 at 2:05 AM, Guido van Rossum <guido@python.org> wrote:
On Wed, May 26, 2010 at 5:53 PM, Colin H <hawkett@gmail.com> wrote:
Thanks for the possible workaround - unfortunately 'stuff' will contain a whole stack of things that are not in 'context', and were not defined in 'user_code' - things that python embeds - a (very small) selection -
{..., 'NameError': <type 'exceptions.NameError'>, 'BytesWarning': <type 'exceptions.BytesWarning'>, 'dict': <type 'dict'>, 'input': <function input at 0x10047a9b0>, 'oct': <built-in function oct>, 'bin': <built-in function bin>, ...}
It makes sense why this happens of course, but upon return, the globals dict is very large, and finding the stuff you defined in your user_code amongst it is a very difficult task. Avoiding this problem is the 'locals' use-case for me. Cheers,
No, if taken literally that doesn't make sense. Those are builtins. I think you are mistaken that each of those (e.g. NameError) is in stuff -- they are in stuff['__builtins__'] which represents the built-in namespace. You should remove that key from stuff as well.
--Guido
Colin
On Thu, May 27, 2010 at 1:38 AM, Guido van Rossum <guido@python.org> wrote:
This is not easy to fix. The best short-term work-around is probably a hack like this:
def define_stuff(user_code): context = {...} stuff = {} stuff.update(context) exec(user_code, stuff) for key in context: if key in stuff and stuff[key] == context[key]: del stuff[key] return stuff
-- --Guido van Rossum (python.org/~guido)
-- --Guido van Rossum (python.org/~guido)
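Guido's point about where the built-in names live can be checked directly. The Python 3 exec() documentation says that when the globals dict has no '__builtins__' key, a reference to the builtins dictionary is inserted under that single key, rather than each built-in name being copied in. A minimal sketch, with a made-up context value ('greeting') standing in for the real context:

```python
# Hypothetical context: one value the executed code may use.
stuff = {"greeting": "hello"}
exec("shout = greeting.upper()", stuff)

# Only the context, the user's definitions and one '__builtins__' entry appear.
print(sorted(stuff))                       # ['__builtins__', 'greeting', 'shout']

# Names such as NameError are reached through that entry, not stored in stuff.
print("NameError" in stuff)                # False
print("NameError" in stuff["__builtins__"])  # True
```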
On 5/27/2010 7:14 AM, Colin H wrote:
def define_stuff(user_code):
    context = {...}
    stuff = {}
    stuff.update(context)
    exec(user_code, stuff)
    return_stuff = {}
    return_stuff.update(stuff)
    del return_stuff['__builtins__']
    for key in context:
        if key in return_stuff and return_stuff[key] == context[key]:
            del return_stuff[key]
    return return_stuff
I'm not sure of your application, but I suspect you would benefit from using an identity check instead of an __eq__ check. The equality check may be expensive (e.g., a large dictionary), and I don't think it actually checks what you want: if the user_code generates an __eq__-similar dictionary, wouldn't you still want that? The only reason I can see to use __eq__ is if you are trying to detect user_code modifying an object passed in, which is something that wouldn't be addressed by your original complaint about exec (as in, modifying a global data structure). Instead of:
if key in return_stuff and return_stuff[key] == context[key]:
Use:
if key in return_stuff and return_stuff[key] is context[key]:
-- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
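A small Python 3 sketch of the difference Scott describes, assuming a hypothetical context that holds a settings dict. If user_code rebinds the name to a new but equal dict, an == filter discards it as "unchanged", while an 'is' filter keeps it because it is a different object. The helper user_names is illustrative only.

```python
context = {"settings": {"debug": False}}   # hypothetical context value
user_code = "settings = {'debug': False}\nmode = 'fast'\n"

stuff = dict(context)
exec(user_code, stuff)

def user_names(namespace, ctx, unchanged):
    # Everything except __builtins__ and context entries that
    # `unchanged` reports as untouched by user_code.
    return sorted(
        key for key, value in namespace.items()
        if key != "__builtins__"
        and not (key in ctx and unchanged(value, ctx[key]))
    )

with_eq = user_names(stuff, context, lambda new, old: new == old)
with_is = user_names(stuff, context, lambda new, old: new is old)
print(with_eq)   # ['mode']: the rebuilt 'settings' dict is silently dropped
print(with_is)   # ['mode', 'settings']: a new object is kept even if equal
```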
Yep fair call - was primarily modifying Guido's example to make the point about not being able to delete from the globals returned from exec - cheers, Colin
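Colin's point about not deleting from the globals can be reproduced with a runnable Python 3 sketch of both workarounds. The context ('rate') and user code are hypothetical. A function defined by the exec'd code uses 'stuff' itself as its globals, so pruning 'stuff' in place removes names the function still needs, while pruning a copy leaves them in place.

```python
def define_stuff_by_deleting(user_code, context):
    # First workaround: prune context entries from the globals dict itself.
    stuff = dict(context)
    exec(user_code, stuff)
    for key in context:
        if key in stuff and stuff[key] == context[key]:
            del stuff[key]
    return stuff

def define_stuff_by_copying(user_code, context):
    # Colin's variant: leave `stuff` intact (it is the globals of any
    # function user_code defines) and prune a shallow copy instead.
    stuff = dict(context)
    exec(user_code, stuff)
    return_stuff = dict(stuff)
    del return_stuff["__builtins__"]
    for key in context:
        if key in return_stuff and return_stuff[key] == context[key]:
            del return_stuff[key]
    return return_stuff

context = {"rate": 0.25}                          # hypothetical context
user_code = "def tax(amount):\n    return amount * rate\n"

deleted = define_stuff_by_deleting(user_code, context)
deleting_error = None
try:
    deleted["tax"](100)        # 'rate' was removed from tax's globals
except NameError as exc:
    deleting_error = exc
print(deleting_error)          # name 'rate' is not defined

copied = define_stuff_by_copying(user_code, context)
print(sorted(copied))          # ['tax']
print(copied["tax"](100))      # 25.0
```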
Just to put a couple of alternatives on the table that don't break existing code - not necessarily promoting them, or suggesting they would be easy to do:
1. Modify exec() to take an optional third argument, 'scope_type'. If it is not supplied (but locals is), the code runs in class namespace, i.e. identical to the existing behaviour. If it is supplied, the code runs in whichever namespace is specified, with function namespace being one option. The API already operates along these lines, with the second argument being optional and implying module namespace if it is not present.
2. A new API, exec2(), which uses function namespace, with the old exec() deprecated - assuming there is agreement that function namespace makes more sense than class namespace, because there are real use cases and developers would generally expect this behaviour when approaching the API for the first time.
Another approach to all this might be to generalise the mechanism by which a lookup of the globals falls back to a lookup of __builtins__. If this were done recursively, then the "stuff" could be attached to the globals dict, e.g.
stuff['__builtins__'] = __builtins__
g = dict(__builtins__ = stuff)
exec(code, g)
del g['__builtins__']
-- Greg
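The recursive fallback Greg describes does not exist, but a one-level version can be approximated in Python 3. The exec() documentation says that a '__builtins__' dict supplied in globals controls which builtins the executed code sees. This sketch merges the real builtins with a hypothetical context into that single dict, so the globals end up holding only what the user code defined. The helper name define_stuff_via_builtins is made up, and the approach depends on CPython's handling of '__builtins__'.

```python
import builtins

def define_stuff_via_builtins(user_code, context):
    # One flat fallback namespace: the real builtins overlaid with the context.
    fallback = dict(vars(builtins))
    fallback.update(context)
    g = {"__builtins__": fallback}
    exec(user_code, g)
    # g is left intact (functions defined by user_code use it as globals);
    # return only the user's own definitions.
    return {key: value for key, value in g.items() if key != "__builtins__"}

defs = define_stuff_via_builtins(
    "def area(w):\n    return w * scale\nresult = area(len([1, 2]))\n",
    {"scale": 10},              # hypothetical context
)
print(sorted(defs))    # ['area', 'result']
print(defs["result"])  # 20
```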
On 27/05/10 12:38, Guido van Rossum wrote:
the compiler normally uses syntactic clues to decide whether to generate code using closures, in particular, the presence of nested functions.
Well, the compiler could be passed a flag indicating that the code is being compiled for an exec statement.
-- Greg
On 27/05/10 10:38, Guido van Rossum wrote:
On Wed, May 26, 2010 at 5:12 PM, Nick Coghlan<ncoghlan@gmail.com> wrote:
Lexical scoping only works for code that is compiled as part of a single operation - the separation between the compilation of the individual string and the code defining that string means that the symbol table analysis needed for lexical scoping can't cross the boundary.
Hi Nick,
I don't think Colin was asking for such things.
Yes, I realised some time after sending that message that I'd gone off on a tangent unrelated to the original question (as a result of earlier parts of the discussion I'd been pondering the scoping differences between exec with two namespaces and a class definition and ended up writing about that instead of the topic Colin originally brought up).
I suspect Thomas is right that the current two namespace exec behaviour is mostly a legacy of the standard scoping before nested scopes were added.
To state the problem as succinctly as I can, the basic issue is that a code object which includes a function definition that refers to top level variables will execute correctly when the same namespace is used for both locals and globals (i.e. like module level code) but will fail when these namespaces are different (i.e. like code in class definition).
So long as the code being executed doesn't define any functions that refer to top level variables in the executed code the two argument form is currently perfectly usable, so deprecating it would be an overreaction.
However, attaining the (sensible) behaviour Colin is requesting when such top level variable references exist would actually be somewhat tricky. Considering Guido's suggestion to treat two argument exec like a function rather than a class and generate a closure with full lexical scoping a little further, I don't believe this could be done in exec itself without breaking code that expects the current behaviour. However, something along these lines could probably be managed as a new compilation mode for compile() (e.g. compile(code_str, name, "closure")), which would then allow these code objects to be passed to exec to get the desired behaviour.
Compare and contrast:
>>> def f():
...     x = 1
...     def g():
...         print x
...     g()
...
>>> exec f.func_code in globals(), {}
1
>>> source = """\
... x = 1
... def g():
...     print x
... g()
... """
>>> exec source in globals(), {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in <module>
  File "<string>", line 3, in g
NameError: global name 'x' is not defined
Breaking out dis.dis on these examples is fairly enlightening, as they generate *very* different bytecode for the definition of g().
Cheers, Nick.
-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
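Nick's bytecode comparison can be followed in Python 3 without reading the full disassembly. The nested code object for g() records whether x is a free variable (read from a closure cell) or a global name. The sketch below uses only the standard dis and types modules. The helper nested_code is illustrative.

```python
import dis
import types

def f():
    x = 1
    def g():
        print(x)
    g()

source = "x = 1\ndef g():\n    print(x)\ng()\n"
module_code = compile(source, "<string>", "exec")

def nested_code(code, name):
    # Return the code object of the function `name` defined directly inside `code`.
    return next(const for const in code.co_consts
                if isinstance(const, types.CodeType) and const.co_name == name)

g_in_function = nested_code(f.__code__, "g")
g_in_source = nested_code(module_code, "g")

print(g_in_function.co_freevars)   # ('x',): x comes from a closure cell
print(g_in_source.co_freevars)     # (): no closure, so x is looked up as a global
# Opcodes that touch x in the exec'd version (a global load in CPython):
print([ins.opname for ins in dis.get_instructions(g_in_source) if ins.argval == "x"])
```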
On Thu, May 27, 2010 at 11:42 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
However, attaining the (sensible) behaviour Colin is requesting when such top level variable references exist would actually be somewhat tricky. Considering Guido's suggestion to treat two argument exec like a function rather than a class and generate a closure with full lexical scoping a little further, I don't believe this could be done in exec itself without breaking code that expects the current behaviour.
Just to give a concrete example, here is code that would break if exec were to execute code in a function scope instead of a class scope:
exec """
def len(xs):
    return -1
def foo():
    return len([])
print foo()
""" in globals(), {}
Currently, the call to 'len' inside 'foo' skips the outer scope (because it's a class scope) and goes straight to globals and builtins. If it were switched to a local scope, a cell would be created for the broken definition of 'len', and the call would resolve to it.
Honestly, to me, the fact that the above code ever worked (ie prints "0", not "-1") seems like a bug, so I wouldn't worry about backwards compatibility.
Reid
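Reid's example translated to Python 3, as a sketch. Assigning to 'result' replaces the print so the value can be inspected. With separate globals and locals dicts, the call inside foo() skips the exec locals and reaches the built-in len. With a single dict, the user's len shadows it.

```python
source = """
def len(xs):
    return -1
def foo():
    return len([])
result = foo()
"""

two_dicts_locals = {}
exec(source, {}, two_dicts_locals)
print(two_dicts_locals["result"])   # 0: foo() finds the built-in len

one_dict = {}
exec(source, one_dict)
print(one_dict["result"])           # -1: foo() finds the user's len
```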
Perhaps the next step is to re-open the issue? If it is seen as a bug, it would be great to see a fix in 2.6+ - a number of options which will not break backward compatibility have been put forward - cheers, Colin
On 29/05/10 20:20, Colin H wrote:
Perhaps the next step is to re-open the issue? If it is seen as a bug, it would be great to see a fix in 2.6+ - a number of options which will not break backward compatibility have been put forward - cheers,
A new feature request requesting a "closure" mode for compile() in 3.2 would probably be the best way forward. Once that is done, then the question of if or when to change the default behaviour for auto-compiled code in exec and/or dis can be considered.
It definitely isn't a bug fix though - it's worked this way for years, and while the existing semantics can certainly be surprising, they're far from being buggy (as Thomas said, prior to the introduction of lexical scoping all Python namespaces worked this way).
Cheers, Nick.
-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
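compile() has no "closure" mode; Nick is proposing one. Until something like it exists, a rough approximation of function-namespace semantics is to wrap the source text in a generated function, so its top-level names become function locals that nested functions can close over. The helper exec_as_function below is hypothetical and has known limits. Indenting the text also indents the contents of multi-line string literals, and statements that are illegal in a function body (such as 'from m import *') will fail.

```python
import textwrap

def exec_as_function(source, globals_dict):
    """Hypothetical helper: run `source` as the body of a generated function.

    Returns the function's final local variables. Nested functions in
    `source` close over its top-level names instead of looking them up
    in globals_dict.
    """
    wrapper = ("def __exec_body__():\n"
               + textwrap.indent(source, "    ")
               + "\n    return locals()\n")
    holder = {}
    exec(wrapper, globals_dict, holder)
    return holder["__exec_body__"]()

code = "y = 3\ndef f():\n    return y\nresult = f()\n"
names = exec_as_function(code, {})
print(names["result"])   # 3
print(names["f"]())      # 3: f still sees y through its closure cell
```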
On 5/29/2010 6:20 AM, Colin H wrote:
Perhaps the next step is to re-open the issue? If it is seen as a bug, it would be great to see a fix in 2.6+ -
For the purpose of bugfix releases, a 'bug' is a discrepancy between doc and behavior. Every new feature is seen as a 'design bug' by someone.
a number of options which will not break backward compatibility have been put forward - cheers,
Code that uses a new x.y.z feature does not work in previous x.y versions. Problems with such micro-release additions led to the current policy. The 3.2 feature addition deadline is about 5 months away. It will probably take 1 or more people at least a couple of months to write a PEP listing the rationale for a change, the options and possible pros and cons, possibly test one or more patches, solicit opinions on which is best, select one, write new test cases and docs, and get the final patch committed.
Terry Jan Reedy
This option sounds very promising - seems right to do it at the compile stage - i.e. compile(code_str, name, "closure") as you have suggested. If there were any argument against, it would be that the most obvious behaviour (function namespace) is the hardest to induce, but the value in knowing you're not breaking anything is pretty high. Cheers, Colin
By hardest to induce I mean the default compile exec(code_str, {}, {}) would still be class namespace, but it's pretty insignificant.
Mark Dickinson wrote:
>>> code = """\
... y = 3
... def f():
...     return y
... f()
... """
>>> exec code in {}       # works fine
>>> exec code in {}, {}   # dies with a NameError
Seems to me the whole idea of being able to specify separate global and local scopes for top-level code is screwy in the first place. Are there any use cases for it? Maybe the second scope argument to exec() should be deprecated?
-- Greg
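Mark's example in Python 3 syntax, where exec() is a function. As a sketch, it shows the behaviour is unchanged: y is stored with STORE_NAME into whichever dict is the locals, but f() reads it as a global.

```python
code = """\
y = 3
def f():
    return y
f()
"""

exec(code, {})          # works: y is stored in, and read from, the same dict

error = None
try:
    exec(code, {}, {})  # y lands in the locals dict; f() only checks globals
except NameError as exc:
    error = exc
print(error)            # name 'y' is not defined
```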
On 27/05/2010 00:38, Greg Ewing wrote:
Seems to me the whole idea of being able to specify separate global and local scopes for top-level code is screwy in the first place. Are there any use cases for it? Maybe the second scope argument to exec() should be deprecated?
Sounds good to me, certainly ends the confusion over this undoubtedly unintuitive behaviour. :-)
Michael
-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
Let me quickly jump in before someone actually deletes the feature. Nick Coghlan and Thomas Wouters had it right; there is still a use case. Don't kill it -- documenting it better is of course fine. It *might* be possible to add a closure to the definition of f in the case where globals != locals, but I doubt that that would be worth it, and there's probably some code out there that would actually break, so I'd say this is not a priority.
--Guido
-- 
--Guido van Rossum (python.org/~guido)
On 27/05/10 11:33, Michael Foord wrote:
On 27/05/2010 00:38, Greg Ewing wrote:
Maybe the second scope argument to exec() should be deprecated?
Sounds good to me, certainly ends the confusion over this undoubtedly unintuitive behaviour. :-)
Although it's a fair point that it can be useful to have a way of capturing definitions made by the execed code, while still making an environment of other stuff available to it. So, I'd be in favour of changing the behaviour of exec so that the local scope is made visible inside functions.
However, it would be non-trivial to implement this the way things are currently structured, which is probably one of the reasons it hasn't already been done.
I don't think that simply using LOAD_NAME inside the function would work, because that would only look in the function's own local namespace first, then in the global one. There is just no mechanism available at the moment for the function to know about the local namespace passed in to exec.
The way that functions get access to names in enclosing local scopes is by having them passed in as cells, but that mechanism is only available for optimised local namespaces, not ones implemented as dicts.
-- Greg
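The use case Greg mentions, capturing the executed code's definitions separately from the environment it runs in, is what the two-namespace form already provides, as long as the executed code doesn't define functions that read its own top-level names. A minimal Python 3 sketch with a hypothetical environment:

```python
env = {"base": 100}          # hypothetical environment exposed as globals
captured = {}
exec("limit = base * 2\nlabel = 'max'\n", env, captured)

print(captured)              # {'limit': 200, 'label': 'max'}
print("limit" in env)        # False: the definitions did not leak into env
```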
On 27/05/10 13:13, Greg Ewing wrote:
The way that functions get access to names in enclosing local scopes is by having them passed in as cells, but that mechanism is only available for optimised local namespaces, not ones implemented as dicts.
I believe exec already includes the tapdancing needed to make that work. As Guido pointed out, it's the ability to generate closures directly from a source string that is currently missing.
Cheers, Nick.
-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
participants (10)
- Colin H
- Greg Ewing
- Guido van Rossum
- Mark Dickinson
- Michael Foord
- Nick Coghlan
- Reid Kleckner
- Scott Dial
- Terry Reedy
- Thomas Wouters