
On Mon, Mar 25, 2013 at 8:13 PM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
2013/3/25 anatoly techtonik <techtonik@gmail.com>
This module opened a Pandora's box of Python internals. Version 0.4 still fails to trace files specified on the command line, and I am lost in the internal details of execfile + locals()/globals()/namespacing/scoping. The Python tracker doesn't help here.
Seriously: always always pass your own locals and globals to functions like exec and execfile. Good luck.
I'd like to understand what's going on there, because it directly affects whether I will be able to extend xtrace further and troubleshoot reported problems when more errors appear. I know that locals() is not an ordinary dictionary (http://bugs.python.org/issue17546).

The first question - what should I pass to execfile as 'my own locals'?

1. an exact livedict that is returned by the locals() function
2. my own dict, which will become live inside the execfile call
3. the locals function, which returns a reference to the livedict

The second question - why should I do this?

To make the explanation easier, here is a real-world problem that made me seek the answers.

https://bitbucket.org/techtonik/xtrace/issue/3/unable-to-xtrace-indefpy

The file that execfile() *inside* xtrace fails to execute is below. I named it indef.py, because the subprocess call is inside a defined function.

    import subprocess

    def ret():
        print subprocess.check_output(['hg', 'id', '-nib'])

    ret()

The problem repeats if the file is executed from the console, but only in a special case:

    Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
    >>> execfile('indef.py')
    0cef574f7f62+ 23+ default
    >>> def x():
    ...     execfile('indef.py')
    ...
    >>> x()
    0cef574f7f62+ 23+ default

Now I restart the console and do the same stuff, but without the first call:

    Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
    >>> def x():
    ...     execfile('indef.py')
    ...
    >>> x()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in x
      File "indef.py", line 6, in <module>
        ret()
      File "indef.py", line 4, in ret
        print subprocess.check_output(['hg', 'id', '-nib'])
    NameError: global name 'subprocess' is not defined

I guess the same stuff happens with my xtrace(), where execfile() is executed inside the function.

Now the questions:

1. I understand that when Python starts executing code, it creates a namespace. This is the 'first_namespace' where Python starts to store all variables encountered. I intentionally do not call it global, as this term is not introduced at this point. Python continues to put and change variables there until it encounters [something], and this [something] causes it to create a 'new_namespace' to store variables. I know that a function call makes Python create a 'new_namespace' to store variables encountered inside this function, so "function call" is in [something]. What else is in [something], i.e. what else causes Python to create 'new_namespaces'? It would make me comfortable to see the full list described somewhere.

2. 'new_namespace's are created when a function is entered and destroyed when it exits. When a variable is requested, the lookup mechanism looks first inside 'new_namespace', then checks its parent (the caller namespace), then the parent of the parent and so on until the 'first_namespace'. Does it work like this?

3. Now the problem. When I run execfile() without parameters, I expect the code inside to be 'virtualized' - isolated from the parent process, like in LXC or VirtualBox, but on the Python level. I also expect that if I want to communicate with this code inside execfile(), I craft my own 'first_namespace' and pass it down. When execfile() returns, I inspect the state of this modified namespace, and maybe import some results back into my 'current_namespace'. The example with indef.py above shows that execfile() doesn't work this way. So the question is - why doesn't execfile() work the way I described?

4. At this point I hope to reach the state where it is clear how execfile() works in reality, and the next question in this state - what was the reason to make it so complicated? Optimization?

Thanks for the feedback.
--
anatoly t.
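The lookup order asked about in question 2 can be checked directly. The following is a minimal sketch in Python 3 syntax; helper, caller, outer, inner and hidden_value are hypothetical names chosen for illustration. It assumes nothing beyond standard scoping rules: a name is resolved in the scope where the function was defined, not in the namespace of whoever called it.

```python
def helper():
    # 'hidden_value' is not assigned here, so it is looked up in the
    # globals captured when helper() was defined (then in builtins),
    # never in the locals of the function that happens to call helper().
    return hidden_value


def caller():
    hidden_value = "only in caller's locals"  # invisible to helper()
    try:
        return helper()
    except NameError:
        return "NameError"


def outer():
    value = "enclosing"

    def inner():
        # inner() is defined inside outer(), so it can see outer()'s
        # variable through lexical (nested) scoping.
        return value

    return inner()


caller_result = caller()
outer_result = outer()
```

Here caller() ends in a NameError inside helper(), while outer() succeeds: the chain that is searched follows where the code was written, not the call stack.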

2013/3/26 anatoly techtonik <techtonik@gmail.com>
3. Now the problem. When I run execfile() without parameters, I expect the code inside to be 'virtualized' - isolated from the parent process, like in LXC or VirtualBox, but on the Python level. I also expect that if I want to communicate with this code inside execfile(), I craft my own 'first_namespace' and pass it down. When execfile() returns, I inspect the state of this modified namespace, and maybe import some results back into my 'current_namespace'. The example with indef.py above shows that execfile() doesn't work this way. So the question is - why doesn't execfile() work the way I described?
You should always pass locals and globals to execfile(). I warned you :-) No, the code is not isolated from the parent. When you call execfile with only a filename, globals is set to the caller's globals, and locals is set to the caller's locals!
4. At this point I hope to reach the state where it is clear how execfile() works in reality, and the next question in this state - what was the reason to make it so complicated? Optimization?
OK, here is an explanation of what you probably call the "Python Execution Context":

- All Python code runs with two namespaces: locals and globals.
- When running a script, or when a module is imported, there is only one namespace: locals() is globals().
- When a function object is created ("def..."), the globals namespace is captured and stored in the function object. When the function is executed, a new locals namespace is created and filled with the function arguments. The function bytecode is executed with these locals; globals is the one captured above.
- [Missing: the class statement]

When running bytecode:

- All assignments go into the locals namespace (this includes "x+=1", but also "import sys").
- Name lookups first search locals, then globals.
- Except inside a function scope ("def", or a lambda), where the compiler determines whether the name is a local variable (it has an assignment in the same scope); in this case only the locals are searched, otherwise only globals are searched. [For this discussion, it's only an implementation detail and makes no difference here]
- [Also missing: nested scopes and cells]

I think this explains all the weirdness you see:

- "import subprocess" is an assignment, and will populate the local namespace.
- when ret() sees the "subprocess" name, it will look it up in the globals namespace.

In a normal script/module, this makes no difference, because locals() is globals(). But with execfile(), globals and locals are passed from the parent frame:

- when execfile() is called at module level, they are identical: locals() is globals() in the caller.
- when execfile() is called inside a function, they are different! And boom.

In other words, follow the advice above:

    namespace = {}
    execfile('indef.py', namespace)

This will pass the same namespace for both globals and locals, and indef.py will be executed a bit like "import" or a standalone script.

--
Amaury Forgeot d'Arc
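The failure mode described above can be reproduced without hg or a separate file. What follows is a minimal sketch in Python 3 syntax, where execfile() no longer exists and exec() on a source string stands in for it. SOURCE, run_with_split_namespaces and run_with_single_namespace are hypothetical names, and math replaces subprocess only to keep the example self-contained.

```python
# Hypothetical stand-in for indef.py: an import at the top level and a
# function that uses the imported name.
SOURCE = """
import math

def ret():
    return math.sqrt(16)

result = ret()
"""


def run_with_split_namespaces():
    # Two distinct dicts imitate execfile() called inside a function,
    # where the caller's globals and locals are different objects.
    globs, locs = {}, {}
    try:
        exec(SOURCE, globs, locs)
    except NameError as exc:
        # 'import math' bound the name in locs, but ret() searches globs.
        return "NameError: %s" % exc
    return locs["result"]


def run_with_single_namespace():
    # One dict serves as both globals and locals, as in a normal module.
    namespace = {}
    exec(SOURCE, namespace)
    return namespace["result"]


split_outcome = run_with_split_namespaces()
single_outcome = run_with_single_namespace()
```

With two dictionaries the import lands in one and the function looks in the other, so the call fails. With a single dictionary the code behaves like a standalone script, and the caller can inspect the dictionary afterwards to read back results.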

3. Now the problem. When I run execfile() without parameters, I expect the code inside to be 'virtualized' - isolated from parent process, like in LXC or VirtualBox, but on Python level.
No, the code is not isolated from the parent. When you call execfile with only a filename, globals is set to the caller's globals, and locals is set to the caller's locals!
Though this is just an educated guess, reading Anatoly's note and Amaury's response leads me to think that Anatoly might be expecting the execfile() function to work more or less like the exec* system calls on Unix systems. Execfile() is more akin to the import statement than the fork/exec paradigm.

Skip

Amaury Forgeot d'Arc
On 26 March 2013 at 20:27, "Skip Montanaro" <skip@pobox.com> wrote:
3. Now the problem. When I run execfile() without parameters, I expect the code inside to be 'virtualized' - isolated from parent process, like in LXC or VirtualBox, but on Python level.
No, the code is not isolated from the parent. When you call execfile with only a filename, globals is set to the caller's globals, and locals is set to the caller's locals!
Though this is just an educated guess, reading Anatoly's note and Amaury's response leads me to think that Anatoly might be expecting the execfile() function to work more or less like the exec* system calls on Unix systems. Execfile() is more akin to the import statement than the fork/exec paradigm.
But at least import runs code in a fresh namespace; execfile does not.
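The contrast between a fresh namespace and a shared one can be sketched as follows. This is an illustration in Python 3 syntax under stated assumptions, not a description of how import is implemented: a new types.ModuleType object only approximates the fresh namespace that import provides, exec() stands in for execfile(), and SOURCE, run_like_import and run_in_callers_namespace are hypothetical names.

```python
import types

# Hypothetical code to run: it checks whether it can see the caller's
# 'secret' name and defines a new name of its own.
SOURCE = "seen_secret = 'secret' in globals()\nleaked = 42\n"


def run_like_import(source, name="sandbox_module"):
    # A brand-new module object gives the code its own empty namespace.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def run_in_callers_namespace(source, caller_namespace):
    # The code reads from and writes into the dictionary it is handed,
    # much as execfile() with only a filename uses the caller's namespaces.
    exec(source, caller_namespace)
    return caller_namespace


caller = {"secret": "caller state"}
module = run_like_import(SOURCE)
shared = run_in_callers_namespace(SOURCE, caller)
```

In the first case the executed code cannot see secret, and its leaked name stays on the module object. In the second case the code sees secret and leaves leaked behind in the caller's dictionary.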
Skip

Hi all,

Could you please move this debate somewhere else? pypy-dev is not the appropriate place to discuss the Python language in general (but topics about differences between CPython and PyPy are fine of course). There is already, for example, http://mail.python.org/mailman/listinfo/python-list .

See you soon,
Armin.

First things first. Thanks for the replies. I am still crunching...

Since I already received a similar question in a private email from another person, asking whether the topic is appropriate for PyPy, I repeat my reply below as-is.

Mind you, the discussion here is not a "debate". When I say "I expect", I of course mention my own problem of understanding, but the answer I need is not what I should do, but why it works differently.

I am not familiar with exec* calls in Unix, but for me the analogy is misleading, and something like "LXC for Python code" may be a better analogy, because I don't want to replace the current process. What I am interested to know is what kind of code isolation execfile() supports and why it doesn't do full isolation by default. It looks like this is answered, but I need time to validate it.

On Wed, Mar 27, 2013 at 1:00 AM, Armin Rigo <arigo@tunes.org> wrote:
Hi all,
Could you please move this debate somewhere else? pypy-dev is not the appropriate place to discuss the Python language in general (but topics about differences between CPython and PyPy are fine of course). There is already, for example, http://mail.python.org/mailman/listinfo/python-list .
It looks like python@ is more appropriate for basic questions about the usage of the existing language. My question is about internal mechanism that supports language design. Not only asking about existing behavior, but also about why this behavior can not be changed. I thought that people who are writing Python in Python are more skilled to answer such questions. In the end, I understood that PyPy project was created so that people who can't read C could still understand and experiment with their own language, no? -- anatoly t.

It looks like python@ is more appropriate for basic questions about the usage of the existing language. My question is about internal mechanism that supports language design. Not only asking about existing behavior, but also about why this behavior can not be changed. I thought that people who are writing Python in Python are more skilled to answer such questions. In the end, I understood that PyPy project was created so that people who can't read C could still understand and experiment with their own language, no?
We can explain to you how stuff works. Why not change it? Because python-dev said "no". Please take it up with them; we don't deal with language design. People *can* experiment with the language however they like. If they find their experiments working, they can take up the Python changes with python-dev; we will not participate in this discussion. You're not asking how it's implemented though, you're asking why it works the way it works, and this is totally not for us to answer - we just copied CPython semantics and this is what we're going to keep. Please take this discussion somewhere else.
participants (5)
- Amaury Forgeot d'Arc
- anatoly techtonik
- Armin Rigo
- Maciej Fijalkowski
- Skip Montanaro