[Python-Dev] Python VM

"Martin v. Löwis" martin at v.loewis.de
Tue Jul 22 00:17:56 CEST 2008


This looks fairly correct. A few comments below.

> Control Flow
> ============
> The calling sequence is:
> main() (in python.c) -> Py_Main() (main.c) -> PyRun_FooFlags() (pythonrun.c) ->
> run_bar() (pythonrun.c) -> PyEval_EvalCode() (ceval.c) -> PyEval_EvalCodeEx()
> (ceval.c) -> PyEval_EvalFrameEx() (ceval.c).

What this misses is the compiler stuff, i.e. PyParser_ASTFromFoo and
PyAST_Compile, which precede the call to PyEval_ (at least if no byte code
file is available).
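
The same parse / compile / evaluate ordering can be observed from Python
itself. The following is only a rough sketch using the built-in analogues
(ast.parse, compile, exec); the mapping to the C functions in the comments
is an approximation, and the source string and file name are made up for
illustration.

```python
import ast

source = "result = 6 * 7"  # illustrative input, not from the original post

# Step 1: source text -> AST (roughly the job of PyParser_ASTFromFoo in C).
tree = ast.parse(source, filename="<example>", mode="exec")

# Step 2: AST -> code object (roughly the job of PyAST_Compile).
code = compile(tree, filename="<example>", mode="exec")

# Step 3: run the code object in a namespace (roughly PyEval_EvalCode).
namespace = {}
exec(code, namespace)
```

When a byte code file is available, the first two steps are skipped and
the code object is loaded from the file instead.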

> Threads
> =======
> PyEval_InitThreads() initializes the GIL (interpreter_lock) and sets
> main_thread to the (threading package dependent) ID of the current thread.
> Thread switching is done using PyThreadState_Swap(), which sets
> _PyThreadState_Current (both defined in pystate.c) and PyThreadState_GET()
> (an alias for _PyThreadState_Current) (pystate.h).

True; however, in most cases this is triggered through
Py_BEGIN_ALLOW_THREADS, which passes NULL for the new thread. The actual
*switching* occurs by releasing the GIL, not by PyThreadState_Swap.

Actually, Python doesn't dispatch threads at all. It just releases the
GIL, giving the operating system permission to wake up a different
thread - which the operating system may or may not choose to do. After
some time, the original thread will try to reacquire the GIL. Assuming
the OS applies fairness, it will not get it back if a different thread
was also waiting for it, so our thread will block - and *then* the OS
will dispatch (at the latest).
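
A small experiment makes the effect of releasing the GIL visible. This is
a sketch, assuming that time.sleep() releases the GIL while it blocks (as
blocking calls wrapped in Py_BEGIN_ALLOW_THREADS do); the worker function
and the sleep duration are arbitrary choices for illustration.

```python
import threading
import time

SLEEP_SECONDS = 0.3  # arbitrary duration chosen for the demonstration


def blocking_worker():
    # The blocking call gives up the GIL, so the OS is free to run
    # the other thread in the meantime.
    time.sleep(SLEEP_SECONDS)


def run_two_workers():
    """Start two blocking workers and return the wall-clock time taken."""
    threads = [threading.Thread(target=blocking_worker) for _ in range(2)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


elapsed = run_two_workers()
```

If the GIL were held across the blocking call, the two sleeps would run
back to back and take about twice as long; since it is released, the total
should stay close to a single sleep.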

> State
> =====
> The global state is recorded in a (per-process?) PyInterpreterState struct and
> a per-thread PyThreadState struct.

Yes and no. In principle, multiple interpreter states are supported per
process (and the current interpreter is identified by thread). However,
there are many limitations and quirks in the multiple-interpreter code.

> Each execution frame's state is contained in that frame's PyFrameObject
> (which includes the instruction stream, the environment (globals, locals,
> builtins, etc.), the value stack and so forth).
> EvalFrameEx()'s local variables are initialized from this frame object.

Not only. A lot of stuff also lives on the regular C stack, which exists
in parallel to the frame object stack (the latter being a spaghetti
stack).
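
The frame object stack can be inspected from Python: each frame links to
its caller through f_back. The sketch below walks that chain; it relies on
sys._getframe(), which is a CPython implementation detail, and the
function names are made up for illustration. The C stack that runs in
parallel is not visible this way.

```python
import sys


def frame_chain_names():
    """Return the function names on the frame chain, innermost first."""
    frame = sys._getframe()  # CPython-specific: the currently executing frame
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # follow the link to the calling frame
    return names


def inner():
    return frame_chain_names()


def outer():
    return inner()


chain = outer()
```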

> The instruction stream looks as follows (c.f. assemble_emit() in compile.c):

See also dis.py for the inverse operation.
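
As a brief illustration of that inverse operation, the dis module can
decode a code object's raw byte string back into named instructions. This
is a sketch only: add_one is a made-up function, and the exact opcode
names and encoding differ between Python versions.

```python
import dis


def add_one(x):
    return x + 1


# The raw instruction stream, as emitted by the compiler.
raw_bytecode = add_one.__code__.co_code

# dis decodes that stream back into (offset, opcode name, argument) records.
instructions = list(dis.get_instructions(add_one))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```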

> Basic structure
> ---------------
> EvalFrameEx() {

Somewhere in here you need to merge in the thread switching for threads
that have been executing a lot of instructions.

>   - Objects are transferred onto the value stack by GETITEM()'ing them from
>     consts or names, or by GETLOCAL()'ing them using oparg as an offset into
>     fastlocals (c.f. LOAD_* instructions).

Or, of course, as the result from some operation or function call, or
load from a global variable, or import, or ...

