On 6/11/2020 6:59 AM, Mark Shannon wrote:
Different interpreters need to operate in their own isolated address space, or there will be horrible race conditions.
Regardless of whether that separation is done in software or hardware,
it has to be done.
I realize this is true now, but why must it always be true? Can't we fix this? At least one solution has been proposed: passing around a pointer to the current interpreter. I realize there issues here, like callbacks and signals that will need to be worked out. But I don't think it's axiomatically true that we'll always have race conditions with multiple interpreters in the same address space.
Eric
Axiomatically? No, but let me rise to the challenge.
If (1) interpreters manage the life-cycle of objects, and (2) a race condition arises when the life-cycle or state of an object is accessed by the interpreter that did not create it, and (3) an object will sometimes be passed to an interpreter that did not create it, and (4) an interpreter with a reference to an object will sometimes access its life-cycle or state, then (5) a race condition will sometimes arise. This seems to be true (as a deduction) if all the premises hold.
(1) and (2) are true in CPython as we know it. (3) is prevented (completely?) by the Python API, but not at all by the C API. (4) is implicit in an interpreter having access to an object, the way CPython and its extensions are written, so (5) follows in the case that the C API is used. You could change (1) and/or (2), maybe (4).
"Passing around a pointer to the current interpreter" sounds like an attempt to break (2) or maybe (4). But I don't understand "current". What you need at any time is the interpreter (state and life-cycle manager) for the object you're about to handle, so that the receiving interpreter can delegate the action, instead of crashing ahead itself. This suggests a reference to the interpreter must be embedded in each object, but it could be implicit in the memory address.
There is then still an issue that the owning interpreter has to
be thread-safe (if there are threads) in the sense that it can
serialise access to object state or life-cycle. If serialisation
is by a GIL, the receiving interpreter must take the GIL of the
owning interpreter, and we are somewhat back where we started.
Note that the "current interpreter" is not a function of the
current thread (or vice-versa). The current thread is running in
both interpreters, and by hypothesis, so are the competing
threads.
Can I just point out that, while most of this argument concerns a
particular implementation, we have a reason in Python (the
language) for an interpreter construct: it holds the current
module context, so that whenever code is executing, we can give
definite meaning to the 'import' statement. Here "current
interpreter" does have a meaning, and I suggest it needs to be
made a property of every function object as it is defined, and
picked up when the execution frame is created. This *may* help
with the other, internal, use of interpreter, for life-cycle and
state management, because it provides a recognisable point
(function call) where one may police object ownership, but that
isn't why you need it.
Jeff Allen