Exploiting Dual Cores with Py_NewInterpreter's separated GIL?

Wed Nov 8 11:18:12 EST 2006

Ross Ridge wrote:
> Ross Ridge wrote:
>> So give an example where reference counting is unsafe.
> 
> Martin v. Löwis wrote:
>> Nobody claimed that, in that thread. Instead, the claim was
>> "Atomic increment and decrement instructions are not by themselves
>> sufficient to make reference counting safe."
> 
> So give an example of where atomic increment and decrement instructions
> are not by themselves sufficient to make reference counting safe.
> 
>> I did give an example, in <4550cc64$0$3462$9b622d9e at news.freenet.de>.
>> Even though f_name is reference-counted, it might happen that you get a
>> dangling pointer.
> 
> Your example is of how access to the "f_name" member is unsafe, not of
> reference counting being unsafe.  The same sort of race condition can
> occur without reference counting being involved at all.  Consider the
> "f_fp" member: if one thread tries to use "fprintf()" on it while
> another thread calls "fclose()", then you can have the same problem.  The
> race condition here doesn't happen because reference counting hasn't
> been made safe, nor does it happen because stdio isn't thread-safe.  It
> happens because accessing "f_fp" (without the GIL) is unsafe.
> 
> The problem you're describing isn't that reference counting hasn't been
> made safe.  What you and Joe seem to be trying to say is that atomic
> increment and decrement instructions alone don't make accessing shared
> structure members safe.

Yes.
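To make the distinction concrete, here is a minimal C11 sketch that replays one bad interleaving step by step, in the spirit of the f_name example: thread A loads a member pointer without holding a lock, thread B replaces the member and drops the last reference to the old object, and then A performs a perfectly atomic incref on an object that is already gone. All names (`obj_t`, `holder_t`, `replay_race`) are hypothetical, not CPython's; deallocation is modeled by a flag instead of free() so the demo itself stays well-defined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature object: an atomic refcount plus a flag that
   stands in for "this memory has been freed" (we never really free). */
typedef struct {
    atomic_long refcnt;
    bool alive;
} obj_t;

/* Hypothetical container owning one reference, like f_name in a file object. */
typedef struct {
    obj_t *name;
} holder_t;

static void obj_incref(obj_t *o) { atomic_fetch_add(&o->refcnt, 1); }

static void obj_decref(obj_t *o) {
    /* fetch_sub returns the old value: 1 means we dropped the last reference. */
    if (atomic_fetch_sub(&o->refcnt, 1) == 1)
        o->alive = false;               /* stands in for deallocation */
}

/* Replay one interleaving of a reader (A) and a writer (B) sequentially.
   Returns true if A ends up holding a reference to a "freed" object. */
static bool replay_race(void) {
    obj_t old_name, new_name;
    atomic_init(&old_name.refcnt, 1);   /* the reference owned by h */
    old_name.alive = true;
    atomic_init(&new_name.refcnt, 1);   /* B's fresh reference */
    new_name.alive = true;
    holder_t h = { &old_name };

    obj_t *seen = h.name;               /* A: load the member pointer      */
    obj_t *prev = h.name;               /* B: replace the member ...       */
    h.name = &new_name;
    obj_decref(prev);                   /* B: ... and drop the old ref     */
    assert(atomic_load(&prev->refcnt) == 0);
    obj_incref(seen);                   /* A: atomic incref on a dead object */

    return !seen->alive;
}
```

`replay_race()` returns true: every refcount operation was atomic, yet the load-then-incref sequence was unprotected, which is exactly the gap between "atomic INC/DEC" and "safe access to shared members".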

To recall the motivation with a real-world example: 
The idea is to share/hand over objects between 2 (or N) well-separated Python interpreter instances (free-threading) with 2 different GILs.
There is, of course, a usage condition: * Only one interpreter may access (read & write) the hot (tunneled) object tree at a time *
(e.g. application: a numeric calculation runs, and when it has finished (flag) the other interpreter walks the objects directly again, without MPI/pickling).
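The usage condition above (exclusive ownership, handed over by a flag) can be sketched with C11 atomics and POSIX threads. This assumes the flag is a single atomic variable written with release ordering and read with acquire ordering, so the finished calculation is fully visible to the other side before it starts walking the data. `hot_data`, `owner` and `run_handover` are hypothetical names, and the plain array only stands in for the hot object tree.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N 1000

/* Stand-in for the hot object tree: plain, non-atomic data. */
static long hot_data[N];

/* Who owns hot_data: 0 = calculating side, 1 = reading side.
   This flag is the only shared variable accessed atomically. */
static atomic_int owner = 0;

static void *calculator(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        hot_data[i] = i;                 /* exclusive write access */
    /* Release: every write above is visible to whoever acquires owner == 1. */
    atomic_store_explicit(&owner, 1, memory_order_release);
    return NULL;
}

static void *reader(void *arg) {
    long *sum = arg;
    /* Acquire: wait until ownership has been handed over. */
    while (atomic_load_explicit(&owner, memory_order_acquire) != 1)
        ;                                /* spin; a real version would yield */
    for (int i = 0; i < N; i++)
        *sum += hot_data[i];             /* exclusive read access now */
    return NULL;
}

/* Run one handover and return what the reading side computed. */
static long run_handover(void) {
    pthread_t a, b;
    long sum = 0;
    atomic_store(&owner, 0);
    pthread_create(&b, NULL, reader, &sum);
    pthread_create(&a, NULL, calculator, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return sum;
}
```

`run_handover()` always yields 0 + 1 + ... + 999 = 499500, because the reader never touches `hot_data` before the release/acquire pair on `owner` orders it after the writes.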

But a key problem was that the other thread can hold old pointer objects (containers) pointing into the hot object tree, and there is shared use of (immutable) singleton objects (None, 1, 2, ...): the old pointer objects in the other interpreter may disappear at any time, or the pointers may be duplicated.
Thus the refcount of a hot object will be decremented/incremented from the interpreter that does not currently own the hot object tree, even if the pointers are not used for read/write access.

This is no problem if, and only if, Py_INCREF/Py_DECREF are atomic. 

As Ross Ridge already pointed out, the object cannot disappear accidentally before all interpreters have dropped their references to it. 
In addition, concurrent read access to _constant_ objects/sub-trees would be possible, and even concurrent read & write access by using explicit locking! 
Thus the original poster's requirements would be fulfilled.
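A minimal sketch of what atomic reference counting buys in this scenario, assuming C11 `<stdatomic.h>` and POSIX threads (`shared_obj`, `incref`, `decref`, `churn` and `run_churn` are hypothetical, not CPython's `Py_INCREF`/`Py_DECREF`): two threads, standing in for the two interpreters, keep creating and dropping references to one shared object. The count stays exact, and only the decrement that takes it from 1 to 0 deallocates, so the object cannot vanish while anyone still holds a reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared object: a stand-in for Py_None or a node of the
   hot tree that both interpreters hold pointers to. */
typedef struct {
    atomic_long refcnt;
    atomic_bool deallocated;
} shared_obj;

static void incref(shared_obj *o) {
    /* Taking a new reference needs atomicity but no ordering. */
    atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

static void decref(shared_obj *o) {
    /* fetch_sub returns the old value, so exactly one thread sees 1 and
       performs the deallocation (modeled here by a flag). */
    if (atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_acq_rel) == 1)
        atomic_store(&o->deallocated, true);
}

#define ROUNDS 200000

/* Each thread plays an interpreter whose old pointers come and go. */
static void *churn(void *arg) {
    shared_obj *o = arg;
    for (long i = 0; i < ROUNDS; i++) {
        incref(o);
        decref(o);
    }
    return NULL;
}

/* Run two churning threads concurrently; return the final refcount. */
static long run_churn(shared_obj *o) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, churn, o);
    pthread_create(&t2, NULL, churn, o);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&o->refcnt);
}
```

Starting from a count of 1 (the owner's reference), `run_churn` returns 1 and the object is never deallocated; only the owner's final `decref` frees it. With plain, non-atomic `++`/`--` the final count could drift in either direction.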

So far I see only 5 main additional requirements for offering separated GILs/free-threading interpreters:

* a pointer to the current GIL in the thread state, and a dynamic PyThreadState_GET() / current thread state kept in TLS

* locks for global objects (files etc.), of course, if those are to be supported at all. (I'd use free-threading only for pure computation.)

* enable the already existing LOCK()/UNLOCK() macros in obmalloc.c with something fast, e.g. (x86-style pseudocode):
  _retry:
    __asm LOCK INC malloc_lock
    if (malloc_lock != 1) { __asm LOCK DEC malloc_lock; /* yield(); */ goto _retry; }

* a special (LOCK INC) locking dict type for the global dicts of extension modules
  (these are created in one well-defined place, Py_InitModule(name, methods), so this would also preserve backward compatibility for extension C code)

* convenient tunnel functions for creating extra interpreters and for actually tunneling the objects, perhaps also exposing the fast locking dict type to allow fast sharing of the hot tunneled object tree.
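A runnable variant of the LOCK INC / LOCK DEC spin lock from the obmalloc item, assuming C11 atomics and POSIX `sched_yield()` in place of the commented-out `yield()`. One adjustment: it tests the value returned by the atomic increment (`atomic_fetch_add`, a LOCK XADD on x86) instead of re-reading `malloc_lock` afterwards, so the test and the increment form one atomic step. `spin_lock`, `spin_unlock`, `counter`, `worker` and `run_workers` are hypothetical names. A production lock would more likely use test-and-set or compare-and-swap, since a thread's back-off decrement can briefly make a free lock look busy.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Lock word in the spirit of the pseudocode; 0 means free. */
static atomic_int malloc_lock = 0;

static void spin_lock(atomic_int *lock) {
    /* fetch_add returns the value *before* our increment: 0 means we got it. */
    while (atomic_fetch_add_explicit(lock, 1, memory_order_acquire) != 0) {
        atomic_fetch_sub_explicit(lock, 1, memory_order_relaxed); /* back off */
        sched_yield();
    }
}

static void spin_unlock(atomic_int *lock) {
    atomic_fetch_sub_explicit(lock, 1, memory_order_release);
}

static long counter;                 /* plain data protected by malloc_lock */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&malloc_lock);
        counter++;                   /* critical section */
        spin_unlock(&malloc_lock);
    }
    return NULL;
}

/* Two threads hammer the same lock; returns the final counter value. */
static long run_workers(void) {
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

If the lock works, no increment of the non-atomic `counter` is lost and `run_workers()` returns exactly 200000.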
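The first requirement in the list (a GIL pointer in the thread state, and the current thread state in thread-local storage instead of one process-wide variable) could look roughly like this. It is a sketch under assumptions: `interp_t`, `tstate_t`, `current_tstate`, `tstate_enter`/`tstate_leave` are hypothetical stand-ins, not CPython's real `PyInterpreterState`/`PyThreadState`, and each "GIL" is just a POSIX mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical interpreter: owns its own lock ("GIL"). */
typedef struct {
    pthread_mutex_t gil;
} interp_t;

/* Hypothetical thread state: knows its interpreter and that interpreter's GIL. */
typedef struct {
    interp_t *interp;
    pthread_mutex_t *gil;
} tstate_t;

/* The current thread state lives in TLS, so each OS thread sees its own. */
static _Thread_local tstate_t *current_tstate;

static void tstate_enter(tstate_t *ts) {
    current_tstate = ts;
    pthread_mutex_lock(ts->gil);     /* take *this* interpreter's GIL only */
}

static void tstate_leave(void) {
    pthread_mutex_unlock(current_tstate->gil);
    current_tstate = NULL;
}

static interp_t interp_a = { PTHREAD_MUTEX_INITIALIZER };
static interp_t interp_b = { PTHREAD_MUTEX_INITIALIZER };

/* Thread body: enter the given interpreter and check the TLS lookup. */
static void *run_in(void *arg) {
    interp_t *interp = arg;
    tstate_t ts = { interp, &interp->gil };
    tstate_enter(&ts);
    int ok = (current_tstate->interp == interp);
    tstate_leave();
    return ok ? arg : NULL;
}

/* The main thread holds A's GIL while a second thread enters B. With one
   shared GIL the join would deadlock; with per-interpreter GILs it
   completes and this returns 1. */
static int interps_run_concurrently(void) {
    tstate_t main_ts = { &interp_a, &interp_a.gil };
    pthread_t t;
    void *res = NULL;
    tstate_enter(&main_ts);
    pthread_create(&t, NULL, run_in, &interp_b);
    pthread_join(t, &res);
    int ok = (res == &interp_b) && (current_tstate == &main_ts);
    tstate_leave();
    return ok;
}
```

The point of the TLS lookup is that each thread finds "its" interpreter and GIL without consulting any shared global, which is what lets two interpreters run Python code at the same time.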

Speed costs? Probably not much, judging from the discussion in this thread...

Of course this option of 2 interpreters, though easy to use, would still be for power-use cases only: a Python programming bug doing accidental unlocked concurrent access into a hot tunneled tree can cause a C-level crash. (This cannot happen so far with simple Python threading; you'd only get inconsistent data or a Python exception. But of course you can already crash at C level now by using the Python standard library :-) ) 
That danger would be OK for me. Conceptually it's no more complicated than using locks correctly in normal Python thread programming; only the effects of bugs will be more critical...

If one thinks about overcoming the GIL altogether, we are probably not far away. Mainly:

* make the locking dict type (and a locking list type) the common case, with the non-locking types obsolete or used for optimization only

* add locking to the other mutable types that are not already essentially dicts/lists. Most objects whose access functions only change integers etc. and call thread-safe C library functions don't require extra locking
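A "locking list type" as suggested above might, in its simplest form, be a growable array whose every operation takes the object's own mutex. This is a minimal sketch assuming POSIX threads; `locking_list`, `ll_append`, `ll_len` and `appender` are hypothetical names, and a real CPython type would also need refcounting of the stored items, which is omitted here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical locking list: a growable array of pointers guarded by a
   per-object mutex. */
typedef struct {
    pthread_mutex_t mu;
    void **items;
    size_t len, cap;
} locking_list;

static void ll_init(locking_list *l) {
    pthread_mutex_init(&l->mu, NULL);
    l->items = NULL;
    l->len = l->cap = 0;
}

/* Append under the lock; returns 0 on success, -1 if out of memory. */
static int ll_append(locking_list *l, void *item) {
    int rc = 0;
    pthread_mutex_lock(&l->mu);
    if (l->len == l->cap) {
        size_t ncap = l->cap ? 2 * l->cap : 8;
        void **p = realloc(l->items, ncap * sizeof *p);
        if (!p) { rc = -1; goto out; }
        l->items = p;
        l->cap = ncap;
    }
    l->items[l->len++] = item;
out:
    pthread_mutex_unlock(&l->mu);
    return rc;
}

/* Even reads take the lock, so they never see a half-done resize. */
static size_t ll_len(locking_list *l) {
    pthread_mutex_lock(&l->mu);
    size_t n = l->len;
    pthread_mutex_unlock(&l->mu);
    return n;
}

static void ll_free(locking_list *l) {
    free(l->items);
    pthread_mutex_destroy(&l->mu);
}

#define PER_THREAD 50000

/* Thread body: append PER_THREAD items to the shared list. */
static void *appender(void *arg) {
    for (int i = 0; i < PER_THREAD; i++)
        ll_append(arg, NULL);
    return NULL;
}
```

With two threads running `appender` on the same list, `ll_len` ends up at exactly `2 * PER_THREAD`; without the mutex, concurrent `realloc` calls and `len++` updates could lose items or corrupt the array.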

Maybe the separated-GIL/interpreter method can be a bridge to that. 
Refcounting probably doesn't block that road.

It would be really interesting to run an experiment on the speed cost of LOCK INC. My guess is that the refcounting hot spot will be Py_None's cache line (on multi-core CPUs with a separate cache per core). 
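A rough starting point for such an experiment, assuming a POSIX system with `clock_gettime` and C11 atomics (on x86, GCC and Clang emit atomic read-modify-writes such as `atomic_fetch_add` as LOCK-prefixed instructions regardless of memory order). It times a plain increment, an uncontended atomic increment, and two threads incrementing the same counter, a stand-in for Py_None's refcount bouncing between per-core caches. `bench`, `shared_cnt` and `ITERS` are hypothetical names; the printed numbers depend entirely on the machine and compiler flags, and a serious measurement would repeat runs and pin threads to cores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 5000000L

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static volatile long plain_cnt;      /* volatile so the loop isn't folded away */
static atomic_long shared_cnt;       /* stand-in for e.g. Py_None's refcount */

static void *bump_atomic(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add_explicit(&shared_cnt, 1, memory_order_relaxed);
    return NULL;
}

/* Prints ns per increment: plain INC, uncontended LOCK INC, and LOCK INC
   with two threads hammering the same cache line. */
static void bench(void) {
    double t0 = now_sec();
    for (long i = 0; i < ITERS; i++)
        plain_cnt++;
    double t1 = now_sec();
    bump_atomic(NULL);
    double t2 = now_sec();
    pthread_t a, b;
    pthread_create(&a, NULL, bump_atomic, NULL);
    pthread_create(&b, NULL, bump_atomic, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    double t3 = now_sec();
    printf("plain INC:            %.2f ns\n", (t1 - t0) / ITERS * 1e9);
    printf("LOCK INC, 1 thread:   %.2f ns\n", (t2 - t1) / ITERS * 1e9);
    printf("LOCK INC, 2 threads:  %.2f ns\n", (t3 - t2) / (2 * ITERS) * 1e9);
}
```

The third figure is the interesting one for the Py_None question: it includes the cost of the cache line moving between cores on every increment.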

Robert