[Python-Dev] Variant of removing GIL.

Sokolov Yura falcon at intercable.ru
Thu Sep 15 09:19:13 CEST 2005


Excuse my English.

I think I know how to remove the GIL! Obviously I am an idiot.

First, about Py_INCREF and Py_DECREF.

We should not remove the GIL at all. We should change it.

It must become a "one writer, many readers" lock with the following semantics:

The lock has a "read-counter" and a "write-counter". Initially both are 0.

When a "reader" tries to acquire the lock for "read", it sleeps until the "write-counter" is 0.
When the "reader" acquires the lock, it increments the "read-counter".
When the "reader" releases the lock, it decrements the "read-counter".
One reader does not block another, since it does not increment the "write-counter".
A reader sleeps if there are any waiting writers, since they have incremented the "write-counter".

When a "writer" tries to acquire the lock for "write", it increments the "write-counter" and sleeps until the "read-counter" drops to 0. Among "writers", the "write" lock is a simple mutex.
When a "writer" releases the lock, it decrements the "write-counter".
When there are no waiting writers, the readers wake up.
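A minimal Python sketch of the lock semantics above, assuming threading.Condition for the sleeping and waking; the class and method names (WriterPreferringRWLock, acquire_read, acquire_write, ...) are hypothetical, and a real GIL replacement would of course live in C inside the interpreter.

```python
import threading

class WriterPreferringRWLock:
    """Hypothetical "one writer, many readers" lock.

    read_counter:  threads currently holding the lock for "read".
    write_counter: writers holding the lock or waiting for it.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._write_mutex = threading.Lock()  # among writers: a simple lock
        self.read_counter = 0
        self.write_counter = 0

    def acquire_read(self):
        with self._cond:
            # A reader sleeps while any writer is active or waiting.
            while self.write_counter > 0:
                self._cond.wait()
            self.read_counter += 1

    def release_read(self):
        with self._cond:
            self.read_counter -= 1
            if self.read_counter == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            self.write_counter += 1      # from now on new readers sleep
        self._write_mutex.acquire()      # one writer at a time
        with self._cond:
            while self.read_counter > 0:
                self._cond.wait()

    def release_write(self):
        self._write_mutex.release()
        with self._cond:
            self.write_counter -= 1
            if self.write_counter == 0:
                self._cond.notify_all()  # no waiting writers: readers wake up
```

Because a waiting writer bumps the write-counter before it sleeps, readers cannot starve it; the price is that readers may wait behind a queue of writers.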

Excuse me for telling obvious things. I am really reinventing the wheel in my head, since I was a bad student.

I think this kind of lock exists natively in Linux (I saw it in the kernel source), but I do not know whether a waiting writer blocks new readers there or not.

Now, every thread keeps a queue of objects to decref. It can be implemented as a plain array, since it is emptied all at once.

Initially, every thread acquires the GIL for "read".
Py_INCREF works as usual;
Py_DECREF places the reference into the queue.
When the queue becomes full, or the "100 instructions" interval has elapsed ( :-) , it is useful),
the thread releases the GIL for "read" and acquires it for "write".
Once it has acquired it, it decrefs all objects stored in the queue and clears the queue.
After that, it acquires the GIL for "read" again.
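The deferred-decref queue described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions: Obj is a hypothetical stand-in for a PyObject, "freed" stands in for deallocation, and a plain threading.Lock stands in for "the GIL held for write".

```python
import threading

class Obj:
    """Hypothetical stand-in for a PyObject with a reference count."""
    def __init__(self):
        self.refcnt = 1
        self.freed = False

class DecrefQueue:
    """Per-thread, fixed-size array of pending decrefs (no dynamic growth)."""
    def __init__(self, write_lock, capacity=1000):
        self._write_lock = write_lock   # stand-in for "GIL held for write"
        self._items = [None] * capacity
        self._count = 0

    def decref(self, obj):
        # Py_DECREF only records the reference; nothing is freed yet.
        self._items[self._count] = obj
        self._count += 1
        if self._count == len(self._items):
            self.flush()                # queue full: apply all decrefs

    def flush(self):
        # In the proposal: release the GIL for "read", acquire it for
        # "write", apply every queued decref, then go back to "read".
        with self._write_lock:
            for i in range(self._count):
                obj = self._items[i]
                obj.refcnt -= 1
                if obj.refcnt == 0:
                    obj.freed = True    # stand-in for deallocation
                self._items[i] = None
            self._count = 0
```

In the proposal, flush() would also be called when the "100 instructions" interval elapses, not only when the array fills up.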


But what can we do about mutable objects (dicts, lists and others)?

There should be a second "one writer, many readers" lock, a "public-write" GIL: PWGIL.
PWGIL has to be more complicated, since its "write" lock must have RLock (reentrant) semantics.
Let's call this lock ROWMR (reentrant one writer, many readers).

So the semantics of ROWMR can be:

When a thread acquires the ROWMR lock, it acquires it at the "read" level.
Let's call this "write-level" = 0.
While a thread's "write-level" is 0, it is a "reader".
A thread can increase its "write-level".
When it raises "write-level" from 0 to 1, it becomes a "writer".
While "write-level" > 0, the thread is a writer.
A thread can decrease its "write-level".
When "write-level" drops from 1 to 0, the thread becomes a "reader" again.
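A possible Python sketch of these ROWMR semantics, keeping "write-level" in threading.local so each thread has its own. The class and method names are hypothetical. One design choice is an assumption of this sketch, not something stated above: when a reader raises its level from 0 to 1, it first stops counting itself as a reader, so two readers upgrading at the same time cannot deadlock waiting for each other.

```python
import threading

class ROWMR:
    """Hypothetical reentrant one-writer/many-readers lock with a
    per-thread "write-level" (0 = reader, > 0 = writer)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # threads holding the lock at level 0
        self._writer = None        # ident of the thread at level > 0
        self._waiting_writers = 0  # waiting writers block new readers
        self._local = threading.local()

    @property
    def write_level(self):
        return getattr(self._local, "level", 0)

    def acquire(self):
        """Enter at the "read" level (write-level = 0)."""
        with self._cond:
            while self._writer is not None or self._waiting_writers:
                self._cond.wait()
            self._readers += 1
        self._local.level = 0

    def release(self):
        """Leave the lock; must be called at write-level 0."""
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def increase_write_level(self):
        if self.write_level > 0:   # already a writer: just nest
            self._local.level += 1
            return
        with self._cond:
            # 0 -> 1: stop being a reader, then wait until we are alone.
            self._readers -= 1
            self._waiting_writers += 1
            while self._writer is not None or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = threading.get_ident()
        self._local.level = 1

    def decrease_write_level(self):
        self._local.level -= 1
        if self._local.level == 0:  # 1 -> 0: become a reader again
            with self._cond:
                self._writer = None
                self._readers += 1
                self._cond.notify_all()
```

Note that giving up the read level during the upgrade means the protected state may change between the two levels; this is the same kind of window discussed in the "most awful situation" below.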

With PWGIL:
We can mark every _mutable_ object with the number of the thread that created it.
If the mark matches the current thread's number, the object is "private" to that thread.
If the mark is 0 (or another impossible thread number), the object is "public".
If the mark is neither 0 nor the current thread's number, the object is "alien".
When we access a _mutable_ object, we check whether it is "private".
If it is, we can do anything without locking.
If it is not, and we only read it, we check whether it is "public".
   If yes ("read" of a "public" object), we can read it without locking.
   Otherwise (the object is "alien", or we need to change it), we increase "write-level",
            if the object is "alien", make it "public",
            if we need to change the object, change it,
            decrease "write-level".
Of course, when we append an object to a "public" collection, we should make that object "public" too.
"write-level" is already increased at that moment, so we do not take many separate locks,
and when we later access those objects for reading, we will not need to lock to make them "public".
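The access rules above can be sketched in Python as follows. Assumptions: Mutable, classify, access and append are hypothetical names; threading.get_ident() serves as the thread number (Python documents it as nonzero, so 0 can mean "public"); and a threading.RLock stands in for raising and lowering "write-level" on PWGIL.

```python
import threading

PUBLIC = 0  # impossible thread number: threading.get_ident() is never 0

class Mutable:
    """Hypothetical mutable object carrying its creator-thread mark."""
    def __init__(self, items=()):
        self.owner = threading.get_ident()  # "private" to its creator
        self.items = list(items)

# Stand-in for raising/lowering "write-level"; an RLock, so nested
# increases by the same thread do not deadlock.
write_level = threading.RLock()

def classify(obj):
    if obj.owner == threading.get_ident():
        return "private"
    return "public" if obj.owner == PUBLIC else "alien"

def access(obj, mutate=None):
    """Read obj (mutate is None) or change it with mutate(obj)."""
    kind = classify(obj)
    if kind == "private":
        if mutate is not None:
            mutate(obj)                 # private: no locking at all
        return obj.items
    if kind == "public" and mutate is None:
        return obj.items                # read of "public": no locking
    with write_level:                   # increase "write-level"
        if obj.owner != PUBLIC:
            obj.owner = PUBLIC          # "alien" becomes "public"
        if mutate is not None:
            mutate(obj)
        return obj.items                # leaving the block decreases it

def append(coll, item):
    """Append item; an item put into a non-private collection goes public."""
    def do(o):
        if classify(o) != "private" and isinstance(item, Mutable):
            item.owner = PUBLIC         # write-level already raised
        o.items.append(item)
    access(coll, mutate=do)
```

For a single-threaded program every object stays "private", so the only cost on each access is the comparison in classify().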

I don't know how nested scopes are implemented, but I think they should be treated as mutable objects.

So there is a small overhead for a single-threaded application
(only a comparison of two numbers),
and for a large share of multithreaded ones, since we lock only when writing to
_mutable_ _public_ objects. Most "public" objects are not written to
often: they are numbers, classes and mostly-read collections.
And one can optimize a program by accumulating results in a "private" collection
and then flushing them into a "public" one.
Also, there could be a statement for explicitly increasing "write-level" around a big update
of a "public" object and decreasing it afterwards.
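The "accumulate privately, then flush once" optimization looks like this in ordinary Python threading code. The shared_lock block is only a stand-in for the hypothetical explicit "write-level" statement; no such statement exists in Python, and the names here are illustrative.

```python
import threading

shared_results = []              # the "public" collection
shared_lock = threading.Lock()   # stand-in for raising "write-level"

def worker(n):
    private = []                 # "private": only this thread touches it
    for i in range(n):
        private.append(i * i)    # no locking while accumulating
    # One explicit "write-level" increase around the big update,
    # instead of one lock per append.
    with shared_lock:
        shared_results.extend(private)

threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread takes the lock once instead of a hundred times, which is the saving the paragraph above describes.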

PWGIL must also be released and reacquired every "100" instructions, but only if "write-level" = 0;
this conforms to the current GIL semantics.
I think it must not be released while flushing the decref queue, since that
can happen while we are in C code.
And blocking IO needs careful thought.
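The periodic-release rule can be sketched as a per-instruction tick. Interpreter, tick() and CHECK_INTERVAL are hypothetical names, and a plain threading.Lock stands in for PWGIL held at the read level; the point is only that a thread whose "write-level" is above 0 keeps the lock when the interval elapses.

```python
import threading

CHECK_INTERVAL = 100  # the "100 instructions" interval

class Interpreter:
    """Hypothetical per-thread interpreter state (illustration only)."""
    def __init__(self, pwgil):
        self.pwgil = pwgil      # stand-in lock; a real PWGIL is a ROWMR
        self.write_level = 0
        self.ticks = 0
        self.switches = 0

    def tick(self):
        """Called once per bytecode instruction."""
        self.ticks += 1
        if self.ticks < CHECK_INTERVAL:
            return
        self.ticks = 0
        if self.write_level == 0:
            # Only a pure reader gives other threads a chance here;
            # a writer keeps the lock until its update is finished.
            self.pwgil.release()
            self.pwgil.acquire()
            self.switches += 1
```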

The worst situation (from my point of view):
object O is "private" to thread A.
Thread B accesses O and tries to mark it "public", so it blocks while trying to increase "write-level".
Thread A starts to change O (it is at "write-level" 0), and in C code it releases PWGIL
          (around blocking IO, for example).
Thread B becomes a "writer", marks O "public", becomes a "reader" again and starts to read O,
while thread A, on returning, continues to change O, remaining at "write-level" 0.

But I think well-written C code should not attempt blocking IO
in the middle of changing non-local objects
(and, as I guess, it does not do so at the moment. Am I mistaken?).
Alternatively (or additionally), when it returns and continues
to change O, it must check whether O is still "private".
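The "check again after blocking IO" fix can be sketched in Python. Obj, write_level and change_after_blocking_io are hypothetical names; threading.RLock stands in for raising "write-level". In the proposal, thread A would hold PWGIL at the read level between the re-check and the change, so B could not mark O public in that gap; this plain-lock sketch does not model that part.

```python
import threading

PUBLIC = 0  # impossible thread number: threading.get_ident() is never 0

class Obj:
    """Hypothetical mutable object with a creator-thread mark."""
    def __init__(self):
        self.owner = threading.get_ident()
        self.data = []

write_level = threading.RLock()  # stand-in for raising "write-level"

def change_after_blocking_io(obj, value, blocking_io):
    """Thread A continues changing obj after a blocking call.

    Returns True if the change had to be made as a writer.
    """
    blocking_io()                    # PWGIL would be released around this
    # Re-check: another thread may have made obj "public" meanwhile.
    if obj.owner == threading.get_ident():
        obj.data.append(value)       # still "private": no locking
        return False
    with write_level:                # now "public": change it as a writer
        obj.data.append(value)
        return True
```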

I think a large part of the checks and the manipulation of the GIL and PWGIL could be
hidden inside the current C API,
so we would not have to change tons of libraries written in C; only
libraries that create mutable objects which
use non-standard containers for storage.

Maybe there should be only one unified lock (SGIL) for both incref/decref and "write public".

Summary of overhead:
    each Py_DECREF places a reference in a thread-local queue (it could be
             small, about 1000 entries, and not dynamic: just an array with a counter);
    every object (only mutable ones?) stores a thread mark (only 4 bytes, I think);
    every access to an object must check whether it is mutable, and only if so, whether it is "private";
             we lock only for "mutable public/alien" objects.
 I think there would be no more than 20% performance overhead,
 and a +50% advantage for an ordinary multithreaded program on a dual-processor
box
(maybe +90% on 3 processors and +110% on 4 processors, since a write lock
blocks all readers).



