global interpreter lock not working as it should

Mark Day mday at apple.com
Fri Aug 2 14:00:43 EDT 2002


In article <ddc19db7.0208020724.7482f55 at posting.google.com>, Armin
Steinhoff <a-steinhoff at web.de> wrote:

> > A peculiar thing here is that "the Python lock" is an abstraction.  You said
> > you were running on Linux, and "the Python lock" isn't a lock there.  In
> > Pythons released to date, a Python lock under pthreads is a combination of a
> > phtreads mutex never held across more than a few C instructions, and a
> > pthreads condition variable.  When "the lock" is released, a condvar signal
> > is done to notify some other thread that it can grab the lock
> 
> The other threads can only grab the lock if they get the CPU ... but
> the releasing thread is still running at the same priority level!
> 
> >  This happens
>  before the relinquishing thread tries to reacquire the lock.
> 
> Sounds like scheduling magic ... this can only happen if the time
> slice of the releasing thread get exhausted before the thread can
> re-aquire the lock, which may be happen 10 times a year :)

I'm not familiar with the way Linux handles locks (of whatever kind
Python is using on Linux), but the threads waiting to acquire a lock
are usually on a queue, usually in the same order that the threads
tried to acquire the lock.  A common scheduling policy is not to allow
a thread to release and reacquire a lock if there are other threads
waiting for the lock (in the interest of fairness, an preventing
starvation).

Suppose you have two threads, A and B.  Suppose that thread A is
running and has acquired a lock.   Suppose B has not yet tried to
acquire that same lock yet, but will very shortly after it gets a
chance to run.  A typical sequence of events might be:

A releases and reacquires the lock several times since there is no
other contention for the lock, until its time slice is up.  Suppose its
time slice ends while A still holds the lock.

B is eligible and runs to the point of trying to acquire the lock. 
Since A already holds the lock, B becomes ineligible and is added to
the queue of threads waiting for the lock.

Since B is no longer eligible to run, the scheduler picks another
thread (even though B's time slice is not up yet).  Eventually, the
scheduler picks A.

A releases the lock.  When it tries to reacquire the lock, it finds
there is contention.  A is added to the end of the queue of threads
waiting for the lock.  The head of the queue is removed and the lock is
granted to that thread, which becomes eligible.  (If there are only two
threads competing for the lock, this is thread B.)  A is no longer
eligible to run, so the scheduler must pick another thread (even though
A's time slice is not up yet).

Some schedulers will choose to run thread B as soon as it becomes
eligible as a result of acquiring the lock (though this can lead to
starvation if there are other eligible threads).  Some may choose the
"next" thread of the same priority, or perhaps some other thread of
different priority.  I don't know how Linux does it.

To further complicate matters, some schedulers adjust thread priorities
dynamically based on various factors.

I suppose some schedulers might allow A to continue to release and
reacquire the lock even if there are other threads blocked waiting to
acquire the lock.  But I would expect any such scheduler to mark A or
the lock when A's time slice is up, so that A gets blocked and added to
the end of the queue the next time it tries to reacquire the lock (to
prevent starvation).

It seems like you think A is continuing to release and acquire the lock
even though B is waiting, and across arbitrarily many time slices
(leading to A starving out B).  Are you sure that's what's causing the
behavior you see?  If I remember your experimental evidence correctly,
A and B do continue to run back and forth, but you were surprised at
the amount of code (and time?) they managed to execute while running. 
Could it just be that the time slices are longer than you thought?  Or
instances of external non-thread-safe code taking a long time with the
lock held?

-Mark



More information about the Python-list mailing list