PyThread_acquire_lock freezes at pthread_cond_wait although lock not occupied

Tim Peters tim.one at comcast.net
Thu Feb 6 14:48:22 EST 2003


[Gernot Hillier]
> I've now been debugging a very nasty bug in a multithreaded program
> embedding Python for several days.

So you've barely got a start on it <wink>.

> After quite some work I found the following:
>
> One of the threads occasionally locks at
> PyThread_acquire_lock/pthread_cond_wait when trying to get the
> interpreter_lock or the import_lock. This thread will block there forever.
>
> But other threads may get the same lock w/o any problem at all as
> it seems.
> And when I look at it in gdb it looks even more astonishing:
>
> (gdb) bt
> #0  0x4026cea9 in sigsuspend () from /lib/libc.so.6
> #1  0x4003bd48 in __pthread_wait_for_restart_signal () from
>     /lib/libpthread.so.0

That's enough.  If it never gets a restart signal, it will stay there
forever.  Whether it does get a restart signal is out of Python's hands,
though -- that's up to the pthreads implementation.

> ...
> Ok, it tries to get the global lock but:
>
> (gdb) print *((pthread_lock*) interpreter_lock)
> $7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
> __spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0,
> __m_count = 0,
> __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}
>
> So the lock is indeed not locked!! I don't understand this at all.

"The lock" is ambiguous.  There's the pthreads mutex ("mut" in the above),
and there's the Python GIL (implemented by that entire data structure).  As
Jeremy said, it's normal for mut to be unlocked during a condvar wait.
Whether the GIL is locked is really irrelevant, because your stack trace
shows that it's in the bowels of the platform condvar wait implementation,
presumably waiting for a signal it's never going to get.
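
To make the mutex-versus-GIL distinction concrete, here is a minimal sketch
of a lock built from a flag, a mutex, and a condition variable.  The field
names mirror the gdb dump above; the class name CondLock, the waiting
counter, and the helper function are made up for illustration, and this is
Python's threading module standing in for pthreads, not the actual
thread_pthread.h code.

```python
import threading
import time


class CondLock:
    """Toy lock: a flag guarded by a mutex, plus a condition variable."""

    def __init__(self):
        self.mut = threading.Lock()                         # plays the role of "mut"
        self.lock_released = threading.Condition(self.mut)  # the condvar
        self.locked = False                                 # the logical lock state
        self.waiting = 0                                    # illustration only

    def acquire(self):
        with self.mut:
            while self.locked:
                self.waiting += 1
                # wait() releases mut while blocked and retakes it on wakeup.
                self.lock_released.wait()
                self.waiting -= 1
            self.locked = True

    def release(self):
        with self.mut:
            self.locked = False
            self.lock_released.notify()


def mut_free_during_wait():
    """Show that mut can be taken while another thread sits in wait()."""
    lock = CondLock()
    lock.acquire()                       # this thread owns the toy lock
    waiter = threading.Thread(target=lock.acquire)
    waiter.start()
    while True:
        with lock.mut:                   # succeeds although waiter is blocked
            if lock.waiting:
                observed = (lock.locked, lock.waiting)
                break
        time.sleep(0.001)
    lock.release()                       # wakes the waiter
    waiter.join()
    return observed
```

Calling mut_free_during_wait() returns (True, 1): the logical lock is held
and one thread is parked in the condvar wait, yet the mutex itself is free.
If the wakeup in release() never reached the waiter, it would sit there
forever no matter what the flag or the mutex said.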

> The same phenomenon I saw once when trying to get the import_lock.
>
> pthread_cond_wait () was blocked but import_lock_level was 0 and
> import_lock_thread was -1.
>
> Anybody seen anything like this?
>
> It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
> 2.2.1 but can't see that 2.2.2 will improve this somehow...

Trying Python 2.3a1 might.  The GIL under Linux is implemented via POSIX
semaphores in 2.3, instead of via a condvar+mutex pair.
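
For contrast with a condvar+mutex pair, a semaphore-based lock needs only
one primitive.  The sketch below is an illustration under assumptions:
SemLock is a made-up name, and threading.Semaphore merely stands in for a
POSIX sem_t; it says nothing about how 2.3 actually wires things up.

```python
import threading


class SemLock:
    """Toy lock backed by a single counting semaphore with initial value 1."""

    def __init__(self):
        self._sem = threading.Semaphore(1)

    def acquire(self, blocking=True):
        # Returns True if the lock was taken, False if a non-blocking
        # attempt found it already held.
        return self._sem.acquire(blocking)

    def release(self):
        self._sem.release()
```

There is no separate flag and no separate wakeup signal to lose here, so a
bug in the platform's condvar code is simply not on the path.  A bug in the
platform's semaphore code still would be.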

> This is very important for me as the program will become my diploma
> thesis and if I can't get this problem solved in the next week,
> I'll be in real trouble. :-((

Then let me ask you an odd question:  are you using fork()?  If so, move
heaven and earth to get rid of it.  Over a year ago a number of people spent
more than a week trying to solve a similar problem on Linux, and never did
manage to solve it.  All the evidence pointed to a bug in the Linux pthreads
implementation, due to improper treatment of internal pthreads memory after
a fork.  Forking and threading mix like bananas and motor oil under the best
of conditions.
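
The pthreads-internal problem can't be reproduced on demand, but a simpler,
related hazard can, and it shows why the mix is so unpleasant.  The sketch
below assumes a POSIX platform with os.fork(); the function name is made up.
A helper thread holds a lock across the fork, and the child inherits the
lock in its locked state without inheriting the thread that would release
it.

```python
import os
import threading


def child_sees_lock_held():
    """Fork while a helper thread holds a lock; report what the child saw."""
    lock = threading.Lock()
    holding = threading.Event()
    done = threading.Event()

    def holder():
        with lock:
            holding.set()
            done.wait()          # keep the lock held across the fork

    t = threading.Thread(target=holder)
    t.start()
    holding.wait()               # the helper now owns the lock

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so nothing in this
        # process will ever release the lock.  Probe without blocking.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    done.set()                   # let the helper finish in the parent
    t.join()
    return os.WEXITSTATUS(status) == 1
```

A blocking acquire in the child would hang exactly the way the original
report describes, with nothing visibly holding the lock.  This is not
claimed to be the bug in question, only an instance of the same family.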

If you're not using fork(), I have no ideas other than to try a different
OS, or move to Python 2.3a1 and hope the same bug doesn't plague your
platform semaphore implementation.

all-oses-are-buggy-ly y'rs  - tim