New GitHub issue #96387 from milizao:<br>
<hr>
<pre>
# Bug report
Main thread hang at `drop_gil` after daemon thread exited, my Python version is 3.9.2, but I think the same issue is exists in the Python 3.10 newest version.
```
(gdb) info threads
Id Target Id Frame
* 1 Thread 0xffff909dd010 (LWP 226769) "python" 0x0000ffff909355c0 in futex_wait_cancelable (private=<optimized out>, expected=0,
futex_word=0xaaaadf99e8d4 <_PyRuntime+524>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
(gdb) bt
#0 0x0000ffff909355c0 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xaaaadf99e8d4 <_PyRuntime+524>)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xaaaadf99e8d8 <_PyRuntime+528>, cond=0xaaaadf99e8a8 <_PyRuntime+480>) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0xaaaadf99e8a8 <_PyRuntime+480>, mutex=mutex@entry=0xaaaadf99e8d8 <_PyRuntime+528>) at pthread_cond_wait.c:638
#3 0x0000aaaadf35d184 in drop_gil (ceval=0xaaaadf99e820 <_PyRuntime+344>, ceval@entry=0xaaaadf300274 <PyObject_Hash+116>, ceval2=ceval2@entry=0xaaaafbc61ff0,
tstate=tstate@entry=0xaaaafbc64a00) at Python/ceval_gil.h:178
#4 0x0000aaaadf2c05c4 in eval_frame_handle_pending (tstate=0xaaaafbc64a00) at Python/ceval.c:882
#5 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xffff83929d40, throwflag=<optimized out>) at Python/ceval.c:1402
#6 0x0000aaaadf2bee1c in _PyEval_EvalFrame (throwflag=0, f=0xffff83929d40, tstate=0xaaaafbc64a00) at ./Include/internal/pycore_ceval.h:40
#7 function_code_fastcall (tstate=0xaaaafbc64a00, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
#8 0x0000aaaadf31ac04 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>,
tstate=<optimized out>) at ./Include/cpython/abstract.h:118
#9 _PyObject_CallNoArg (func=<optimized out>) at ./Include/cpython/abstract.h:172
#10 call_unbound_noarg (self=<optimized out>, func=0xffff83691ee0, unbound=<optimized out>) at Objects/typeobject.c:1515
#11 slot_tp_finalize (self=<optimized out>) at Objects/typeobject.c:6992
#12 0x0000aaaadf30011c in PyObject_CallFinalizer (self=self@entry=0xffff837ea6d0) at Objects/object.c:195
#13 0x0000aaaadf302694 in PyObject_CallFinalizerFromDealloc (self=self@entry=0xffff837ea6d0) at Objects/object.c:213
#14 0x0000aaaadf314b1c in subtype_dealloc (self=0xffff837ea6d0) at Objects/typeobject.c:1265
#15 0x0000aaaadf2f3f4c in _Py_DECREF (op=<optimized out>) at ./Include/object.h:430
#16 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:497
#17 insertdict (mp=mp@entry=0xffff8383de80, key=0xffff837450b0, hash=-5519671433175608536, value=value@entry=0xaaaadf94be68 <_Py_NoneStruct>)
at Objects/dictobject.c:1123
#18 0x0000aaaadf2f52d4 in PyDict_SetItem (op=op@entry=0xffff8383de80, key=<optimized out>, value=value@entry=0xaaaadf94be68 <_Py_NoneStruct>)
at Objects/dictobject.c:1573
#19 0x0000aaaadf2ffa48 in _PyModule_ClearDict (d=0xffff8383de80) at Objects/moduleobject.c:612
#20 0x0000aaaadf2ffc84 in _PyModule_Clear (m=m@entry=0xffff837f1a40) at Objects/moduleobject.c:560
#21 0x0000aaaadf3798e8 in _PyImport_Cleanup (tstate=tstate@entry=0xaaaafbc64a00) at Python/import.c:606
#22 0x0000aaaadf38e63c in Py_FinalizeEx () at Python/pylifecycle.c:1426
#23 0x0000aaaadf2c9110 in Py_RunMain () at Modules/main.c:679
#24 0x0000aaaadf2c99f8 in pymain_main (args=args@entry=0xffffd3754d88) at Modules/main.c:707
#25 0x0000aaaadf2c9ad4 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#26 0x0000ffff906c1f80 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=<optimized out>) at ../csu/libc-start.c:308
#27 0x0000aaaadf2c87bc in _start ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) py-bt
Traceback (most recent call first):
File "/home/ossadm/zhao/runtime.py", line 8, in __del__
def __del__(self):
(gdb)
```
### Rough analysis
In Python version 3.9.2, the `take_gil` is implemented as following:
```c
static void
take_gil(PyThreadState *tstate)
{
...
while (_Py_atomic_load_relaxed(&gil->locked)) {
unsigned long saved_switchnum = gil->switch_number;
unsigned long interval = (gil->interval >= 1 ? gil->interval : 1);
int timed_out = 0;
COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out);
/* If we timed out and no switch occurred in the meantime, it is time
to ask the GIL-holding thread to drop it. */
if (timed_out &&
_Py_atomic_load_relaxed(&gil->locked) &&
gil->switch_number == saved_switchnum)
{
if (tstate_must_exit(tstate)) {
MUTEX_UNLOCK(gil->mutex);
PyThread_exit_thread(); // [2]
}
assert(is_tstate_valid(tstate));
SET_GIL_DROP_REQUEST(interp); // [1]
}
}
...
}
```
A daemon thread may call `SET_GIL_DROP_REQUEST` and then exits from `PyThread_exit_thread`, so this thread will not get GIL really. This happens if the main thread is on the finalizing procedure and it calls `_PyRuntimeState_SetFinalizing` in `Py_FinalizeEx` before the daemon thread gets GIL (The daemon thread wait 5ms at most before exit).
After that, the flag `gil_drop_request` is set, but only the main thread is left. Then, if the main thread runs into `drop_gil` again, it will hang at `FORCE_SWITCHING`:
```c
COND_WAIT(gil->switch_cond, gil->switch_mutex);
```
### Reproducible example
This issue could be reproduced by:
```python
$ python -m http.server 5678 &
$ for ((i=0;;i++)); do echo $i; python -W all service.py; done
```
And the scripts used are:
runtime.py
```python
import socket
class TestUnclosedSocket:
def __init__(self):
self.s = socket.socket()
def __del__(self):
self.s.connect(('127.0.0.1', 5678))
ins = TestUnclosedSocket()
```
service.py
```python
import threading
import runtime
import sys
sys.setswitchinterval(0.001)
def calc():
sum = 0
for i in range(100000):
sum += i * i
def daemon_func():
while True:
calc()
if __name__ == '__main__':
threading.Thread(target=daemon_func, name="daemon", daemon=True).start()
calc()
```
I change GIL switch interval to 1ms to increase the probability here.
</pre>
<hr>
<a href="https://github.com/python/cpython/issues/96387">View on GitHub</a>
<p>Labels: type-bug</p>
<p>Assignee: </p>