Deadlock when interrupting interpreter initialisation with ptrace?

Hi there
I hope you don't mind me sharing my experience with testing the austinp variant of Austin with Python >=2.7,<3.11.
The austinp variant is a variant of Austin (https://github.com/P403n1x87/austin) for Linux that uses ptrace to seize and interrupt/continue threads to capture native stack traces using libunwind. During testing, I have discovered that there are good chances of causing what looks like a deadlock in Python if the seizing and interrupting of threads happen very early when spawning a Python subprocess from austinp. This seems to coincide with the initialisation of the interpreter when modules are being loaded. To avoid interfering so destructively with Python, I have added a sleep of about 0.5s on fork to prevent sampling during this initialisation phase, which has helped significantly.
However, I think this poses one question: is this behaviour from Python to be expected or is it perhaps an indication of a potential bug? Whilst I find it conceivable that something like this could happen, given the locking that happens around imports, is it acceptable that the pausing and resuming of the execution of a thread lead to a potential deadlock?
Cheers,
Gabriele
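For readers unfamiliar with the ptrace requests mentioned in this message, the seize / interrupt / continue cycle can be sketched roughly as follows. This is a minimal illustration and not austinp's actual code: the helper names `seize_interrupt_resume` and `demo_cycle` are hypothetical, the target is a forked child that only sleeps, and the point where a sampler would unwind the native stack is left as a comment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: seize a task, stop it once, then let it run again.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 * On failure after the seize, the task is left attached. */
int seize_interrupt_resume(pid_t tid)
{
    int status;

    /* Unlike PTRACE_ATTACH, PTRACE_SEIZE does not stop the task. */
    if (ptrace(PTRACE_SEIZE, tid, NULL, NULL) == -1)
        return -1;

    /* Ask the kernel to stop the task ... */
    if (ptrace(PTRACE_INTERRUPT, tid, NULL, NULL) == -1)
        return -1;

    /* ... and wait until the stop is reported to the tracer. */
    if (waitpid(tid, &status, __WALL) == -1)
        return -1;
    if (!WIFSTOPPED(status)) {
        errno = EINVAL;
        return -1;
    }

    /* A sampler would unwind the native stack here. */

    /* Resume the task without injecting a signal. */
    if (ptrace(PTRACE_CONT, tid, NULL, NULL) == -1)
        return -1;
    return 0;
}

/* Hypothetical demo: run one cycle against a freshly forked child that
 * only sleeps, then kill and reap the child. Returns what the cycle
 * returned and preserves its errno. */
int demo_cycle(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();
    }

    int rc = seize_interrupt_resume(child);
    int saved = errno;
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    errno = saved;
    return rc;
}
```

Calling `demo_cycle()` from a test program exercises one full cycle. In a real sampler the cycle would repeat once per sampling interval for every thread of the target.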

Hi Gabriele,
If all you are doing is pausing and resuming threads, there should be no reason why this would interfere with anything more during interpreter initialization than at any other time. The only thing I can think of is that locking is much more common at this stage. The other thing that could be at play is that ptrace sends SIGSTOP on PTRACE_ATTACH, but that signal cannot be caught by the interpreter (or any other process), so no signal handler should be at play either.
Do you know what is involved in the deadlock (as in, what the threads are waiting on)?
Answering your questions directly:
On Mon, 6 Jun 2022 at 15:38, Gabriele <phoenix1987@gmail.com> wrote:
However, I think this poses one question: is this behaviour from Python to be expected or is it perhaps an indication of a potential bug?
It is neither expected nor unexpected, because it is not something we support. It is not something we explicitly forbid either; it is just that nothing in the design or the test suite ensures that this will work.
is it acceptable that the pausing and resuming of the execution of a thread lead to a potential deadlock?
It depends on whether this is something that we can control in a reasonable way or whether it is outside our control. It may be a bug in our code, in which case we can try to fix it, but without a more concrete pointer that is going to be complicated, especially given that it is more likely that this is outside our control. We would probably reject any proposal to add complexity to support this use case, but we would likely be happy to make small changes if something small that we do is preventing the use case.
Cheers from cloudy London,
Pablo Galindo Salgado

On Mon, Jun 6, 2022 at 4:35 PM Gabriele <phoenix1987@gmail.com> wrote:
The austinp variant is a variant of Austin (https://github.com/P403n1x87/austin) for Linux that uses ptrace to seize and interrupt/continue threads to capture native stack traces using libunwind. During testing, I have discovered that there are good chances of causing what looks like a deadlock in Python if the seizing and interrupting of threads happen very early when spawning a Python subprocess from austinp.
Do you have a backtrace of the Python main thread when the hang happens? How do you spawn a new process? With the Python subprocess module?
Victor
-- Night gathers, and now my watch begins. It shall not end until my death.

Do you know what is involved in the deadlock (as in, what the threads are waiting on)?
I've found it hard to give an answer to this question. Because austinp is already tracing the interpreter, I cannot use, e.g., gdb to dump a backtrace. The event is also quite rare and it seems to happen before austinp has the chance to capture any samples. With the new support for 3.11 I might be able to see if I come across the same issue with the latest beta. I was hoping that the description of the issue could have rung a bell for anybody more familiar than me with all the locking going on during imports. The logs from austinp seem to suggest that the thread fails to resume after being interrupted, so something for me to explore is whether attempting to resume the thread more times before giving up is the actual solution in this case.
On Mon, 6 Jun 2022 at 16:30, Victor Stinner <vstinner@python.org> wrote:
How do you spawn a new process?
I should have clarified that this is just a plain fork/exec from C: https://github.com/P403n1x87/austin/blob/e3d79ddc9f9737a791362e6962b5cac25a4...
Cheers,
Gabriele
-- "Egli è scritto in lingua matematica, e i caratteri son triangoli, cerchi, ed altre figure geometriche, senza i quali mezzi è impossibile a intenderne umanamente parola; senza questi è un aggirarsi vanamente per un oscuro laberinto." -- G. Galilei, Il saggiatore.

On 6 Jun 2022, at 17:52, Gabriele <phoenix1987@gmail.com> wrote:
I've found it hard to give an answer to this question. Because austinp is already tracing the interpreter, I cannot use, e.g., gdb to dump a backtrace.
Don't you have the backtrace from libunwind that you could save from austinp itself?
Barry

On Mon, 6 Jun 2022 at 19:20, Barry Scott <barry@barrys-emacs.org> wrote:
Don't you have the backtrace from libunwind that you could save from austinp itself?
Unfortunately no, as the "deadlock" happens before any samples have a chance to be collected. Upon further investigation, it seems that trying to resume a thread over and over when ptrace fails takes quite "some" time (in fact, more than I'd have hoped). Playing with a larger wait timeout (100 ms, although the largest wait I've seen so far on my machine is 4 ms, which is still an eternity compared to a sensible sampling interval of 10 ms) seems to "cure" the problem, which I've only seen during interpreter initialisation. So perhaps Python itself is off the hook!
participants (4)
- Barry Scott
- Gabriele
- Pablo Galindo Salgado
- Victor Stinner