[Twisted-Python] Unable to start twistd service on Ubuntu 10.04 when using pseudo terminal
To me this seems like a very strange problem and I've tried to debug it best I can but I don't know much about many of the components at work. Here's what happened: I've been happily using Capistrano to deploy many twistd services to Ubuntu 9.04 machines. It restarts them via an init.d script based off this template: http://twistedmatrix.com/trac/attachment/ticket/3434/twistdplugin. I'm running a self-installed Twisted 8.2.0 on both systems with some small tweaks to logging. On newer Ubuntu 10.04 machines I've been deploying to the twistd services don't restart properly. There are no errors in the log. The only log lines I'd get were: [-] Log opened. [-] twistd 8.2.0 (/usr/bin/python 2.6.5) starting up. [-] reactor class: twisted.internet.epollreactor.EPollReactor. But they were only failing to restart when run via Capistrano. I realized that the only difference was that Capistrano was using a pseudo terminal when it issued its commands. To recreate the problem without Capistrano I can run:
ssh -t user@host "/etc/init.d/service start"
But it works fine when I run:
ssh user@host "/etc/init.d/service start"
The 3 log lines I see in the failed case are written by twistd before it forks (clone() syscall) to create the daemon process. The strace dump of twistd when run with and without pseudo terminal were identical. I tried running strace with the -f option to get some info about the child process but it didn't fail in that case since I guess it doesn't actually fork. Running in a pseudo terminal with --nodaemon passed to twistd also works fine. I've tested this with multiple twistd services and am having the same issue with all of them. Any ideas? Did I completely misdiagnose the problem? Has there been some change in this area between Twisted 8.2.0 and 10.x? I'm certainly confused. :) - Garret
I've simplified this issue down and am able to reproduce it without Twisted: http://gist.github.com/603154 Still not sure what the cause is, so if anyone has any ideas I'd love to hear them. Thanks! On Mon, Sep 27, 2010 at 8:15 PM, Garret Heaton <powdahound@gmail.com> wrote:
To me this seems like a very strange problem and I've tried to debug it best I can but I don't know much about many of the components at work. Here's what happened:
I've been happily using Capistrano to deploy many twistd services to Ubuntu 9.04 machines. It restarts them via an init.d script based off this template: http://twistedmatrix.com/trac/attachment/ticket/3434/twistdplugin. I'm running a self-installed Twisted 8.2.0 on both systems with some small tweaks to logging.
On newer Ubuntu 10.04 machines I've been deploying to the twistd services don't restart properly. There are no errors in the log. The only log lines I'd get were: [-] Log opened. [-] twistd 8.2.0 (/usr/bin/python 2.6.5) starting up. [-] reactor class: twisted.internet.epollreactor.EPollReactor.
But they were only failing to restart when run via Capistrano. I realized that the only difference was that Capistrano was using a pseudo terminal when it issued its commands. To recreate the problem without Capistrano I can run:
ssh -t user@host "/etc/init.d/service start"
But it works fine when I run:
ssh user@host "/etc/init.d/service start"
The 3 log lines I see in the failed case are written by twistd before it forks (clone() syscall) to create the daemon process. The strace dump of twistd when run with and without pseudo terminal were identical. I tried running strace with the -f option to get some info about the child process but it didn't fail in that case since I guess it doesn't actually fork. Running in a pseudo terminal with --nodaemon passed to twistd also works fine.
I've tested this with multiple twistd services and am having the same issue with all of them.
Any ideas? Did I completely misdiagnose the problem? Has there been some change in this area between Twisted 8.2.0 and 10.x? I'm certainly confused. :)
- Garret
Hello, Garret Heaton wrote:
I've simplified this issue down and am able to reproduce it without Twisted: http://gist.github.com/603154
Still not sure what the cause is, so if anyone has any ideas I'd love to hear them. Thanks!
The behavior that you are seeing seems to be related to the change in the 2.6.32 kernel, where they changed the child-runs-first scheduler parameter to false. Setting that parameter to 1 with: $ sudo sysctl -w kernel.sched_child_runs_first=1 and rebooting the computer restores the behavior that you saw on the old kernel for me. See http://lwn.net/Articles/352863/ for more details. Parent (which is the controlling process when ran with -t) exiting before the child starts causes the child to receive SIGHUP signal immediately as it starts running, before it has time to disassociate itself from the parent's process group. It seems to me that this might be an actual bug in twistd, it should block the SIGHUP signal across the fork() calls. Your test program has an additional assumption that the child will run before the parent; the child in your program tries to write to stdout, i.e. the controlling terminal, which gets closed once the parent exits. The modified test program below works for me regardless of the setting of the kernel.sched_child_runs_first parameter. Hope this helps, Ziga import os import signal f = open("test_fork.out", "w") def daemonize(): # See http://www.faqs.org/faqs/unix-faq/programmer/faq/ - Section 1.7 print >> f, '--- %s: daemonizing' % os.getpid() signal.signal(signal.SIGHUP, signal.SIG_IGN) if os.fork(): # launch child and... print >> f, '--- %s: kill parent 1' % os.getpid() os._exit(0) # kill off parent print >> f, '--- %s: old sid: %r' % (os.getpid(), os.getsid(os.getpid())) os.setsid() print >> f, '--- %s: new sid: %r' % (os.getpid(), os.getsid(os.getpid())) if os.fork(): # launch child and... print >> f, '--- %s: kill parent 2' % os.getpid() os._exit(0) # kill off parent again. signal.signal(signal.SIGHUP, signal.SIG_DFL) print >> f, '--- %s: daemonizing done' % os.getpid() if __name__ == "__main__": print >> f, 'starting as %d' % os.getpid() daemonize() print >> f, 'stopping as %s' % os.getpid()
Thanks, that explains everything. The code I was using to test was copied from Twisted's daemonizing function: http://github.com/powdahound/twisted/blob/master/twisted/scripts/_twistd_uni... It should probably be updated before more users upgrade to the new kernel and run into this issue. Ticket #823 ( http://twistedmatrix.com/trac/ticket/823) seems very related to this. On Thu, Sep 30, 2010 at 8:52 AM, Žiga Seilnacht <ziga.seilnacht@gmail.com>wrote:
Hello,
Garret Heaton wrote:
I've simplified this issue down and am able to reproduce it without Twisted: http://gist.github.com/603154
Still not sure what the cause is, so if anyone has any ideas I'd love to hear them. Thanks!
The behavior that you are seeing seems to be related to the change in the 2.6.32 kernel, where they changed the child-runs-first scheduler parameter to false. Setting that parameter to 1 with:
$ sudo sysctl -w kernel.sched_child_runs_first=1
and rebooting the computer restores the behavior that you saw on the old kernel for me. See http://lwn.net/Articles/352863/ for more details.
Parent (which is the controlling process when ran with -t) exiting before the child starts causes the child to receive SIGHUP signal immediately as it starts running, before it has time to disassociate itself from the parent's process group. It seems to me that this might be an actual bug in twistd, it should block the SIGHUP signal across the fork() calls.
Your test program has an additional assumption that the child will run before the parent; the child in your program tries to write to stdout, i.e. the controlling terminal, which gets closed once the parent exits.
The modified test program below works for me regardless of the setting of the kernel.sched_child_runs_first parameter.
Hope this helps, Ziga
import os import signal
f = open("test_fork.out", "w")
def daemonize(): # See http://www.faqs.org/faqs/unix-faq/programmer/faq/ - Section 1.7 print >> f, '--- %s: daemonizing' % os.getpid()
signal.signal(signal.SIGHUP, signal.SIG_IGN) if os.fork(): # launch child and... print >> f, '--- %s: kill parent 1' % os.getpid() os._exit(0) # kill off parent print >> f, '--- %s: old sid: %r' % (os.getpid(), os.getsid(os.getpid())) os.setsid() print >> f, '--- %s: new sid: %r' % (os.getpid(), os.getsid(os.getpid())) if os.fork(): # launch child and... print >> f, '--- %s: kill parent 2' % os.getpid() os._exit(0) # kill off parent again.
signal.signal(signal.SIGHUP, signal.SIG_DFL) print >> f, '--- %s: daemonizing done' % os.getpid()
if __name__ == "__main__": print >> f, 'starting as %d' % os.getpid() daemonize() print >> f, 'stopping as %s' % os.getpid()
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
participants (2)
-
Garret Heaton
-
Žiga Seilnacht