Missing SIGCHLD

Dan Stromberg drsalists at gmail.com
Tue Feb 15 19:28:13 CET 2011


On Tue, Feb 15, 2011 at 2:57 AM, Dinh <pcdinh at gmail.com> wrote:
> Hi,
>
> I am currently building a process management system which is able to fork
> child processes (fork()) and keep them alive (waitpid()).
>
>              if pid in self.current_workers:
>                  os.waitpid(pid, 0)
>
> If a child process dies, it should trigger a SIGCHLD signal and a handler is
> installed to catch the signal and start a new child process. The code is
> nothing special, just can be seen in any Python tutorial you can find on the
> net.
>
>             signal.signal(signal.SIGCHLD, self.restart_child_process)
>             signal.signal(signal.SIGHUP, self.handle) # reload
>             signal.signal(signal.SIGINT, self.handle)
>             signal.signal(signal.SIGTERM, self.handle)
>             signal.signal(signal.SIGQUIT, self.handle)
>
> However, this code does not always work as expected. Most of the time, it
> works. When a child process exits, the master process receives a SIGCHLD and
> restart_child_process() method is invoked automatically to start a new child
> process. But the problem is that sometimes, I know a child process exits due
> to an unexpected exception (via log file) but it seems that master process
> does not know about it. No SIGCHLD and so restart_child_process() is not
> triggered. Therefore, no new child process is forked.
>
> Could you please kindly tell me why this happens? Is there any special code
> that needs to be installed to ensure that every dead child is reported
> correctly?
>
> Mac OSX 10.6
> Python 2.6.6

Hi Dinh.

I've done no Mac OS X programming, but I've done a fair amount with
Python and *ix signals, so I'm going to try to help you, though it'll
be something of a stab in the dark.

*ix signals have historically been rather unreliable and troublesome
when used heavily.

There are BSD signals, SysV signals, and POSIX signals - they all try
to solve the problems in different ways.  Oh, and Linux has a way of
receiving signals through a file descriptor (signalfd) that apparently
helps quite a bit.  I'm guessing your Mac will have BSD and maybe
POSIX signals available, but you might check on that.

You might try using ktrace on your Mac (or dtruss, the DTrace-based
tool that replaced ktrace on recent Mac OS X releases) to see whether
any SIGCHLD signals are getting lost (it definitely happens in some
scenarios) and, hopefully, which C-level signal API CPython is using
on your Mac.

You might also make sure your SIGCHLD signal handler is not just
waitpid'ing once per invocation, but rather doing a nonblocking
waitpid (os.WNOHANG) in a loop until no exited child is found.
Ordinary signals are not queued, so if several children die at nearly
the same time (especially while the handler is already running), you
may get only one SIGCHLD for all of them.
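
Something along these lines is what I mean by looping.  This is only a
rough sketch, not your code: reap_children, handle_sigchld,
install_handler and dead_children are names I made up, and it reaps
*any* child of the process (pid -1) rather than checking
self.current_workers as yours does.

```python
import errno
import os
import signal

def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError as exc:
            if exc.errno == errno.ECHILD:
                break          # this process has no children at all
            if exc.errno == errno.EINTR:
                continue       # interrupted by another signal: retry
            raise
        if pid == 0:
            break              # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped

# Hypothetical bookkeeping: the handler only records what died.
dead_children = []

def handle_sigchld(signum, frame):
    # One SIGCHLD may stand for several exits, so drain them all.
    dead_children.extend(reap_children())

def install_handler():
    signal.signal(signal.SIGCHLD, handle_sigchld)
```

The master's main loop could then look at dead_children and fork the
replacements there, which keeps the handler itself short.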

If the loop in your signal handler doesn't help (enough), you could
also try using a nonblocking waitpid in a SIGALRM handler in addition
to your SIGCHLD handler.
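
As a belt-and-braces sketch of the timer idea (again, the names
reap_exited_children, handle_sigalrm, install_periodic_reaper and
reaped_by_timer are mine, and I'm assuming Python 3 plus
signal.setitimer, which has been in the signal module since 2.6):

```python
import os
import signal

# Hypothetical bookkeeping: pids picked up by the periodic sweep.
reaped_by_timer = []

def reap_exited_children():
    """Nonblocking sweep over all children that have already exited."""
    pids = []
    try:
        while True:
            pid, _status = os.waitpid(-1, os.WNOHANG)
            if pid == 0:
                break          # nothing more to collect right now
            pids.append(pid)
    except ChildProcessError:
        pass                   # no children exist at all
    return pids

def handle_sigalrm(signum, frame):
    # Catches any child whose SIGCHLD was missed or coalesced.
    reaped_by_timer.extend(reap_exited_children())

def install_periodic_reaper(interval=1.0):
    """Deliver SIGALRM every `interval` seconds and sweep on each one."""
    signal.signal(signal.SIGALRM, handle_sigalrm)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
```

The interval is a trade-off: a shorter one notices a missed child
sooner, but interrupts the master's system calls more often.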

Some signal APIs want you to reinstall the handler as the first action
in your signal handler, to shorten a race window.  Hopefully Mac OS X
doesn't need this, but you might check on it.

BTW, CPython signals and CPython threads don't play very nicely
together (Python-level handlers only ever run in the main thread); if
you're combining them, you might want to study up on this.

Oh, also, signals in CPython will tend to cause system calls to return
without completing, with errno set to EINTR, and not all CPython
modules will understand what to do with that.  :(  Sadly, many
application programmers tend to ignore the EINTR possibility.
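
One common way to cope is to wrap the call and retry it.  A minimal
sketch, with retry_on_eintr being a helper name I invented (newer
Pythons, 3.5 and up, retry most interrupted calls themselves, so this
matters mainly on something like your 2.6):

```python
import errno

def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except (OSError, IOError) as exc:
            if exc.errno != errno.EINTR:
                raise          # a real error: let the caller see it
            # Interrupted by a signal before completing: just try again.
```

You'd use it as, e.g., retry_on_eintr(os.read, fd, 4096) in place of
calling os.read(fd, 4096) directly.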

HTH


