Portable way to tell if a process is still alive
exarkun at twistedmatrix.com
Thu Jan 28 09:33:23 EST 2010
On 10:50 am, gandalf at shopzeus.com wrote:
>
>>>Suppose we have a program that writes its process id into a pid file.
>>>Usually the program deletes the pid file when it exits... But in
>>>some cases (for example, killed with kill -9 or TerminateProcess) the
>>>pid file is left there. I would like to know if a process (given by its
>>>process id) is still running or not. I know that this can be done
>>>with OS specific calls. But that is not portable. It can also be done
>>>by executing "ps -p 23423" with subprocess module, but that is not
>>>portable either. Is there a portable way to do this?
>>>
>>>If not, would it be a good idea to implement this (I think very
>>>primitive) function in the os module?
>>
>>Not only is there no way to do it portably, there is no way to do it
>>reliably for the general case. The problem is that processes do not
>>have unique identifiers. A PID only uniquely identifies a running
>>process; once the process terminates, its PID becomes available for
>>re-use.
>
>Non-general case: the process is a service and only one instance should
>be running. There could be a pid file left on the disk. It is possible
>to e.g. mount procfs, and check if the given PID belongs to a command
>line / executed program that is in question. It cannot be guaranteed
>that a service will always delete its pid file when it exits. It
>happens, for example, when somebody kills it with "kill -9" or it exits
>on signal 11 etc. It actually did happen to me, and then the service
>could not be restarted because the PID file was there. (It is an error
>to run two instances of the same service, but it is also an error to
>not run it....) What I would like to do upon startup is to check if the
>process is already running. This way I could create a "guardian" that
>checks other services, and (re)starts them if they stopped working.
>
>And no, it is not a solution to write a "good" service that will never
>stop, because:
>
>1. It is particularly not possible in my case - there is a software bug
>in a third party library that causes my service to exit on various weird
>signals.
>2. It is not possible anyway. There are users killing processes
>accidentally, and other unforeseen bugs.
>3. In a mission critical environment, I would use a guardian even if
>guarded services are not likely to stop
>
>I understand that this is a whole different question now, and possibly
>there is no portable way to do it. I just wonder if there are others
>facing a similar problem here. Any thoughts or comments - is what I
>would like to achieve a bad idea? Is there a better approach?
I've been pondering using a listening unix socket for this. As long as
the process is running, a client can connect to the unix socket. As
soon as the process isn't, no matter the cause, clients can no longer
connect to it.
A drawback of this approach in some cases is probably that the process
should be accepting these connections (and then dropping them). This
may not always be easy to add to an existing app.
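
A minimal sketch of the idea in Python, assuming a platform with
AF_UNIX sockets. The function names and the socket path are
hypothetical, chosen only for illustration. The service calls
start_liveness_listener() once and keeps the returned socket open; a
guardian calls is_alive() with the same path.

```python
import os
import socket
import tempfile


def start_liveness_listener(path):
    """Bind and listen on a unix socket at `path`.

    The service must keep the returned socket open for as long as it runs.
    """
    # A socket file left behind by a crashed run would make bind() fail,
    # so remove it first. A real service should call is_alive(path)
    # before doing this, to avoid stealing the path from a live instance.
    if os.path.exists(path):
        os.unlink(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(5)
    return server


def is_alive(path, timeout=1.0):
    """Return True if some process is listening on the unix socket at `path`."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except OSError:
        # Typically ECONNREFUSED (stale socket file, no listener) or
        # ENOENT (no socket file at all). A timeout also lands here.
        return False
    else:
        return True
    finally:
        client.close()
```

When the process dies for any reason, including kill -9, the kernel
closes its listening socket, so a later connect() is refused even
though the socket file is still on disk. Note that this sketch never
calls accept(); if enough unaccepted connections pile up to fill the
backlog, connect() may time out and is_alive() would wrongly report
False, which is the drawback described above.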
Jean-Paul