Portable way to tell if a process is still alive

Laszlo Nagy gandalf at shopzeus.com
Thu Jan 28 05:50:16 EST 2010


>> Suppose we have a program that writes its process id into a pid file. 
>> Usually the program deletes the pid file when it exits... But in some 
>> cases (for example, killed with kill -9 or TerminateProcess) the pid file is 
>> left there. I would like to know if a process (given with its process 
>> id) is still running or not. I know that this can be done with OS 
>> specific calls. But that is not portable. It can also be done by 
>> executing "ps -p 23423" with subprocess module, but that is not portable 
>> either. Is there a portable way to do this?
>>
>> If not, would it be a good idea to implement this (I think very 
>> primitive) function in the os module?
>>     
>
> Not only is there no way to do it portably, there is no way to do it
> reliably for the general case. The problem is that processes do not have
> unique identifiers. A PID only uniquely identifies a running process; once
> the process terminates, its PID becomes available for re-use.
>   

Non-general case: the process is a service and only one instance should 
be running. There could be a pid file left on the disk. It is possible 
to e.g. mount procfs and check whether the given PID belongs to the 
command line / executable in question. It cannot be guaranteed 
that a service will always delete its pid file when it exits. This 
happens, for example, when somebody kills it with "kill -9" or it exits 
on signal 11 etc. It actually happened to me, and then the service could 
not be restarted because the PID file was there. (It is an error to run 
two instances of the same service, but it is also an error to not run 
it....) What I would like to do upon startup is to check whether the 
process is already running. This way I could create a "guardian" that 
checks other services and (re)starts them if they have stopped working.
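
A minimal sketch of the pid file check described above, assuming a POSIX 
system for the liveness test and a Linux-style procfs mounted at /proc for 
the command line test. The function names and the `expected` substring are 
hypothetical, chosen only for illustration. It is not portable: on Windows, 
os.kill(pid, 0) does not act as a harmless probe, so do not use it there.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; the call only performs the existence
        # and permission checks.
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:   # no such process
            return False
        if exc.errno == errno.EPERM:   # exists, but owned by another user
            return True
        raise
    return True


def pid_matches_command(pid, expected):
    """Linux only: True if any argument in /proc/<pid>/cmdline contains `expected`."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            argv = f.read().split(b"\0")
    except (IOError, OSError):
        return False
    return any(expected.encode() in arg for arg in argv)


def service_is_running(pid_file, expected):
    """True if the pid file names a live process whose command line matches."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return False  # missing or unreadable pid file: treat as not running
    return pid_is_running(pid) and pid_matches_command(pid, expected)
```

The command line comparison narrows the PID re-use problem but does not 
remove it: an unrelated process that happens to get the old PID and has a 
similar command line would still be mistaken for the service.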

And no, it is not a solution to write a "good" service that will never 
stop, because:

1. It is particularly not possible in my case - there is a software bug 
in a third party library that causes my service to exit on various weird 
signals.
2. It is not possible anyway. There are users killing processes 
accidentally, and other unforeseen bugs.
3. In a mission critical environment, I would use a guardian even if 
guarded services are not likely to stop.
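
One way such a guardian could look, as a rough sketch only: it assumes the 
guardian itself launches the services as child processes, which is not 
necessarily the setup described above. The `guard` function, its 
parameters, and the `commands` mapping are hypothetical names. Because the 
parent keeps a handle on each child, it needs no pid file and is not 
affected by PID re-use.

```python
import subprocess
import time


def guard(commands, poll_interval=1.0, max_restarts=None):
    """Start every command and restart any that exits.

    `commands` maps a service name to its argv list. Runs forever when
    max_restarts is None; otherwise returns the restart count once at
    least max_restarts restarts have happened.
    """
    children = {name: subprocess.Popen(argv) for name, argv in commands.items()}
    restarts = 0
    try:
        while max_restarts is None or restarts < max_restarts:
            for name, child in list(children.items()):
                if child.poll() is not None:  # the child has exited
                    children[name] = subprocess.Popen(commands[name])
                    restarts += 1
            time.sleep(poll_interval)
    finally:
        # Do not leave services behind when the guardian itself stops.
        for child in children.values():
            if child.poll() is None:
                child.terminate()
            child.wait()
    return restarts
```

A call such as guard({"myservice": ["/path/to/myservice"]}) (the path is a 
placeholder) would keep one service running. The guardian itself can still 
be killed, so this moves the problem up one level rather than removing it.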

I understand that this is a whole different question now, and possibly 
there is no portable way to do it. I just wonder if there are others 
facing a similar problem here. Any thoughts or comments - is what I 
would like to achieve a bad idea? Is there a better approach?

Thanks,

   Laszlo



