A bit of a boggle about subprocess.poll() and the codes it receives from a process

Fri Sep 9 13:32:19 EDT 2011

Hi,

I need a bit of help sorting this out...

I have a memory test program written in C.  The test itself can only ever
exit with a 0 or 1 exit code; this is explicitly coded and there are no
other options.

I also have a wrapper test script that calls the C program; the wrapper
should also only return 0 or 1 on completion.

The problem I'm encountering, however, involves the return code when
subprocess.poll() is called against the running memory test process.  The
current code in my wrapper program looks like this:

def run_processes(self, number, command):
        passed = True
        pipe = []
        for i in range(number):
            pipe.append(self._command(command))
            print "Started: process %u pid %u: %s" % (
                i, pipe[i].pid, command)
        sys.stdout.flush()
        waiting = True
        while waiting:
            waiting = False
            for i in range(number):
                if pipe[i]:
                    line = pipe[i].communicate()[0]
                    if line and len(line) > 1:
                        print "process %u pid %u: %s" % (
                            i, pipe[i].pid, line)
                        sys.stdout.flush()
                    if pipe[i].poll() == -1:
                        waiting = True
                    else:
                        return_value = pipe[i].poll()
                        if return_value != 0:
                            print "Error: process  %u pid %u retuned %u" % (
                                i, pipe[i].pid, return_value)
                            passed = False
                        print "process %u pid %u returned success" % (
                            i, pipe[i].pid)
                        pipe[i] = None
        sys.stdout.flush()
        return passed

So what happens here is that in the waiting loop, if pipe[i].poll() returns
-1, we keep waiting, and if it returns anything OTHER than -1, we stop
waiting on that process and check its return code.
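
For comparison, here is a minimal sketch of how poll() behaves on a
subprocess.Popen object according to the subprocess documentation: it returns
None while the child is still running and the exit code once it has finished.
This assumes that self._command() returns a Popen object, which is not shown
above; the helper poll_until_done() and the short-lived child are hypothetical
stand-ins for illustration, not part of the wrapper.

```python
import subprocess
import sys
import time

def poll_until_done(args, interval=0.05):
    """Start a child, poll until it exits, return (all poll results, exit code)."""
    proc = subprocess.Popen(args)
    seen = []
    while True:
        status = proc.poll()  # None while running, exit code once finished
        seen.append(status)
        if status is not None:
            return seen, status
        time.sleep(interval)

# Hypothetical stand-in for the memory test: sleeps briefly, then exits with 1.
child = [sys.executable, "-c", "import time, sys; time.sleep(0.3); sys.exit(1)"]
seen, code = poll_until_done(child)
print("polls while running: %r, exit code: %d" % (seen[:-1], code))
```

If that matches what self._command() returns, a comparison against -1 would
never be true, so it may be worth checking for None instead.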

BUT, I'm getting, in some cases, a return code of 127, which is impossible
to get from the memory test program.
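
One possible source of a 127 that does not involve the test program at all:
POSIX shells conventionally exit with 127 when the command they were asked to
run cannot be found. The sketch below demonstrates that, assuming a POSIX
system and a child started with shell=True. Whether self._command() actually
uses a shell is an assumption, and the command name is a made-up one chosen so
that it should not exist.

```python
import os
import subprocess

def shell_exit_code(command_line):
    """Run command_line through the shell and return its exit status."""
    with open(os.devnull, "w") as devnull:
        # stderr is discarded only to keep the "not found" message quiet.
        return subprocess.call(command_line, shell=True, stderr=devnull)

# Hypothetical command name that should not exist on any PATH.
code = shell_exit_code("no_such_memtest_binary_xyz")
print("exit code: %d" % code)
```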

The output from this bit of code looks like this in a failing situation:
Error: process 0 pid 2187 retuned 127
process 0 pid 2187 returned success
Error: process 1 pid 2188 retuned 127
process 1 pid 2188 returned success

I'm thinking that I'm hitting some sort of race here where the kernel is
reporting -1 while the process is running, then returns 127 or some other
status when the process is being killed and then finally 0 or 1 after the
process has completely closed out.  I "think" that the poll picks up this
intermediate exit status and immediately exits the loop, instead of waiting
for a 0 or 1.
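
To check the "intermediate status" idea, a small sketch like the following
may help. It assumes a POSIX system and uses a hypothetical long-running child
in place of the memory test. Per the subprocess documentation, a child
terminated by signal N is reported as the negative value -N, and once wait()
has returned, later poll() calls should keep returning that same value.

```python
import signal
import subprocess
import sys

def kill_and_collect(args):
    """Start a child, send it SIGTERM, return (wait() result, later poll() result)."""
    proc = subprocess.Popen(args)
    proc.terminate()           # sends SIGTERM on POSIX
    code = proc.wait()         # blocks until the child has actually exited
    return code, proc.poll()   # poll() after exit reports the stored code

# Hypothetical stand-in for a long-running memory test.
child = [sys.executable, "-c", "import time; time.sleep(30)"]
code, polled_again = kill_and_collect(child)
print("wait() returned %d, later poll() returned %d" % (code, polled_again))
```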

I've got a modified version that I'm getting someone to test for me now that
changes

if pipe[i].poll() == -1:
    waiting = True

to this

if pipe[i].poll() not in [0,1]:
    waiting = True

So my real question is: am I on the right track here, and am I correct in my
guess that the kernel is reporting different status codes to
subprocess.poll() during the shutdown of the polled process?