[Twisted-Python] [andrea@cpushare.com: Re: error after launching cpushare client]
Hello,

a CPUShare user reported a failure in the process protocol's processEnded method. The status parameter passed to the processEnded callback looks like this:

status.value.signal == 11
status.value.status == 139

My server code was validating the SIGSEGV status returned by the client, and it noticed that it wasn't 11. But status.value.signal == 11.

See the debugging patch that produced the output below:

http://www.cpushare.com/hypermail/cpushare-discuss/05/08/0018.html

How can the same status have .value.signal == 11 and .value.status == 139 at the same time?

I suspect this is a Twisted bug. Thanks for any help!

----- Forwarded message from Andrea Arcangeli <andrea@cpushare.com> -----

Date: Wed, 3 Aug 2005 16:35:37 +0200
From: Andrea Arcangeli <andrea@cpushare.com>
To: cpushare-discuss@cpushare.com
Subject: Re: error after launching cpushare client

On Wed, Aug 03, 2005 at 03:26:39PM +0200, Loïc Le Loarer wrote:
On Wednesday 03 August 2005 at 13:09:35 +0200, Andrea Arcangeli wrote:
Hi,
On Mon, Aug 01, 2005 at 06:36:13PM +0200, lll+cpushare@m4x.org wrote:
2005/08/01 17:48 CEST [cpushare_protocol,client] 'sigsegv not killed with signal 11: status 139.'
This is weird: it looks like Twisted reports status 139 and signal 11 at the same time (while status should be 11 == signal). At first glance it looks like a Twisted bug.
Can you please try to apply this patch and see what it prints? (This way it will be printed on the client side; previously the number 139 was generated by the server side.)
Hi,
I have applied your patch and here is the output:

2005/08/03 15:23 CEST [-] Log opened.
2005/08/03 15:23 CEST [-] twistd 2.0.1 (/usr/bin/python 2.4.1) starting up
2005/08/03 15:23 CEST [-] reactor class: twisted.internet.selectreactor.SelectReactor
2005/08/03 15:23 CEST [-] Loading cpushare.tap...
2005/08/03 15:23 CEST [-] Loaded.
2005/08/03 15:23 CEST [-] Starting factory <cpushare.proto.cpushare_factory instance at 0xb78251ac>
2005/08/03 15:23 CEST [-] Enabling Multithreading.
2005/08/03 15:23 CEST [cpushare_protocol,client] Starting seccomp task
2005/08/03 15:23 CEST [-] Seccomp task gracefully killed by seccomp.
2005/08/03 15:23 CEST [cpushare_protocol,client] Starting seccomp task
2005/08/03 15:23 CEST [-] Seccomp task gracefully killed by sigsegv, status 139.
2005/08/03 15:23 CEST [cpushare_protocol,client] 'sigsegv not killed with signal 11: status 139.'
2005/08/03 15:23 CEST [cpushare_protocol,client] Lost connection. Reason: [Failure instance: Traceback (failure with no frames): twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.
2005/08/03 15:23 CEST [cpushare_protocol,client] ]
2005/08/03 15:23 CEST [cpushare_protocol,client] <twisted.internet.ssl.Connector instance at 0xb783114c> will retry in 2 seconds
2005/08/03 15:23 CEST [cpushare_protocol,client] Stopping factory <cpushare.proto.cpushare_factory instance at 0xb78251ac>
2005/08/03 15:23 CEST [-] Main loop terminated.
2005/08/03 15:23 CEST [-] Server Shut Down.

The status is 139. Does this help?
Yes, it helps, since it verifies that for some reason Twisted reports an exit status of 139, which collides with an exit status of 11 (so it wasn't a communication error between client and server). I think it's a Twisted bug and not a mistake on my part. I'll ask on the Twisted lists to be sure. What distro/Twisted version are you using?

In the meantime this will work around it so you can start earning CPUCoins ;)

Index: cpushare/proto.py
===================================================================
RCS file: /home/andrea/crypto/cvs/cpushare/client/cpushare/cpushare/proto.py,v
retrieving revision 1.49
diff -u -p -r1.49 proto.py
--- cpushare/proto.py 6 Jul 2005 23:19:02 -0000 1.49
+++ cpushare/proto.py 3 Aug 2005 14:34:34 -0000
@@ -82,7 +82,7 @@ class state_machine(object):
             self.protocol.sendString(PROTO_SECCOMP_SUCCESS)
         def end_failure(failure):
             end_common()
-            self.protocol.sendString(PROTO_SECCOMP_FAILURE + struct.pack('!i', failure.value.status))
+            self.protocol.sendString(PROTO_SECCOMP_FAILURE + struct.pack('!i', failure.value.signal))
         def started(result):
             self.protocol.sendString(PROTO_SECCOMP_RUN)

Thanks!

----- End forwarded message -----
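For readers unfamiliar with the wire format the workaround touches, here is a minimal sketch of what `struct.pack('!i', ...)` produces: a 4-byte signed integer in network (big-endian) byte order appended to a message tag. It is written in Python 3. The tag value `b"F"` and the helper names `encode_failure` and `decode_failure` are hypothetical; the real value of PROTO_SECCOMP_FAILURE and the server-side parsing are not shown in this thread.

```python
import struct

# Hypothetical tag: the real PROTO_SECCOMP_FAILURE value is not shown in the thread.
PROTO_SECCOMP_FAILURE = b"F"


def encode_failure(signal_number):
    """Build a failure message: tag followed by a big-endian 32-bit signed int."""
    return PROTO_SECCOMP_FAILURE + struct.pack("!i", signal_number)


def decode_failure(message):
    """Recover the integer that follows the tag (hypothetical receiving side)."""
    payload = message[len(PROTO_SECCOMP_FAILURE):]
    (signal_number,) = struct.unpack("!i", payload)
    return signal_number


message = encode_failure(11)
print(message)                  # b'F\x00\x00\x00\x0b'
print(decode_failure(message))  # 11
```

Sending the decoded signal number (11) rather than the raw status (139) means the receiver does not need to know how the sender's platform lays out wait statuses.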
Andrea Arcangeli wrote:
The status parameter passed to the processEnded callback is like this:
status.value.signal == 11
status.value.status == 139
My server code was validating the sigsegv status returned by the client, and it noticed it wasn't 11. But status.value.signal == 11.
See the debugging patch that produced the below output:
http://www.cpushare.com/hypermail/cpushare-discuss/05/08/0018.html
How can it be that the same status has .value.signal == 11 and .value.status == 139 at the same time?
I suspect this is a twisted bug.
.status is the _raw_ status. You probably mean .exitCode instead. Actual .status values are unportable, thus transferring them raw over the network is not a good idea.
On Wed, Aug 03, 2005 at 10:16:55PM +0300, Tommi Virtanen wrote:
.status is the _raw_ status.
You probably mean .exitCode instead.
Ok but then do you have an idea where the 139 comes from? I'd like to understand what's going on, to me that 139 number comes out of the blue.
Actual .status values are unportable, thus transferring them raw over the network is not a good idea.
status should be the same as what waitpid returns; from the docs:

"return a tuple containing its pid and exit status indication: a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced. Availability: Unix."

Now I will ask to use exitCode, but I'd still like to understand how status is connected with exitCode.

BTW, I was using exitCode already, but I thought it would be set only if a signal wasn't delivered. In fact I wrote code like this:

if status.value.exitCode or status.value.signal:
    if status.value.exitCode == 4:
        print 'Failure in setting the stack size to %d bytes.' % self.seccomp.stack
    if status.value.signal == signal.SIGKILL:
        print 'Seccomp task gracefully killed by seccomp.'
    elif status.value.signal == signal.SIGSEGV:
        print 'Seccomp task gracefully killed by sigsegv, status %r.' % status.value.status
    elif status.value.signal == signal.SIGQUIT:
        print 'Seccomp task killed by sigquit - should never happen.'
    self.d_end.errback(status)
else:
    print 'Seccomp task completed successfully.'
    self.d_end.callback(None)

(and in the above code status.value.signal == signal.SIGSEGV but status.value.status == 139 ;)
Andrea Arcangeli wrote:
Ok but then do you have an idea where the 139 comes from? I'd like to understand what's going on, to me that 139 number comes out of the blue.
139 == 128 + 11. One way to set up the numbering is that exit codes are 0..127, signals etc. have the high bit set. Naturally, all real access should go through the macros WIFEXITED etc., but that's how the number ranges are classically set up.
Actual .status values are unportable, thus transferring them raw over the network is not a good idea.
status should be the same that waitpid returns, from the docs:
"return a tuple containing its pid and exit status indication: a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced. Availability: Unix."
I think you are reading about os.wait and twisted is using os.waitpid. Otherwise, the document is lying to you. The status _may_ be laid out like that on _some_ platform, but unless python does some readjustment, the only portable way to access it is WIFEXITED and friends.

cat >crash.c <<EOF
int main(void) {
  /* comment out the next line if you want a normal exit */
  *(char*)0 = 42;
  return 34;
}
EOF

cat >run.py <<EOF
#!/usr/bin/python
import os

pid = os.fork()
if pid:
    # parent
    pid, status = os.waitpid(pid, 0)
    print pid, status
    if os.WIFEXITED(status):
        print 'exited', os.WEXITSTATUS(status)
    elif os.WIFSIGNALED(status):
        print 'signaled', os.WTERMSIG(status)
        print 'coredump', os.WCOREDUMP(status)
    elif os.WIFSTOPPED(status):
        print 'stopped', os.WSTOPSIG(status)
    elif os.WIFCONTINUED(status):
        print 'continued'
    else:
        print 'unknown'
else:
    # child
    os.execv('./a.out', ['a.out'])
    raise RuntimeError, "exec failed"
EOF

chmod a+x run.py
gcc -Wall crash.c
./run.py
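The script above is Python 2 and needs a C compiler. As a rough Python 3 sketch of the same experiment, the child below stands in for the crashing C program by resetting SIGSEGV to its default action and signalling itself, which is an assumption of this sketch rather than what the original test did. The helper names `run_child_and_wait`, `crash` and `describe` are hypothetical. It assumes a Unix platform where os.fork is available.

```python
import os
import signal


def run_child_and_wait(child):
    """Fork, run child() in the new process, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: never return into the caller's code.
        try:
            child()
        finally:
            os._exit(34)  # same exit code as the C example's "return 34"
    _, status = os.waitpid(pid, 0)
    return status


def crash():
    # Stand-in for the null-pointer write in crash.c: restore the default
    # SIGSEGV action and deliver the signal to this process.
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGSEGV)


def describe(status):
    """Classify a raw wait status using only the portable os.W* helpers."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))
    return ("unknown", status)


if __name__ == "__main__":
    print(describe(run_child_and_wait(crash)))         # killed by SIGSEGV
    print(describe(run_child_and_wait(lambda: None)))  # normal exit with code 34
```

Whether the raw status also carries the core-dump flag depends on the core size limit in effect, so the sketch does not report it.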
Now I will ask to use exitCode but still I'd like to understand how status is connected with exitCode.
exitCode and signal are decoded from status.
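A minimal sketch of what "decoded from status" can look like, using only the os.W* helpers. This is an illustration in Python 3, not Twisted's actual code; the function name `decode_status` is hypothetical, and the literal status values in the usage lines assume the Linux layout of the raw status.

```python
import os


def decode_status(status):
    """Split a raw wait status into (exit_code, signal_number).

    For a terminated process exactly one of the two is not None.
    """
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    signal_number = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return exit_code, signal_number


print(decode_status(139))      # on Linux: (None, 11)
print(decode_status(34 << 8))  # on Linux: (34, None)
```

The first call corresponds to the case in this thread: the raw status is 139, the decoded signal is 11, and there is no exit code.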
BTW, I was using exitCode already, but I thought it would be set only if a signal wasn't delivered. In fact I wrote code like this:
Exactly. If a process exits due to a signal, there is no exit code in the sense of calling _exit(2).
On Thu, Aug 04, 2005 at 08:47:39AM +0300, Tommi Virtanen wrote:
Andrea Arcangeli wrote:
Ok but then do you have an idea where the 139 comes from? I'd like to understand what's going on, to me that 139 number comes out of the blue.
139 == 128 + 11.
One way to set up the numbering is that exit codes are 0..127, signals etc. have the high bit set.
Naturally, all real access should go through the macros WIFEXITED etc, but that's how the number ranges are classically set up.
Ah, I think I got why he gets 139, that's because the core dumping was enabled.
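The arithmetic behind that observation can be checked by hand. The sketch below (Python 3) assumes the layout quoted from the docs in this thread, which holds on Linux but is not guaranteed elsewhere; portable code should use the os.W* helpers instead of these masks.

```python
status = 139  # raw value reported in the log

term_signal = status & 0x7F         # low seven bits: signal that killed the process
core_dumped = bool(status & 0x80)   # high bit of the low byte: core file produced
exit_status = status >> 8           # high byte: only meaningful when term_signal == 0

print(term_signal, core_dumped, exit_status)  # 11 True 0
```

With core dumps disabled, the same crash would give a raw status of 11, which is why the server-side check against 11 only failed for some users.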
Actual .status values are unportable, thus transferring them raw over the network is not a good idea.
status should be the same that waitpid returns, from the docs:
"return a tuple containing its pid and exit status indication: a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced. Availability: Unix."
I think you are reading about os.wait and twisted is using os.waitpid. Otherwise, the document is lying to you. The status _may_ be
Yes, the status is the same in wait and waitpid, and it's the same as well in C (I doubt that python does any mangling of the C value).
laid out like that on _some_ platform, but unless python does some readjustment, the only portable way to access it is WIFEXITED and friends.
It's not as if I have a huge portability requirement, because seccomp is currently only available on Linux, but I'll follow your suggestion and try to make it more portable.
cat >crash.c <<EOF
int main(void) {
  /* comment out the next line if you want a normal exit */
  *(char*)0 = 42;
  return 34;
}
EOF

cat >run.py <<EOF
#!/usr/bin/python
import os

pid = os.fork()
if pid:
    # parent
    pid, status = os.waitpid(pid, 0)
    print pid, status
    if os.WIFEXITED(status):
        print 'exited', os.WEXITSTATUS(status)
    elif os.WIFSIGNALED(status):
        print 'signaled', os.WTERMSIG(status)
        print 'coredump', os.WCOREDUMP(status)
    elif os.WIFSTOPPED(status):
        print 'stopped', os.WSTOPSIG(status)
    elif os.WIFCONTINUED(status):
        print 'continued'
    else:
        print 'unknown'
else:
    # child
    os.execv('./a.out', ['a.out'])
    raise RuntimeError, "exec failed"
EOF

chmod a+x run.py
gcc -Wall crash.c
./run.py
Ok thanks a lot for the example.
Exactly. If a process exits due to a signal, there is no exit code in the sense of calling _exit(2).
Ok, same as with C.