[Twisted-Python] Missing feature in detecting stalled PB connections?
While using pb.PBClientFactory() with reactor.connectTCP(), I noticed that the timeout parameter only covers establishing the connection, not whether the remote PB replies correctly. I found this when one of my remote PB instances was stuck: it was accepting requests but did not answer anything at all. So the connection is established and stays open indefinitely!

You can verify this yourself by putting netcat ("nc -vv -l -p 9999", for example) in place of a PB instance. The PB on the other side will connect, and then nothing else happens. There is no timeout, even though netcat never replied (let alone correctly).

I know I could add a timer (callLater) that would wake up and kill the connection after some time (because I know one transaction will not exceed a certain duration), but that's really trashy, and one does not always know how long a transaction will take.

Is there really no other way to detect a correct connection to another PB instance? I attached example code that shows this behaviour.

Thanks, Luc

--
Luc Stepniewski <luc.stepniewski@adelux.fr> <sip:724766@fwd.pulver.com>
Adelux - Securite, Linux
Public key: <http://lstep.free.fr/pubkey.txt>
Key BC0E3C2A fingerprint = A4FA466C68D27E46B427 07D083ED6340BC0E3C2A
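For readers without netcat at hand, the stalled peer described above can be approximated with the Python standard library alone. This is only a sketch: the function name start_silent_listener is hypothetical, and the default port 9999 is taken from the netcat command in the message. It accepts TCP connections and never writes a byte back.

```python
import socket
import threading


def start_silent_listener(host="127.0.0.1", port=9999):
    """Accept TCP connections but never send anything back (like `nc -l`)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    accepted = []  # keep references so accepted sockets stay open

    def accept_loop():
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:
                return  # the listening socket was closed
            accepted.append(conn)

    # Daemon thread: it does not keep the process alive on its own.
    threading.Thread(target=accept_loop, daemon=True).start()
    return server, accepted
```

Calling start_silent_listener() and then pointing a PB client at the same port should show the same symptom: the TCP connection succeeds, but no PB negotiation ever happens.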
On Mon, 2006-02-20 at 15:02 +0100, Luc Stepniewski wrote:
I know I could add a timer (callLater) that would wake up and kill the connection after some time (because I know one transaction will not exceed a certain duration), but that's really trashy, and one does not always know how long a transaction will take.
Is there really no other way to detect a correct connection to another PB instance?
The typical way to do this is to have some sort of remote_ping method that returns immediately on the server (or on the client), which the client (or the server) calls every once in a while with a timeout for the response. If that times out, the problem is likely the connection, not the server being slow, in which case you can close the connection.
On Tuesday 21 February 2006 17:06, Itamar Shtull-Trauring wrote:
The typical way to do this is to have some sort of remote_ping method that returns immediately on the server (or on the client), which the client (or the server) calls every once in a while with a timeout for the response. If that times out, the problem is likely the connection, not the server being slow, in which case you can close the connection.
Yes, it's the same thing as making a pseudo "timeout" with a callLater. But in my case the problem is even more basic: I just need to connect, send data (possibly get a response), and then disconnect. If the remote PB server doesn't reply (the simplest case being a program that accepts connections but does not reply *anything*, like a listening netcat), then I'm stuck. Your answer will not help in that case (or maybe I misunderstood your explanation, as always). Luc
On Tue, 2006-02-21 at 18:40 +0100, Luc Stepniewski wrote:
Yes, it's the same thing as making a pseudo "timeout" with a callLater.
Not exactly. What I'm suggesting is a different command that you send in addition to your regular operations. This extra command, "ping", is expected to return a result quickly; if it doesn't you know something is wrong. That way even if your regular commands take a really long time for the server to process you can still tell if the server itself (or your connection to it) is ok. Every 10 seconds, say, you ping the server; if you don't get a response back in 5 seconds it's probably down.
On Tuesday 21 February 2006 09:58, Itamar Shtull-Trauring wrote:
Not exactly. What I'm suggesting is a different command that you send in addition to your regular operations. This extra command, "ping", is expected to return a result quickly; if it doesn't you know something is wrong. That way even if your regular commands take a really long time for the server to process you can still tell if the server itself (or your connection to it) is ok. Every 10 seconds, say, you ping the server; if you don't get a response back in 5 seconds it's probably down.
That's actually what I'm doing in my application, although for a different reason. I send a "ping" that is immediately answered by a "pong"; if I don't get the pong within 30 seconds, I shut the connection down. My reason is a misconfigured firewall at one of my clients, which drops forwarding after about 90 seconds of idle time. Since I couldn't get their (incompetent) network admin to fix it, I send a ping every 30 seconds so the firewall thinks the connection is active and doesn't drop it.

Luc: if you need the code for that (although it's very simple), drop me a line.

UC

--
Open Source Solutions 4U, LLC
1618 Kelly St              Phone: +1 707 568 3056
Santa Rosa, CA 95401       Cell:  +1 650 302 2405
United States              Fax:   +1 707 568 6416
Hello, I tried to implement that timeout method. I think I did it correctly (example attached to this mail), but it seems that one can't disconnect() a PB connection that hasn't replied yet :-( It looks like it is locked, waiting for an answer from the remote PB. So if you run netcat (nc -vv -l -p 9003) and then run my code, timeoutHandler() is called but the factory is NOT disconnected :-( Did you do something different from my code? Thanks for your help, Luc
On Wednesday 22 February 2006 04:12, Uwe C. Schroeder wrote:
That's actually what I'm doing in my application, although for a different reason. I send a "ping" that is immediately answered by a "pong"; if I don't get the pong within 30 seconds, I shut the connection down. My reason is a misconfigured firewall at one of my clients, which drops forwarding after about 90 seconds of idle time. Since I couldn't get their (incompetent) network admin to fix it, I send a ping every 30 seconds so the firewall thinks the connection is active and doesn't drop it.
Luc: if you need the code for that (although it's very simple), drop me a line.
Participants (3): Itamar Shtull-Trauring, Luc Stepniewski, Uwe C. Schroeder