[Twisted-Python] spread.sturdy reconnect delay
currently, when spread.sturdy is unable to (re)connect (e.g. because the connection was refused/server not running), it retries at full speed, without any delay.

implementing some kind of rate limiting is trivial, but i'm not sure about the right approach. the issue is current for me, as i'm using sturdy in my application and wouldn't want it to flood the network too much.

i've come up with the following approaches:

*) try n times, then wait i seconds, rinse, repeat
*) try, sleep i, try, sleep i, ...
*) try, sleep i++, try, sleep i++, ...

suggestions?
On Fri, 2002-08-02 at 20:55, Paul Boehm wrote:
i've come up with the following approaches:

*) try n times, then wait i seconds, rinse, repeat
*) try, sleep i, try, sleep i, ...
*) try, sleep i++, try, sleep i++, ...
I'd go with either the first or second, and since the second is simpler, probably the second. -- Chris Armstrong << radix@twistedmatrix.com >> http://twistedmatrix.com/users/carmstro.twistd/
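A minimal sketch of the three schedules listed above, written as Python generators that yield the number of seconds to wait after each failed attempt. The function names and the `step` parameter are illustrative assumptions, not part of spread.sturdy; the code only computes delays and leaves the actual reconnecting and sleeping to the caller.

```python
import itertools


def burst_then_wait(n, i):
    """try n times, then wait i seconds, rinse, repeat (n must be >= 1)."""
    while True:
        # n - 1 immediate retries inside a burst ...
        for _ in range(n - 1):
            yield 0
        # ... then one pause of i seconds before the next burst.
        yield i


def fixed_delay(i):
    """try, sleep i, try, sleep i, ..."""
    while True:
        yield i


def linear_delay(i, step=1):
    """try, sleep i++, try, sleep i++, ... (step is a hypothetical knob)."""
    while True:
        yield i
        i += step


def first(schedule, count):
    """Return the first `count` delays of a schedule as a list."""
    return list(itertools.islice(schedule, count))
```

For example, `first(burst_then_wait(3, 10), 6)` gives two full bursts, and `first(linear_delay(1), 4)` gives the first four steadily growing delays. The second schedule is the simplest of the three, which is the reason given above for preferring it.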
On Sat, Aug 03, 2002 at 02:55:33AM +0200, Paul Boehm wrote:
currently, when spread.sturdy is unable to (re)connect (e.g. because the connection was refused/server not running), it retries at full speed.. without any delay..
implementing some kind of rate limiting is trivial, but i'm not sure about the right approach. the issue is current for me, as i'm using sturdy in my application and wouldn't want it to flood the network too much.
i've come up with the following approaches:

*) try n times, then wait i seconds, rinse, repeat
*) try, sleep i, try, sleep i, ...
*) try, sleep i++, try, sleep i++, ...
What about exponential backoff...

*) try, sleep i, i=i*2, repeat

Here i can grow very large very quickly. This approximates the heuristic "it has been down for a total of x seconds and it's still not up, so I'll wait x seconds more before trying again", i.e. assume the last failed attempt was in the middle of the total down time.

A more explicit version of this with a tuneable factor would be:

    total=1
    try, i=f*total, sleep i, total+=i, repeat

or, taking into account the time taken trying:

    start=time()
    try, i=f*(time()-start), sleep i, repeat

where 0<f (usually f<1) is the fraction of the total time already waited that you should wait before the next attempt. E.g. setting f=0.1 means you wait 10% of the total time already waited.

--
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
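The backoff variants described in this message can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are made up here, the delays are returned rather than slept, and the clock values for the time-based variant are passed in explicitly so the arithmetic is easy to check.

```python
import itertools


def doubling_delay(i=1.0):
    """try, sleep i, i=i*2, repeat."""
    while True:
        yield i
        i *= 2


def proportional_delay(f, total=1.0):
    """total=1; try, i=f*total, sleep i, total+=i, repeat."""
    while True:
        i = f * total   # wait a fraction f of the time accounted for so far
        yield i
        total += i      # the wait itself now counts as time already waited


def elapsed_delay(f, start, now):
    """i=f*(time()-start): wait a fraction f of the real elapsed time.

    `start` and `now` are timestamps in seconds, e.g. from time.time().
    """
    return f * (now - start)


def first(schedule, count):
    """Return the first `count` delays of a schedule as a list."""
    return list(itertools.islice(schedule, count))
```

With f=1 the proportional schedule reproduces plain doubling (1, 2, 4, 8, ...), while smaller values of f grow more slowly: each wait multiplies the running total by 1+f.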
On Mon, Aug 05, 2002 at 08:18:12AM +1000, Donovan Baarda wrote:
On Sat, Aug 03, 2002 at 02:55:33AM +0200, Paul Boehm wrote:
[snip] [snip]
What about exponential backoff...
*) try, sleep i, i=i*2, repeat
FWIW, this is essentially the approach I used in my mud client ;P It works fairly well. The one important difference is there's a cap on the wait time (I use 2^17 seconds, or ~36 hours). I think this gives a good balance between getting reconnected quickly and not flooding the destination with connection attempts.

Of course this pattern is pretty easily generalizable -- a factory with two parameters, the growth factor and the cap. Might it be a good idea to include a few of the simpler approaches somewhere? The code for most of these probably wouldn't be more than 5 lines, but having them all collected, pre-written, and of course bug-free would be pretty handy IMHO.

Actually, now I'm thinking of some kind of policy objects instead of Factory subclasses - something that can be shared between factories or swapped around as desired without rebuilding your factories.

Jp

--
"Pascal is Pascal is Pascal is dog meat."
  -- M. Devine and P. Larson, Computer Science 340
--
7:38pm up 75 days, 20:26, 6 users, load average: 0.14, 0.09, 0.03
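One way the policy-object idea might look, as a hedged sketch rather than anything that exists in Twisted: `BackoffPolicy`, `initial`, `factor`, `cap` and `next_delay` are all hypothetical names. The policy holds only its parameters and no per-connection state, so a single instance could be shared between factories, with each factory remembering its own previous delay.

```python
class BackoffPolicy:
    """Capped exponential backoff with two tunables: growth factor and cap.

    The object is stateless between calls, so one instance can be shared
    by several factories or swapped for another policy at any time.
    """

    def __init__(self, initial=1.0, factor=2.0, cap=2.0 ** 17):
        self.initial = initial  # delay after the first failure, in seconds
        self.factor = factor    # multiplier applied after each failure
        self.cap = cap          # upper bound on any single delay, in seconds

    def next_delay(self, previous=None):
        """Return the delay to use after a failure.

        `previous` is the delay used last time, or None on the first failure.
        """
        if previous is None:
            return min(self.initial, self.cap)
        return min(previous * self.factor, self.cap)


def delays(policy, count):
    """Collect the first `count` delays a single caller would see."""
    result, delay = [], None
    for _ in range(count):
        delay = policy.next_delay(delay)
        result.append(delay)
    return result
```

A caller would reset its stored delay to None after a successful connection, so the next outage starts again from `initial`.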
participants (4)
- abo@minkirri.apana.org.au
- Christopher Armstrong
- Jp Calderone
- Paul Boehm