Hello community,

First of all, thanks for an awesome platform! I’m brand new to this community, but I have been using Twisted for a couple of years.

Reason for posting:

I’ve hit a condition with ReconnectingClientFactory, and I’m not sure whether it is by design. I have a workaround right now, but I’d like your perspective; it seems like there should be a better (or correct) way to do this.

Attempted design:

I’d like to have long-running TCP clients (running until explicitly stopped) talking to a long-running TCP server. When a client hits a problem with a dependency (database is down, Kafka bus unavailable, external API not responding, etc.), I want it to go offline for a while and then come back online: an automated, self-recovery type of action. Since the Twisted reactor can’t be restarted once it has been stopped, I let the client finish whatever it can do, disconnect from the server, tear down its dependencies, wait for a period of time, and then attempt a clean re-initialization of those dependencies along with reconnecting to the Twisted server.
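A minimal sketch of the offline/online cycle described above, assuming hypothetical names (RecoveringClient, start, on_dependency_failure) and a call_later(delay, fn) scheduler supplied by the caller. Instead of blocking for the offline period (for example with time.sleep()), it schedules the restart, so the event loop can keep running while the client is offline. With Twisted, reactor.callLater, which takes a delay and a callable, could fill the call_later role; in the usage below a plain list stands in so the sketch runs without a reactor.

```python
from typing import Callable, List

# Hypothetical scheduler shape: call_later(delay_seconds, fn).
# reactor.callLater(delay, fn) has a compatible call signature.
CallLater = Callable[[float, Callable[[], None]], object]


class RecoveringClient:
    """Hypothetical client that goes offline on a dependency failure and
    schedules its own recovery instead of sleeping."""

    def __init__(self, call_later: CallLater, offline_seconds: float = 60.0) -> None:
        self.call_later = call_later
        self.offline_seconds = offline_seconds
        self.online = False
        self.events: List[str] = []

    def start(self) -> None:
        # Clean (re-)initialization of dependencies, then reconnect.
        self.events.append("init dependencies")
        self.events.append("connect to server")
        self.online = True

    def on_dependency_failure(self) -> None:
        # Finish up, disconnect, tear down, then schedule (not block on) the restart.
        self.events.append("disconnect from server")
        self.events.append("tear down dependencies")
        self.online = False
        self.call_later(self.offline_seconds, self.start)


# Usage with a list-based stub scheduler.
pending = []
client = RecoveringClient(lambda delay, fn: pending.append((delay, fn)), offline_seconds=30)
client.start()
client.on_dependency_failure()
delay, restart = pending.pop(0)
restart()  # what the scheduler would do after `delay` seconds
```

The design choice here is only that the wait is expressed as a scheduled call rather than a blocking one; how the real client tears down and re-creates its dependencies is outside this sketch.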

Problem case:

I’m using ReconnectingClientFactory in my client. When the client hits a problem, it calls transport.loseConnection(). But whenever it does, the client never reconnects after the disconnect; stopFactory is called and everything exits.
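For illustration, here is a simplified, stdlib-only model of the connector/factory bookkeeping involved, based on a reading of twisted.internet.base.BaseConnector.connectionLost and twisted.internet.protocol.Factory.doStart/doStop. ModelFactory and ModelConnector are hypothetical stand-ins, not Twisted classes, and details may differ between Twisted versions. The model suggests that when the retry is scheduled for later (ReconnectingClientFactory schedules it with callLater), the connector is still "disconnected" when clientConnectionLost returns, so it calls doStop() and stopFactory fires; the delayed connect() then calls doStart() again.

```python
from typing import Callable, List


class ModelFactory:
    """Hypothetical stand-in mirroring Factory.doStart/doStop numPorts counting."""

    def __init__(self) -> None:
        self.numPorts = 0
        self.log: List[str] = []

    def doStart(self) -> None:
        if not self.numPorts:
            self.log.append("startFactory")
        self.numPorts += 1

    def doStop(self) -> None:
        if self.numPorts == 0:
            return
        self.numPorts -= 1
        if not self.numPorts:
            self.log.append("stopFactory")

    def clientConnectionLost(self, connector: "ModelConnector", reason: str) -> None:
        # Like a retry scheduled via callLater: connect() runs later, not now.
        connector.pending.append(connector.connect)


class ModelConnector:
    """Hypothetical stand-in mirroring BaseConnector's state handling."""

    def __init__(self, factory: ModelFactory) -> None:
        self.factory = factory
        self.state = "disconnected"
        self.factoryStarted = False
        self.pending: List[Callable[[], None]] = []

    def connect(self) -> None:
        if self.state != "disconnected":
            raise RuntimeError("can't connect in this state")
        self.state = "connecting"
        if not self.factoryStarted:
            self.factory.doStart()
            self.factoryStarted = True
        self.state = "connected"  # model: the connection succeeds immediately

    def connectionLost(self, reason: str) -> None:
        self.state = "disconnected"
        self.factory.clientConnectionLost(self, reason)
        if self.state == "disconnected":
            # The factory did not reconnect synchronously, so it gets stopped.
            self.factory.doStop()
            self.factoryStarted = False

    def run_pending(self) -> None:
        # Stands in for the reactor firing the scheduled retry.
        calls, self.pending = self.pending, []
        for call in calls:
            call()


factory = ModelFactory()
connector = ModelConnector(factory)
connector.connect()
connector.connectionLost("loseConnection() called")
connector.run_pending()
```

If this model matches the real code, a stopFactory call between attempts is normal for ReconnectingClientFactory, so any teardown placed in stopFactory would run on every disconnect.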

Work around:

I noticed that some of the Twisted source code works off factory.numPorts: if numPorts is 1 when the client loses the connection, it drops to 0 and the factory is stopped (stopFactory is called). So I conditionally increment numPorts right before intentionally disconnecting, and decrement it again after reconnecting. This solves the problem, but it’s a hack.
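The arithmetic of that workaround can be modeled in a few lines. CountingFactory and reconnect_cycle are hypothetical names; the doStart/doStop logic mirrors a reading of twisted.internet.protocol.Factory, and the assumption (from BaseConnector) is that each lost connection triggers one doStop() and each reconnect one doStart().

```python
from typing import List


class CountingFactory:
    """Hypothetical stand-in mirroring Factory.doStart/doStop numPorts counting."""

    def __init__(self) -> None:
        self.numPorts = 0
        self.calls: List[str] = []

    def doStart(self) -> None:
        if not self.numPorts:
            self.calls.append("startFactory")
        self.numPorts += 1

    def doStop(self) -> None:
        if self.numPorts == 0:
            return
        self.numPorts -= 1
        if not self.numPorts:
            self.calls.append("stopFactory")


def reconnect_cycle(factory: CountingFactory, use_workaround: bool) -> CountingFactory:
    """Model one intentional disconnect/reconnect, optionally with the numPorts bump."""
    if use_workaround:
        factory.numPorts += 1  # bump just before the intentional disconnect
    factory.doStop()           # connection lost -> doStop()
    factory.doStart()          # later reconnect -> doStart()
    if use_workaround:
        factory.numPorts -= 1  # undo the bump once reconnected
    return factory


plain = CountingFactory()
plain.doStart()
reconnect_cycle(plain, use_workaround=False)

hacked = CountingFactory()
hacked.doStart()
reconnect_cycle(hacked, use_workaround=True)
```

In this model the bump suppresses the stopFactory/startFactory pair and leaves numPorts back at 1 afterwards, which matches the behaviour described for the workaround.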

I’ll attach the test scripts to this post (if attachments are allowed), but the relevant code is in these factory methods:

    import time

    from twisted.internet.protocol import ReconnectingClientFactory


    class MyClientFactory(ReconnectingClientFactory):  # hypothetical class name
        # Assumed initial values (not shown in the original excerpt)
        connectedClient = None
        disconnectedOnPurpose = False
        numPortsChanged = False

        def clientConnectionLost(self, connector, reason):
            print(' factory clientConnectionLost: reason: {}'.format(reason))
            # if self.disconnectedOnPurpose:
            #     ## Hack to keep reactor alive
            #     print(' factory clientConnectionLost: increasing numPorts')
            #     self.numPorts += 1
            #     self.numPortsChanged = True
            #     self.disconnectedOnPurpose = False
            print(' ... simulate client going idle before attempting restart...')
            time.sleep(5)  # note: blocks the reactor thread for 5 seconds
            ReconnectingClientFactory.clientConnectionLost(self, connector, reason)
            print(' factory clientConnectionLost: end.\n')

        def clientConnectionMade(self):
            # Custom hook (not a ClientFactory callback); presumably called from the protocol
            print(' factory clientConnectionMade: starting numPorts: {}'.format(self.numPorts))
            # if self.numPortsChanged:
            #     ## Resetting from hacked value
            #     print(' factory clientConnectionMade: decreasing numPorts')
            #     self.numPorts -= 1
            #     self.numPortsChanged = False
            print(' factory clientConnectionMade: finished numPorts: {}'.format(self.numPorts))

        def cleanup(self):
            print('factory cleanup: calling loseConnection')
            if self.connectedClient is not None:
                self.connectedClient.transport.loseConnection()
                self.disconnectedOnPurpose = True

With the above lines commented out, once cleanup() calls transport.loseConnection(), the factory stops at the end of clientConnectionLost and the client never reconnects.

Sample scripts/logs:

I’ve created short test scripts and corresponding logs (one run with the client failing, and one with it restarting when I use the workaround). I’ve cut out several thousand lines to get the example scripts down to something simple, but I know the client is still a little long. Again, I’m not sure whether attachments work on the mailing list, but I’ll try to attach the client/server scripts along with the corresponding pass/fail logs.

Thanks!

-Chris