Hello community,

 

First of all, thanks for an awesome platform! I’m brand new to this community, but have been using Twisted for a couple of years.

 

Reason for posting:

I’ve hit a condition with ReconnectingClientFactory that I’m not sure is by design. I have a workaround right now, but I’d like your perspective; it seems like there should be a better (or simply the right) way to do this.

 

Attempted design:

I’d like to have long-running TCP clients (running until explicitly stopped) talking to a long-running TCP server. When a client hits a problem with a dependency (database is down, Kafka bus unavailable, external API not responding, etc.), I want the client to go offline for a while and then come back online: an automated, self-recovery type of action. Since it’s not OK to stop and restart the Twisted reactor, I let the client finish whatever it can, disconnect from the server, tear down its dependencies, wait for a period of time, and then attempt a clean re-initialization of those dependencies along with reconnecting to the Twisted server.

 

Problem case:

I’m using ReconnectingClientFactory in my client. When the client hits a problem, it calls transport.loseConnection(). But whenever the client does this, it does not reconnect after the disconnect; stopFactory is called and everything exits.

 

Workaround:

I noticed some Twisted source code that works off factory.numPorts. If numPorts is 1 and the client loses the connection, it drops to 0 and the factory is stopped. So I conditionally increase this number when intentionally disconnecting, and reset it after reconnecting. This solves the problem, but it’s a hack.
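
To illustrate what I think is going on, here is a toy model of that counting as I understand it. ToyFactory is a hypothetical, simplified stand-in written for this post, not the actual Twisted source; it only shows why bumping numPorts keeps stopFactory from being called.

```python
class ToyFactory:
    """Toy model of start/stop reference counting; not Twisted's real code."""

    def __init__(self):
        self.numPorts = 0
        self.events = []

    def startFactory(self):
        self.events.append("startFactory")

    def stopFactory(self):
        self.events.append("stopFactory")

    def doStart(self):
        # The first user of the factory triggers startFactory().
        if not self.numPorts:
            self.startFactory()
        self.numPorts += 1

    def doStop(self):
        # The last user going away triggers stopFactory().
        if self.numPorts == 0:
            return
        self.numPorts -= 1
        if not self.numPorts:
            self.stopFactory()


# Without the workaround: the count drops to 0 and the factory stops.
plain = ToyFactory()
plain.doStart()
plain.doStop()
print(plain.events)

# With the workaround: the count is bumped first, so it only drops to 1.
hacked = ToyFactory()
hacked.doStart()
hacked.numPorts += 1
hacked.doStop()
print(hacked.events)
```

In the toy model the second factory never sees stopFactory, which matches what I observe with the workaround enabled.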

 

I’ll attach the test scripts to this post (if attachments are allowed), but the main code is in these functions in the factory:

 

    def clientConnectionLost(self, connector, reason):
        print('  factory clientConnectionLost: reason: {}'.format(reason))
        # if self.disconnectedOnPurpose:
        #     ## Hack to keep reactor alive
        #     print('  factory clientConnectionLost: increasing numPorts')
        #     self.numPorts += 1
        #     self.numPortsChanged = True
        #     self.disconnectedOnPurpose = False
        print('  ... simulate client going idle before attempting restart...')
        # NOTE: time.sleep() blocks the reactor; it is only here to simulate idle time in the test
        time.sleep(5)
        ReconnectingClientFactory.clientConnectionLost(self, connector, reason)
        print('  factory clientConnectionLost: end.\n')

    def clientConnectionMade(self):
        print('  factory clientConnectionMade: starting numPorts: {}'.format(self.numPorts))
        # if self.numPortsChanged:
        #     ## Resetting from hacked value
        #     print('  factory clientConnectionMade: decreasing numPorts')
        #     self.numPorts -= 1
        #     self.numPortsChanged = False
        print('  factory clientConnectionMade: finished numPorts: {}'.format(self.numPorts))

    def cleanup(self):
        print('factory cleanup: calling loseConnection')
        if self.connectedClient is not None:
            self.connectedClient.transport.loseConnection()
            self.disconnectedOnPurpose = True

 

With the numPorts lines above commented out, once cleanup() calls transport.loseConnection(), the factory stops at the end of clientConnectionLost.

 

 

Sample scripts/logs:

I’ve tried to create short test scripts and corresponding logs (one with the client failing, and one with it restarting when I use the workaround). I’ve cut out several thousand lines to get down to something simple for the example test scripts, but I know the client is still a little long. Again, I’m not sure if attachments work on the mailing list, but I’ll attempt to attach the client/server scripts with the corresponding pass/fail logs.

 

Thanks!

 

-Chris