[Twisted-Python] Best way to trigger a future connection with data

Hi,

I'm porting some code to Twisted and have got a little stuck. The current (non-Twisted) code connects to a server, gives it some data (a job to do), gets a job-id and then terminates the connection. Some time later, it reconnects and gets the output using the job-id.

I have a factory and protocol set up, with buildProtocol overridden in the Factory class to supply some extra data to each Protocol instance when it is created. This is fine when I submit a job, as I can call a factory function to generate the required data (i.e. the job to do).

My problem is how to initiate a connection at some point in the future AND pass it some specific info (i.e. the job-id) to send to the server. I start the connections using task.LoopingCall and reactor.connectTCP, so my first guess was to add:

    reactor.callLater(time, reactor.connectTCP, ...)

to the connectionLost method of my Protocol. However, I cannot see how to pass in any extra information (i.e. the job-id) this way. I know I can store state in the Factory, so I guess I just need to know how to pass this to a scheduled invocation of the Protocol.

Is anyone able to point me in the right direction here?

Cheers,
-Nick.

--
Nick Johnson, Applications Developer, EPCC
2407 JCMB, King's Buildings, Mayfield Road, Edinburgh, EH9 3JF
e: Nick.Johnson@ed.ac.uk t: 0131 651 3388

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Hi Nick, You're pretty much there already. Instantiate a ClientFactory that holds all the necessary state. By default, your protocol will have access to that state through its factory attribute (unless you override the Factory's buildProtocol method). cheers lvh

Thanks lvh,

I did have to override the buildProtocol method in the Factory, but I then set the protocol's factory attribute to the Factory (i.e. myprotocol.factory = self).

I'm still stuck, however, on what to do when things get more complex than this simple case. For example, I use a LoopingCall to open connections at a 0.1 second interval to launch jobs, and each of those connections, as mentioned, sets up a future connection to retrieve the output, say 10 seconds later. There is going to be some overlap, i.e. I might have launched 100 new jobs before the first one fires its reactor.callLater. Storing state in the factory class doesn't work in this case because each new connection won't know whether to initiate a job or retrieve output, as I cannot pass it this extra information.

Cheers,
-Nick.

On 18/07/13 18:07, Laurens Van Houtven wrote:

On Fri, Jul 19, 2013 at 10:19 AM, Nick Johnson <Nick.Johnson@ed.ac.uk> wrote:

You could (perhaps should) do this by calling ClientFactory.buildProtocol(self, addr).

I don't understand why not. Could you elaborate? Either way, it seems to me that the API should be:

    d = scheduleJob()
    d.addCallback(getJob)

That is: only get the job once it has been scheduled. getJob would probably have to be split up into something that delays (consider task.deferLater), and something that actually gets the job.

cheers
lvh

Here's a cut-down version of the code which might be more illustrative:

    from twisted.internet import reactor, task
    from twisted.internet.protocol import ClientFactory, Protocol

    class MyProtocol(Protocol):
        def __init__(self, s, d):
            <some init>

        def dataReceived(self, data):
            <do stuff with data>

        def connectionLost(self, reason):
            reactor.callLater(10, ...)

        def connectionMade(self):
            self.transport.write(...)
            self.transport.loseWriteConnection()

    class MyFactory(ClientFactory):
        def __init__(self, src, dst, interval, type_req):
            self.s = src
            self.d = dst

        def buildProtocol(self, addr):
            p = MyProtocol(self.s, self.d)
            p.factory = self
            return p

    if __name__ == '__main__':
        f = MyFactory("10", "20", 1, 1)
        l = task.LoopingCall(reactor.connectTCP, ..., f)
        l.start(.1)
        # Pass the callable itself; reactor.stop() would stop immediately.
        reactor.callLater(20, reactor.stop)
        reactor.run()

So, each call from task.LoopingCall sets up a new connection which then starts a job. When that connection has finished, the protocol instance disappears. I can store the data it received in a structure in the Factory, no problems there. I have to call transport.loseWriteConnection() in order to get data from the server (I've no control over this).

The problem comes when the delayed connection is started. This will (in my mind) create a new instance of MyProtocol by calling the buildProtocol method of the Factory. Without any additional input, it won't know what to do: start a new job or retrieve one from the server. I've tried thinking about callbacks and deferreds but still get stuck with the same problem of how to instruct a particular instance of MyProtocol to either launch or retrieve a job.

Cheers,
-Nick.

On 19/07/13 09:25, Laurens Van Houtven wrote:

Hi Nick,

Okay, question and code review time.

Why are source and destination arguments to the protocol? Can't the protocol just access them on the factory? It seems that the factory initiates many connections with the same parameters. Is that true? Does it only ever make sense to use the factory to fire many requests?

Anyway, the biggest issue seems to be that you're stuck on trying to do everything with one protocol; it might make total sense for you to have a job-queueing and a job-getting protocol :)

Can you explain what the interval and type_req arguments are, and why they're passed to the factory?

cheers
lvh

Hi Nick, I was thinking something along these lines: https://gist.github.com/lvh/67c64042a2be06b7bf7a cheers lvh

Hi, Firstly, thanks for this gist, I had done a few experiments using endpoints and I think this is definitely the way to go for this code. As to the questions: source and destination are parameters for the job and might change between runs (a function I didn't include for brevity handles computation of these). Interval was to be the time passed to LoopingCall and type_req was another job parameter. I agree that, having looked at the gist, trying to pack everything into one Protocol was not the best way to go and using a separate protocol for each type of communication (ie, getjob, retrievejob) is more sensible. Thanks for helping me out with this, Twisted is slowly starting to make sense now. Cheers, -Nick. On 19/07/13 14:52, Laurens Van Houtven wrote:

Hi Nick,

On Mon, Jul 22, 2013 at 10:40 AM, Nick Johnson <Nick.Johnson@ed.ac.uk> wrote:
Welcome :)
For what it's worth: a protocol implementing all of these might make even more sense if you have some functions as the high level API (like the ones I wrote in the gist): the functions could call high level methods on the protocol that cause it to do certain things. As an example, consider the IRCClient protocol: https://twistedmatrix.com/documents/current/api/twisted.words.protocols.irc.... ... which has methods like "join", "leave", "say", "message"...

participants (2): Laurens Van Houtven, Nick Johnson