[Twisted-Python] Really Basic clarification on defers
This is a really basic question that we are trying to settle.
We have programs that run quite happily, so far. Their main task is to receive data from port A and send it out via port B, then receive data via port B and send it out via port A. It's pretty much like a chat setup: you just build up a list of connected clients and send data to them as required.
Side A receives some input from a TCP port, about 100-200 characters, and forwards it to another port, B. We do not need to wait for any response. If we get a response, we pick it up through LineReceiver. We also run a callLater to check whether we got a response within the specified timeframe. If not, we drop the connection.
Traffic coming in from port B is analysed and some subset is sent back to port A.
Ignoring port A for the moment and just concentrating on port B, we have tried three options:
1. We set up a Deferred to handle the sendLine to port B so that the reactor would schedule it in its own good time. No threads are involved; this uses the standard Twisted setup. When we get a response through lineReceived we fire the Deferred's callback. If we time out via callLater we fire the errback to clear the Deferred. In this case the Deferred does not seem to be doing very much.
2. Now a fresh pair of eyes is looking at the code and asking why we are using a Deferred for sending data to port B. We could just issue a straight sendLine as part of the main code and carry on. If we get a response via LineReceiver, we process it normally; otherwise our callLater fires, we time out and drop the connection. So no Deferreds are required at all. It does seem to work. What we are not sure about is what penalty is incurred in terms of reliability or throughput by using sendLine without a Deferred. We are not too sure what the holdup will be and whether it could end up halting the show. Is it better to schedule these messages via Deferreds, or am I missing something obvious?
3. So we then did an experiment and used deferToThread to run the sendLine in a separate thread with its own Deferred, to maximise the asynchronous running of the code. So now we are running threads, when one of the reasons for looking at Twisted was that we could avoid threads as much as possible.
The conundrum we are trying to resolve now is which option we should use. Do any of the options have a built-in problem awaiting the unwary? In theory all three options work. But if option 1 works well enough for our volume of traffic, should we adopt that one? Or is it better to start using the deferToThread option? Is there a simple answer?
The traffic is not large: up to 100-200 remote devices on port B. They will send GPS data every 20 seconds, plus about 500 messages averaging about 200 bytes throughout the day. The remote devices will respond in an irregular manner without dropping the connection, so we force a disconnect if important messages are not getting through. They are then forced to reconnect.
We have looked through the code searching for enlightenment and it does seem to be well documented, but the information we are looking for comes well before the docstrings.
Hopefully, someone can give us some pointers in the right direction.
Thanks for any help.
John Aherne
It seems to me that the volume of traffic you are dealing with isn't so high that you need to worry too much about a direct sendLine causing problems. If I were writing this from scratch, based on my understanding of what you've written above, I would probably go with option 2. (Keep in mind, my understanding may be flawed...so...) However, if you've already got things working with option 1, and the added complexity isn't causing you any trouble, I don't see any real reason not to keep using it. Others may disagree...
Option 3 seems totally unnecessary to me. I typically stay away from threads in Twisted unless I have a long-running non-network process to deal with (disk access, db access, heavy math processing, etc.). Especially because of the relative "heaviness" of threads when using Python (due to complex interactions with the GIL), I would avoid this method...it will probably hurt performance more than option 1 (though still probably not enough to matter).
Others feel free to slap me if I'm giving bad advice :)
Kevin Horn
Kevin
Thanks for the reply.
It's good to get some feedback on how someone else would go about tackling a
particular issue. It helps to confirm whether what you are trying makes
sense.
Thanks
John Aherne
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
On Tue, Aug 4, 2009 at 10:08 AM, John Aherne
1. We set up a Deferred to handle the sendLine to port B so that the reactor would schedule it in its own good time.
The reactor always schedules reads and writes "in its own good time", which means it writes whenever there's data to write and the socket is ready for writing. If you have data that can't be written at once, because it's too much for the socket to handle in a non-blocking fashion, the reactor (along with the transport) will take care of it and defer its delivery itself; there is no need for any Deferreds that you would have to manage here.
Correct me if I'm wrong, but as I understand your description, options 1 and 2 do not behave identically. This is how I interpret it:
option 1:
A sends msg1 to [svc] : wrap msg1 in deferred1
[ - time - ]
B sends data? to [svc] : 1. callback deferred1: [svc] sends msg1 to B
                         2. handle data?
B sends rsp1 to [svc] : [svc] sends rsp1 to A
option 2:
A sends msg1 to [svc] : [svc] sends msg1 to B
B sends rsp1 to [svc] : [svc] sends rsp1 to A
If this is the case, you rely on some data? being sent to [svc] before msg1 can be forwarded to B. That means that you hold msg1 in memory until you receive data? from B. This doesn't cause problems in your case, since you handle small messages at long intervals. But if you increased the load significantly, you'd also need significantly more RAM for no good reason. A case where option 1 might make sense would be if you depended on data? provided by B to decide whether or how to continue processing msg1. Then you would have a valid use case for Deferreds. Since there are no such requirements, option 2 is definitely the right choice.
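To make the first paragraph of this reply concrete, here is a small standard-library sketch of the underlying mechanism: a non-blocking socket accepts only as much data as it can take, the remainder waits in a buffer, and a select-style loop writes more whenever the socket becomes writable again. It is a toy model of what a reactor and its transport do, not Twisted's actual implementation, and the function name deliver_without_blocking is made up for this example.

```python
import selectors
import socket


def deliver_without_blocking(payload):
    """Push `payload` through a socket pair using only non-blocking calls.

    Returns (bytes received by the peer, number of send() calls needed).
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)

    selector = selectors.DefaultSelector()
    selector.register(writer, selectors.EVENT_WRITE)
    selector.register(reader, selectors.EVENT_READ)

    pending = memoryview(payload)  # plays the role of the transport's buffer
    received = bytearray()
    send_calls = 0

    try:
        while len(received) < len(payload):
            ready = selector.select(timeout=5)
            if not ready:
                raise RuntimeError("no socket became ready within 5 seconds")
            for key, _events in ready:
                if key.fileobj is writer:
                    try:
                        sent = writer.send(pending)  # may accept only a part
                    except BlockingIOError:
                        continue  # not ready after all; try again later
                    send_calls += 1
                    pending = pending[sent:]
                    if len(pending) == 0:
                        # Nothing left to write: stop watching writability.
                        selector.unregister(writer)
                else:
                    received += reader.recv(65536)
    finally:
        selector.close()
        writer.close()
        reader.close()
    return bytes(received), send_calls
```

Calling deliver_without_blocking(b"x" * 2000000) returns the full payload, typically after many send() calls, and at no point does any single call wait for the peer. With messages of 100-200 bytes, as in this thread, one send() is normally enough.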
No threads are involved; this uses the standard Twisted setup. When we get a response through lineReceived we fire the Deferred's callback. If we time out via callLater we fire the errback to clear the Deferred. In this case the Deferred does not seem to be doing very much.
2. Now a fresh pair of eyes is looking at the code and asking why we are using a Deferred for sending data to port B. We could just issue a straight sendLine as part of the main code and carry on. If we get a response via LineReceiver, we process it normally; otherwise our callLater fires, we time out and drop the connection. So no Deferreds are required at all. It does seem to work. What we are not sure about is what penalty is incurred in terms of reliability or throughput by using sendLine without a Deferred.
There's absolutely no penalty (unless you allow the notion of negative penalties). Using sendLine directly is faster than putting a Deferred in between, even if you don't count the memory overhead. I think there's a bit of confusion about the role of Deferreds in Twisted here. Deferreds don't help you (or the reactor) with scheduling; they only provide you with a means to continue some processing after a certain event has occurred.
We are not too sure what the holdup will be and whether it could end up halting the show. Is it better to schedule these messages via Deferreds, or am I missing something obvious?
3. So we then did an experiment and used deferToThread to run the sendLine in a separate thread with its own Deferred, to maximise the asynchronous running of the code. So now we are running threads, when one of the reasons for looking at Twisted was that we could avoid threads as much as possible.
Do you use sendLine (the Twisted API) from within the thread? If yes and it works, it works by accident, probably also thanks to the very small load, and it is definitely wrong (as well as unnecessary). Twisted is not thread-safe, with the exception of a few methods/functions like callInThread/callFromThread/deferToThread etc.
hope that helps, Johann
Johann,
Thanks for a very informative reply.
Our problem was more basic than Deferreds. It was to do with the reactor and the select call. sendLine is not blocking, so, as you say, we can avoid the use of Deferreds and continue to use sendLine directly.
Regarding messages being stored in memory: we only store the sequence number for each message so we can identify the message being responded to. We can then report back to A with the original message sequence number. So the memory footprint is quite small.
Our option 3 using deferToThread does use sendLine from the thread. Your response implies that is OK, since you say deferToThread is thread-safe. Did you really mean that?
Once again, thanks for a very good response. That has cleared up a lot of confusion.
I suppose it would help if there was a paragraph at the start of the Twisted documentation detailing what you have just said. Then when readers start on Deferreds they have some sort of context in which to interpret what is being said.
Regards
John Aherne
On 09:33 am, johnaherne@rocs.co.uk wrote:
On Wed, Aug 5, 2009 at 12:14 AM, Johann Borck
wrote: [snip]
sendLine is not blocking, so, as you say, we can avoid the use of Deferreds and continue to use sendLine directly.
LineReceiver.sendLine is not blocking, correct. However, your statement implies that if it were blocking, you could use Deferreds to address this problem. This is incorrect. Deferreds do not make blocking APIs into non-blocking APIs.
[snip]
Our option 3 using deferToThread does use sendLine from the thread. Your response implies that is OK, since you say deferToThread is thread-safe. Did you really mean that?
deferToThread is not thread-safe: you may only call it from the reactor thread (the thread in which you called reactor.run). Since deferToThread runs the function you pass to it in a non-reactor thread, you may not use any non-thread-safe Twisted APIs in the function you pass to it.
Once again thanks for a very good response. That has cleared up a lot of confusion.
I suppose it would help if there was a paragraph at the start of the twisted documentation detailing what you have just said. So when they start on deferreds you have some sort of context in which to interpret what is being said
A significant effort is presently underway to improve the documentation about Deferreds. Any specific feedback you have about it would be much appreciated. :)
Jean-Paul
Jean-Paul,
Thanks for the clarifications.
I assume what I meant by blocking was that I would have to put a function into deferToThread and add some callbacks to return the result, rather than calling sendLine directly from the thread. This is what Jarrod suggested in his reply. If I'm wrong, please say so.
I have read the thread regarding Deferreds with interest, but did not feel I knew enough to contribute. I do feel qualified to ask some very daft questions. Unfortunately, I don't see too many daft questions being asked on the list. I reckon that if I need to know the answer to them, then some other people probably do as well but don't put themselves forward. I see it as a way to document information that is difficult to come by. And I really do appreciate the very good answers I get.
However, after this little foray, I probably feel able to comment:
The concept of Deferreds is very simple. Everyone understands the concept, even I do. The issue is how, why and where you should use them in Twisted.
Some basic getting-started points:
1. For simple network activity do not use Deferreds. They are not necessary. You can get a lot done without Deferreds, and you don't know how to use them yet. The reactor and select will process the outgoing and incoming buffers without blocking. Anyone familiar with networking and select will already understand this. Anyone not familiar will not realise it and needs to be made aware of how select works.
2. If you have blocking code (please define blocking :), then first think about putting it into deferToThread with appropriate callbacks and returning the Deferred, as suggested by Jarrod in his response.
3. John Goerzen, in his Apress book Foundations of Python Network Programming, has a very simple chat server example. With a few comments for the uninitiated, this would be a good starter. Possibly I could ask for permission to include it in some Twisted HOWTO documentation for beginners, with suitable copyright recognition.
4. With these few points as starters, maybe more people will be encouraged to get started with Twisted. And if you know you can ignore Deferreds until later, you will find Twisted is very simple to use and gives some good results with little effort.
5. The emphasis on how Deferreds work probably ought to be counterbalanced by some insight into how, why and where you would use them. For example, if you have a text file of 10000 lines that you need to read in and summarize, presumably you would run this with deferToThread (plus other options) and get the result via the callback. If someone has a better example, please let me know.
6. Blocking code is always put into a thread or the like, and a Deferred callback or errback is used to return the result or failure of the blocking code from the thread. See Jarrod's response above.
You asked :) I'm giving.
I may be completely off track in what I have said above, and I would not want anyone to fall upon this mail and think it represents the gospel truth.
Thank you for the response.
John Aherne
On Wed, Aug 5, 2009 at 6:04 PM, John Aherne
2. If you have blocking code - *please define blocking* :), then first think about putting it into deferToThread with appropriate callbacks and return the deferred. As suggested by Jarrod in his response.
Blocking code is code that will block, or may potentially block, the continued execution of the main reactor thread. Think, for the most part, of long-running processes or operations that may be long-running: doing file or network I/O, CPU-intensive calculations, operations that may time out (such as a remote call to another process or host machine), and database operations, which are a usual culprit because the database may be flooded with work or may have crashed. The examples go on, but they are mainly about I/O and CPU-intensive operations. When these things happen on the reactor (main) thread, they block the server from doing anything else: it can't accept new connections or do anything at all until the blocking activity has completed and returned control to the reactor thread.
You can sometimes handle this without deferToThread by breaking the blocking code up into smaller pieces. If you need to transfer a large file to a socket, then instead of trying to send it all at once, send 10KB at a time, yield back to the reactor, and reschedule the next 10KB until finished. This will work, although it might not be the fastest way, and it may still block for an unacceptable amount of time on just 10KB, depending on how heavily taxed the I/O system is at the moment. Usually deferToThread is just easier to implement.
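The 10KB-at-a-time idea could be sketched roughly as follows. The function name send_file_in_chunks and its parameters are assumptions for illustration; transport stands for a connected Twisted transport and clock for anything with a callLater() method, such as the reactor.

```python
CHUNK_SIZE = 10 * 1024  # the 10KB from the message above


def send_file_in_chunks(fileobj, transport, clock, chunk_size=CHUNK_SIZE):
    """Write `fileobj` to `transport`, one chunk per scheduled call.

    `clock` is anything with callLater(), e.g. the reactor.
    """
    def send_next_chunk():
        chunk = fileobj.read(chunk_size)  # a disk read can still block briefly
        if not chunk:
            return  # end of file: nothing more to schedule
        transport.write(chunk)
        # Give control back to the reactor before sending the next chunk.
        clock.callLater(0, send_next_chunk)

    send_next_chunk()
```

A call might look like send_file_in_chunks(open(path, "rb"), self.transport, reactor) from inside a protocol. Between chunks the reactor is free to service other connections. Twisted's own FileSender in twisted.protocols.basic addresses the same problem through the producer/consumer interfaces and is worth a look before rolling your own.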
Jarrod,
Thanks. I've incorporated some of what you said into a reply to my own mail. It is a point I should have added to my other ones first time round.
John Aherne
I'll try and put all my comments together with the feedback from everyone.
Then I can pass it over to the defer documentation thread to see if they are
interested in any part of it.
John
A point I missed out on:
The adbapi module seems to be a good example of using deferreds and threads.
The adbapi module returns a deferred it has created, and you add your
callbacks to it. It calls your callbacks when the result is ready. It does
seem like the exemplar for doing deferreds.
The db stuff will normally block, so put it in a thread and use deferreds to
get the result or failure.
A point about the db calls is that they can be very intensive. If you need
to run some db calls every 30 or 60 seconds and the db takes 50% or more
of the time to generate the results, you won't have much time to service any
incoming requests to see the data results. The remote connections will be
failing big time.
So then I suppose you should break the code into 2 programs. One that does
the db stuff, the other to handle the remote connections. The db code when
it has a result will then connect to the other program and pass across its
results. There may be better ways of doing this of course.
So as Jarrod points out, deferToThread is an easy way of solving blocking
code, but not always. It seems good for short blocks, but you do need the
bulk of the time available for handling connections.
If anyone sees howlers in this, please let me know.
Regards
John Aherne
On Wed, Aug 5, 2009 at 11:04 PM, John Aherne wrote:
On Wed, Aug 5, 2009 at 4:17 PM, wrote:
On 09:33 am, johnaherne@rocs.co.uk wrote:
On Wed, Aug 5, 2009 at 12:14 AM, Johann Borck wrote:
[snip]
sendLine is not blocking, so, as you say, we can avoid the use of deferreds and continue to use sendLine directly.
LineReceiver.sendLine is not blocking, correct. However, your statement implies that if it were blocking, you could use Deferreds to address this problem. This is incorrect. Deferreds do not make blocking APIs into non-blocking APIs.
[snip]
Our option 3 using deferToThread does use sendLine from the thread. Your response implies that is OK, since you say deferToThread is thread-safe. Did you really mean that?
deferToThread is not thread-safe: you may only call it from the reactor thread (the thread in which you called reactor.run).
Since deferToThread runs the function you pass to it in a non-reactor thread, you may not use any non-thread-safe Twisted APIs in the function you pass to it.
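A minimal sketch of what that rule means in practice, assuming the function below is the one handed to deferToThread. The names compute_and_send and compute are hypothetical, and the reactor is passed in as a parameter for clarity. reactor.callFromThread is the Twisted call meant for handing work back to the reactor thread from another thread.

```python
def compute_and_send(reactor, protocol, compute, payload):
    """Intended to run in a worker thread, not in the reactor thread.

    Hypothetical helper: `compute` stands in for any blocking function,
    and `protocol` is a connected LineReceiver-style protocol.
    """
    result = compute(payload)  # blocking work is fine off the reactor thread

    # protocol.sendLine is not thread-safe, so it must not be called here
    # directly. Ask the reactor to call it from the reactor thread instead.
    reactor.callFromThread(protocol.sendLine, result)
    return result
```

From the reactor thread this would be started with something like deferToThread(compute_and_send, reactor, protocol, compute, payload). An alternative is to keep the thread function free of Twisted calls and call sendLine in a callback added to the Deferred that deferToThread returns, since those callbacks run in the reactor thread.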
Once again thanks for a very good response. That has cleared up a lot of confusion.
I suppose it would help if there was a paragraph at the start of the Twisted documentation detailing what you have just said. Then, when readers start on deferreds, they have some sort of context in which to interpret what is being said.
A significant effort is presently underway to improve the documentation about Deferreds. Any specific feedback you have about it would be much appreciated. :)
Jean-Paul
Jean-Paul
Thanks for the clarifications.
I think what I meant about blocking was that I would have to put a function into deferToThread and add some callbacks to return the result, rather than calling sendLine directly from the thread. This is what Jarrod suggested in his reply. If I'm wrong, please say so.
I have read the thread regarding deferreds with interest, but did not feel I knew enough to contribute.
I do feel qualified to ask some very daft questions. Unfortunately, I don't see too many daft questions being asked on the list. I reckon that if I need to know the answer to them, then some other people probably do as well but don't put themselves forward. I see it as a way to document information that is difficult to come by. And I really do appreciate the very good answers I get.
However, after this little foray, I probably feel able to comment:
The concept of deferreds is very simple. Everyone understands the concept - even I do. The issue is how and why and where you should use them in twisted.
Some basic getting-started points:
1. For simple network activity, do not use deferreds. They are not necessary, you can get a lot done without them, and you don't know how to use them yet. The reactor and select will process the outgoing and incoming buffers without blocking. Anyone familiar with networking and select will already understand this. Anyone who is not familiar will not realise it and needs to be made aware of how select works.
2. If you have blocking code - please define blocking :) - then first think about putting it into deferToThread with appropriate callbacks and return the deferred, as suggested by Jarrod in his response.
3. John Goerzen, in his Apress book Foundations of Python Network Programming, has a very simple chat server example. With a few comments for the uninitiated, this would be a good starter. Possibly I could ask for permission to include it in some Twisted HOWTO documentation for beginners, with suitable copyright recognition.
4. With these few points as starters, maybe more people will be encouraged to get started with Twisted. And if you know you can ignore deferreds until later, you will find Twisted is very simple to use and gives good results with little effort.
5. The emphasis on how deferreds work probably ought to be counterbalanced by some insight into how, why and where you would use them. For example, if you have a text file of 10000 lines that you need to read in and summarize, presumably you would run this with deferToThread (plus other options) and get the result via the callback. If someone has a better example, please let me know.
6. Blocking code is always put into a thread or the like, and a deferred callback or errback is used to return the result or failure of the blocking code from the thread. See Jarrod's response above.
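The text-file example from point 5 might look roughly like this. It is only a sketch: the function names summarize and summarize_in_thread, and the choice of "summary" (a line count plus the three most common words), are invented for illustration. The only Twisted call used is deferToThread.

```python
from collections import Counter


def summarize(path):
    """Blocking helper: read a text file and return
    (line_count, three_most_common_words)."""
    words = Counter()
    line_count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line_count += 1
            words.update(line.split())
    return line_count, words.most_common(3)


def summarize_in_thread(path):
    """Run summarize() in the reactor's thread pool and return a Deferred."""
    # Imported here so that summarize() itself stays usable without Twisted.
    from twisted.internet.threads import deferToThread

    # Must be called from the reactor thread. The Deferred fires with
    # summarize()'s return value, or with a Failure if it raised.
    return deferToThread(summarize, path)
```

A caller would then do something like d = summarize_in_thread("data.txt") followed by d.addCallbacks(on_summary, on_error), where on_summary and on_error are its own handlers.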
You asked :) I'm giving
I may be completely off track in what I have said above. And I would not want anyone to fall upon this mail and think it represents the gospel truth.
Thank you for the response.
John Aherne
A note on how I handled a similar situation in regards to the timeout
requirements you seem to have:
I had a similar setup where I was forwarding data from clients to servers
and back and forth, etc. I wanted to time out the connection after some idle
time and ended up using the TimeoutMixin found in
twisted.protocols.policies. It probably does exactly what you are doing, but
via simple class inheritance and setting a variable.
-ab
On Tue, Aug 4, 2009 at 2:35 PM, Kevin Horn wrote:
On Tue, Aug 4, 2009 at 10:08 AM, John Aherne wrote:
This is a really basic problem we are trying to decide about,
One side A receives some input from a TCP port - about 100-200 characters - and forwards it to another port B. We do not need to wait for any response. If we get a response, we pick that up through LineReceiver. We also run a callLater to check whether we got a response via lineReceived within the timeframe specified. If not, we drop the connection.
Aaron
Thanks for the reply. I have spent some time trying to find out how we might deal with timeouts and had not found anything we could use. I'll take a look at that and see what it does.
Regards
John Aherne
Deferreds don't do what you think they do. They don't do anything to make your code non-blocking. They only adhere to a contract that something will eventually be returned. The most common way to use Deferreds with non-blocking code is to use deferToThread(), or some other mechanism such as spawning a process, to make the long-running code non-blocking.
Thanks for the reminder about deferreds.
I think the problem is more to do with knowing what role the reactor and
select perform.
I assume that sending and receiving data with sendLine and lineReceived are
not blocking.
So for our simple case we can ignore deferreds. They provide no benefit.
This is what our option 2 does.
John Aherne
participants (6)
- Aaron Bush
- exarkun@twistedmatrix.com
- Jarrod Roberson
- Johann Borck
- John Aherne
- Kevin Horn