[Twisted-Python] Really Basic clarification on defers

John Aherne

4 Aug 2009

3:08 p.m.

This is a really basic problem we are trying to decide about.

We have programs that run quite happily, so far. Their main task is to receive data from port A and send it out via port B, then receive data via port B and send it out via port A. It's pretty much like a chat setup: you just build up a list of connected clients and send data to them as required.

Side A receives some input from a TCP port, about 100-200 characters, and forwards it to port B. We do not need to wait for any response. If we get a response we pick that up through LineReceiver. We also run a callLater to check whether we got a response on LineReceiver within the timeframe specified. If not, we drop the connection.

Traffic coming in from port B is analysed and some subset is sent back to port A.

Ignoring port A for the moment and just concentrating on port B, we have tried three options:

1. We set up a deferred to handle the sendLine to port B so that the reactor would schedule it in its own good time. No threads are involved, just the standard Twisted setup. When we get a response through lineReceived we fire the deferred's callback. If we time out via callLater we fire the errback to clear the deferred. In this case the deferred does not seem to be doing very much.

2. Now a fresh pair of eyes is looking at the code and asking why we are using a deferred for sending data to port B. We could just issue a straight sendLine as part of the main code and carry on. If we get a response via LineReceiver, we process it normally; otherwise our callLater fires, we time out and we lose the connection. So no deferreds are required at all. It does seem to work. What we are not sure about is what penalty is incurred, in terms of reliability or throughput, by using sendLine without a deferred. We are not too sure what the holdup will be and whether it could end up halting the show. Is it better to schedule these messages via deferreds, or am I missing something obvious?

3. So we then did an experiment and used deferToThread to run the sendLine in a separate thread with its own deferred, to maximise the asynchronous running of the code. So now we are running threads, when one of the reasons for looking at Twisted was that we could avoid threads as much as possible.

The conundrum we are trying to resolve now is which option we should use. Do any of the options have a built-in problem awaiting the unwary? In theory all 3 options work. But if No 1 works well enough for our volume of traffic, should we adopt that one? Or is it better to start using the deferToThread option? Is there a simple answer?

The traffic is not large: up to 100-200 remote devices on port B. They will send GPS data every 20 secs, plus about 500 messages of about 200 bytes on average throughout the day. The remote devices will respond in an irregular manner without dropping the connection, so we force a disconnect if important messages are not getting through. They are then forced to reconnect.

We have looked through the code searching for enlightenment and it does seem to be well documented, but the information we are looking for comes well before the doc strings.

Hopefully, someone can give us some pointers in the right direction.

Thanks for any help.

John Aherne

Kevin Horn

4 Aug

6:35 p.m.

On Tue, Aug 4, 2009 at 10:08 AM, John Aherne wrote:

...

It seems to me that the volume of traffic you are dealing with isn't so high that you need to worry too much about a direct sendLine causing problems. If I were writing this from scratch, based on my understanding of what you've written above, I would probably go with option 2. (Keep in mind, my understanding may be flawed...so...) However, if you've already got things working with option 1, and the added complexity isn't causing you any trouble, I don't see any real reason not to use it. Others may disagree...

Option 3 seems totally unnecessary to me. I typically stay away from threads in Twisted unless I have a long-running non-network process to deal with (disk access, db access, heavy math processing, etc.). Especially because of the relative "heaviness" of threads when using Python (due to complex interactions with the GIL), I would avoid this method...it will probably hurt performance more than option 1 (though still probably not enough to matter).

Others feel free to slap me if I'm giving bad advice :)

Kevin Horn

John Aherne

9:26 p.m.

Kevin

Thanks for the reply. It's good to get some feedback on how someone else would go about tackling a particular issue. It helps to confirm whether what you are trying makes sense.

Thanks

John Aherne

On Tue, Aug 4, 2009 at 7:35 PM, Kevin Horn wrote:

...

_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

Johann Borck

11:14 p.m.

On Tue, Aug 4, 2009 at 10:08 AM, John Aherne <johnaherne@rocs.co.uk> wrote:

...

1. We set up a defer to handle the sendline to port B so that the reactor would schedule it in its own good time.

The reactor always schedules reads and writes "in its own good time", which means it writes whenever there's data to write and the socket is ready for writing. If you have data that can't be written at once, because it's too much for the socket to handle in a non-blocking fashion, the reactor (along with the transport) will take care of it and defer its delivery itself. There is no need for any deferreds of your own here.

Correct me if I'm wrong, but as I understand your description, options 1 and 2 do not behave identically. This is how I interpret it:

option 1:

  A sends msg1 to [svc]  : wrap msg1 in deferred1
  [ - time - ]
  B sends data? to [svc] : 1. callback deferred1: [svc] sends msg1 to B
                           2. handle data?
  B sends rsp1 to [svc]  : [svc] sends rsp1 to A

option 2:

  A sends msg1 to [svc]  : [svc] sends msg1 to B
  B sends rsp1 to [svc]  : [svc] sends rsp1 to A

If this is the case, you rely on some data? being sent to [svc] before msg1 can be forwarded to B. That means that you hold msg1 in memory until you receive data? from B. This doesn't cause problems in your case, since you handle small messages at long intervals. But if you increased the load significantly, you'd also need significantly more RAM for no good reason.

A case where option 1 might make sense would be if it depended on data? provided by B to decide if or how to continue processing msg1. Then you would have a valid use case for deferreds. Since there are no such requirements, option 2 is definitely the right choice.

...

No threads involved using the standard twisted setup. When we get a response through receiveline we fire the callback defer. If we timeout via callLater we fire the errback to clear the defer. In this case the defer does not seem to be doing very much

2. Now a fresh pair of eyes is looking at the code and saying why are we using a deferred for sending data to port B. We could just issue a straight sendline as part of the main code and carry on. If we get a response via linereceiver,we process it normally, otherwise we set our callLater running and timeout and lose the connection. So no deferreds required at all. It does seem to work.What we are not sure about is what penalty is incurred in terms of reliability or throughput by using sendline without a deferred.

There's absolutely no penalty (unless you allow the notion of negative penalties). Using sendLine directly is faster than putting a deferred in between, even if you don't count the memory overhead. I think there's a bit of confusion about the role of deferreds in Twisted here. Deferreds don't help you (or the reactor) with scheduling; they only provide you with a means to continue some processing after a certain event has occurred.

...

We are not too sure what the holdup will be and whether it could end up halting the show. Is it better to schedule these messages via deferreds or am I missing something obvious

3. So we then did an experiment and used defertothread to run the sendline in a separate thread with its own defer to maximise the asynchronous running of the code. So now we are running threads when one of the reasons for looking at twisted was that we could avoid threads as much as possible.

Do you use sendLine (the Twisted API) from within the thread? If yes and it works, it works by accident, probably also due to the very small load, and is definitely wrong (as well as unnecessary). Twisted is not thread-safe, with the exception of a few methods/functions like callInThread/callFromThread/deferToThread etc.

hope that helps,
Johann

John Aherne

5 Aug

9:33 a.m.

On Wed, Aug 5, 2009 at 12:14 AM, Johann Borck wrote:

...

Johann

Thanks for a very informative reply.

Our problem was more basic than deferreds. It was to do with the reactor and the select call. sendLine is not blocking so, as you say, we can avoid the use of deferreds and continue to use sendLine directly.

Regarding messages being stored in memory, we only store the sequence number for each message so we can identify the message being responded to. We can then report back to A with the original message sequence number. So the memory footprint is quite small.

Our option 3 using deferToThread does use sendLine from the thread. Your response implies that is OK, since you say deferToThread is thread-safe. Did you really mean that?

Once again, thanks for a very good response. That has cleared up a lot of confusion.

I suppose it would help if there was a paragraph at the start of the Twisted documentation detailing what you have just said, so that when readers start on deferreds they have some sort of context in which to interpret what is being said.

Regards

John Aherne

exarkun＠twistedmatrix.com

3:17 p.m.

On 09:33 am, johnaherne@rocs.co.uk wrote:

...

On Wed, Aug 5, 2009 at 12:14 AM, Johann Borck wrote:

...
[snip]

Sendline is not blocking so as you say we can avoid the use of deferreds and continue to use sendline directly.

LineReceiver.sendLine is not blocking, correct. However, your statement implies that if it were blocking, you could use Deferreds to address this problem. This is incorrect. Deferreds do not make blocking APIs into non-blocking APIs.

...

[snip]

Our option 3 using defertothread does use sendline from the thread. Your response implies that is OK since you say defertothread is threadsafe. Did you really mean that.

deferToThread is not thread-safe: you may only call it from the reactor thread (the thread in which you called reactor.run). Since deferToThread runs the function you pass to it in a non-reactor thread, you may not use any non-thread-safe Twisted APIs in the function you pass to it.

...

Once again thanks for a very good response. That has cleared up a lot of confusion.

I suppose it would help if there was a paragraph at the start of the twisted documentation detailing what you have just said. So when they start on deferreds you have some sort of context in which to interpret what is being said

A significant effort is presently underway to improve the documentation about Deferreds. Any specific feedback you have about it would be much appreciated. :)

Jean-Paul

John Aherne

10:04 p.m.

On Wed, Aug 5, 2009 at 4:17 PM, wrote:

...

Jean-Paul

Thanks for the clarifications.

I suppose what I meant about blocking was that I would have to put a function into deferToThread and add some callbacks to return the result, rather than calling sendLine directly from the thread. This is what Jarrod suggested in his reply. If I'm wrong, please say so.

I have read the thread regarding deferreds with interest, but did not feel I knew enough to contribute. I do feel qualified to ask some very daft questions. Unfortunately, I don't see too many daft questions being asked on the list. I reckon if I need to know the answer to them then some other people probably do as well but don't put themselves forward. I see it as a way to document information that is difficult to come by. And I really do appreciate the very good answers I get.

However, after this little foray, I probably feel able to comment:

The concept of deferreds is very simple. Everyone understands the concept, even I do. The issue is how and why and where you should use them in Twisted.

Some basic getting-started points:

1. For simple network activity do not use deferreds. They are not necessary. You can get a lot done without deferreds, and you don't know how to use them yet. The reactor and the select call will process the outgoing and incoming buffers without blocking. Anyone familiar with networking and select will already understand this. Anyone not familiar will not realise it and needs to be made aware of how select works.

2. If you have blocking code (please define blocking :), then first think about putting it into deferToThread with appropriate callbacks and returning the deferred, as suggested by Jarrod in his response.

3. John Goerzen, in his Apress book Foundations of Python Network Programming, has a very simple chat server example. With a few comments for the uninitiated, this would be a good starter. Possibly I could ask for permission to include it in some Twisted HOWTO documentation for beginners, with suitable copyright recognition.

4. With these few points as starters, maybe more people will be encouraged to get started with Twisted. And if you know you can ignore deferreds until later, you will find Twisted is very simple to use and gives good results with little effort.

5. The emphasis on how deferreds work probably ought to be counterbalanced by some insight into how and why and where you would use them. For example, if you have a text file of 10000 lines that you need to read in and summarize, presumably you would run this with deferToThread (plus other options) and get the result via the callback. If someone has a better example please let me know.

6. Blocking code is always put into a thread or the like, and a deferred callback or errback is used to return the result or failure of the blocking code from the thread. See Jarrod's response above.

You asked :) I'm giving.

I may be completely off track in what I have said above, and I would not want anyone to fall upon this mail and think it represents the gospel truth.

Thank you for the response.

John Aherne

Jarrod Roberson

6 Aug

12:12 a.m.

On Wed, Aug 5, 2009 at 6:04 PM, John Aherne wrote:

...

2. If you have blocking code - *please define blocking* :), then first think about putting it into deferToThread with appropriate callbacks and return the deferred. As suggested by Jarrod in his response.

Blocking code is code that blocks, or may potentially block, the continued execution of the main reactor thread. Think, for the most part, of long-running processes or operations that may be long-running: doing file or network I/O, CPU-intensive calculations, operations that may time out such as a remote call to another process or host machine, and database operations (usually a culprit, since the database may be flooded with work or crashed). The examples go on, but they are mainly about I/O and CPU-intensive operations.

When these things happen on the reactor / main thread they block the server from doing anything else. It can't accept new connections; it can't do anything at all until the blocking activity has completed and returned control to the reactor thread.

You can sometimes handle this without deferToThread by breaking the blocking code up into smaller pieces. If you need to transfer a large file to a socket, then instead of trying to send it all at once, send 10KB at a time, yield back to the reactor, and reschedule the next 10KB until finished. This will work, but it might not be the fastest way, and it may still block for an unacceptable amount of time on just 10KB, depending on how heavily taxed the I/O system is at the moment. Usually deferToThread is just easier to implement.

John Aherne

7:02 a.m.

On Thu, Aug 6, 2009 at 1:12 AM, Jarrod Roberson wrote:

...

Jarrod,

Thanks. I've incorporated some of what you said into a reply to my own mail. A point I should have added to my other ones first time round. John Aherne

...

_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

John Aherne

10:29 a.m.

I'll try and put all my comments together with the feedback from everyone. Then I can pass it over to the defer documentation thread to see if they are interested in any part of it.

John

On Thu, Aug 6, 2009 at 8:02 AM, John Aherne wrote:

...

On Thu, Aug 6, 2009 at 1:12 AM, Jarrod Roberson wrote:

...
On Wed, Aug 5, 2009 at 6:04 PM, John Aherne wrote:

...

John Aherne

6:59 a.m.

A point I missed out on:

The adbapi module seems to be a good example of using deferreds and threads. It returns a deferred it has created, you add your callbacks to it, and it calls your callback when the result is ready. It does seem like the exemplar for using deferreds: the db calls will normally block, so they are run in a thread and a deferred is used to deliver the result or failure.

A point about db calls is that they can be very intensive. If you need to run some db calls every 30 or 60 seconds and the db takes 50% or more of that time to generate the results, you won't have much time left to service incoming requests to see the results, and the remote connections will be failing big time. So then I suppose you should break the code into two programs: one that does the db stuff, the other to handle the remote connections. When the db program has a result, it connects to the other program and passes its results across. There may be better ways of doing this, of course.

So, as Jarrod points out, deferToThread is an easy way of dealing with blocking code, but not always. It seems good for short blocks, but you do need the bulk of the time available for handling connections.

If anyone sees howlers in this, please let me know.

Regards

John Aherne

On Wed, Aug 5, 2009 at 11:04 PM, John Aherne wrote:

...

On Wed, Aug 5, 2009 at 4:17 PM, wrote:

...
On 09:33 am, johnaherne@rocs.co.uk wrote:

...
On Wed, Aug 5, 2009 at 12:14 AM, Johann Borck wrote:

...
[snip]

sendLine is not blocking, so as you say we can avoid the use of deferreds and continue to use sendLine directly.

LineReceiver.sendLine is not blocking, correct. However, your statement implies that if it were blocking, you could use Deferreds to address this problem. This is incorrect. Deferreds do not make blocking APIs into non-blocking APIs.

...
[snip]

Our option 3, using deferToThread, does use sendLine from the thread. Your response implies that is OK, since you say deferToThread is thread-safe. Did you really mean that?

deferToThread is not thread-safe: you may only call it from the reactor thread (the thread in which you called reactor.run).

Since deferToThread runs the function you pass to it in a non-reactor thread, you may not use any non-thread-safe Twisted APIs in the function you pass to it.

...
Once again thanks for a very good response. That has cleared up a lot of confusion.

I suppose it would help if there was a paragraph at the start of the twisted documentation detailing what you have just said. So when they start on deferreds you have some sort of context in which to interpret what is being said

A significant effort is presently underway to improve the documentation about Deferreds. Any specific feedback you have about it would be much appreciated. :)

Jean-Paul


Jean-Paul

Thanks for the clarifications.

What I meant was that I would put the blocking function into deferToThread and add some callbacks to return the result, rather than calling sendLine directly from the thread. This is what Jarrod suggested in his reply. If I'm wrong, please say so.

I have read the thread regarding deferreds with interest, but did not feel I knew enough to contribute.

I do feel qualified to ask some very daft questions. Unfortunately, I don't see too many daft questions being asked in the list. I reckon if I need to know the answer to them then some other people probably do as well but don't put themselves forward. I see it as a way to document information that is difficult to come by. And I really do appreciate the very good answers I get

However, after this little foray, I probably feel able to comment:--

The concept of deferreds is very simple. Everyone understands the concept - even I do. The issue is how and why and where you should use them in twisted.

Some Basic getting Started Points.

1. For simple network activity do not use deferreds. They are not necessary. You can get a lot done without deferreds. And you don't know how to use them yet. The reactor and the select will process the outgoing and incoming buffers without blocking. Anyone familiar with networking and select will already understand this. Anyone not familiar will not realise it and needs to be made aware of how the select works.

2. If you have blocking code - please define blocking :), then first think about putting it into deferToThread with appropriate callbacks and return the deferred. As suggested by Jarrod in his response.

3. John Goerzen in his Apress book Foundations of Python Network Programming has a very simple chat server example. With a few comments for the uninitiated, this would be a good starter. Possibly I could ask for permission to include it in some twisted HOWTO documentation for beginners with suitable copyright recognition.

4. With these few points as starters, maybe more people will be encouraged to get started with twisted. And if you know you can ignore deferreds until later you will find twisted is very simple to use and get some good results with little effort.

5. The emphasis on how deferreds work probably ought to be counterbalanced by some insight into how and why and where you would use them. For example, if you have a text file of 10000 lines you need to read in and summarize, presumably you would run this with deferToThread(+other options) and get the result via the callback. If someone has a better example please let me know.

6. Blocking code is always put into a thread or the like, and a deferred callback or errback is used to return the result or failure of the blocking code from the thread. See Jarrod's response above.

You asked :) I'm giving

I may be completely off track in what I have said above. And I would not want anyone to fall upon this mail and think it represents the gospel truth.

Thank you for the response..

John Aherne

Aaron Bush

5 Aug 2009

12:11 a.m.

A note on how I handled a similar situation in regards to the timeout requirements you seem to have:

I had a similar setup where I was forwarding data from clients to servers and back and forth, etc. I wanted to timeout the connection after some idle time and ended up using the TimeoutMixin found in twisted.protocols.policies. It probably does exactly what you are doing but via a simple class inheritance and variable set.

-ab

On Tue, Aug 4, 2009 at 2:35 PM, Kevin Horn wrote:

...

On Tue, Aug 4, 2009 at 10:08 AM, John Aherne wrote:

...
This is a really basic problem we are trying to decide about,

One side A receives some input from a tcp port - about 100-200 characters, and forwards it to another port B. We do not need to wait for any response. If we get a response we pick that up through line receiver. We also run a calllater to check if we got a response on linereceiver within the timeframe specified. If not we drop the connection.

John Aherne

9:35 a.m.

On Wed, Aug 5, 2009 at 1:11 AM, Aaron Bush wrote:

...


Aaron

Thanks for the reply. I have spent some time trying to find out how we might deal with timeouts and not found anything we can use. I'll take a look at that and see what it does.

Regards

John Aherne

Jarrod Roberson

4 Aug 2009

10:52 p.m.

Deferreds don't do what you think they do. They don't do anything to make your code non-blocking. They only adhere to a contract that something will eventually be returned. The most common way Deferreds end up involved in making code non-blocking is through deferToThread() or some other mechanism that makes the long-running code non-blocking, like spawning a process.

John Aherne

5 Aug 2009

9:17 a.m.

Thanks for the reminder about deferreds. I think the problem is more to do with knowing what role the reactor and select perform. I assume that sending and receiving data with sendLine and lineReceived are not blocking. So for our simple case we can ignore deferreds; they provide no benefit. This is what our option 2 does.

John Aherne

On Tue, Aug 4, 2009 at 11:52 PM, Jarrod Roberson wrote:

...



participants (6)

Aaron Bush
exarkun＠twistedmatrix.com
Jarrod Roberson
Johann Borck
John Aherne
Kevin Horn
