[Twisted-Python] Question on push/pull producers inter-workings, was: "Is there a simple Producer/Consumer example or tutorial?"

Hello everyone,

I am re-posting these questions with a different title, since they have ventured away from the original question.

I was looking at the way all this works with a debugger, and I noticed that in twisted/internet/abstract.py, registerProducer() contains the following:

    if not streaming:
        producer.resumeProducing()

Why is this done only for the pull producer? Shouldn't it also be called for the push producer, since to have the data sent one has to call either self.transport.write() or resumeProducing() anyway? If you look at http://itamarst.org/writings/OSCON03/twisted_internet-112.html it does:

    transport.registerProducer(self, 1)
    self.produce()

thus starting the writing process explicitly, whereas the pull producer at http://itamarst.org/writings/OSCON03/twisted_internet-111.html doesn't need to start the writing process explicitly, since it is started when the producer is registered.

Also, from what I see in the code, the only difference between a push and a pull producer seems to be that the push producer is paused if the data being written/sent is very large (to let the reactor breathe and process other things); if the data isn't larger than the buffer, it behaves like a pull producer, correct? If so, then why have both? Am I mixed up again?

Thank you,
Gabriel
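For context, the check Gabriel mentions sits inside registerProducer(); a simplified paraphrase of that method (not a verbatim copy of the Twisted source, which also guards against double registration and already-disconnected transports) is:

    def registerProducer(self, producer, streaming):
        self.producer = producer
        self.streamingProducer = streaming
        if not streaming:
            # A pull producer gets kicked off exactly once here; from then
            # on, the transport calls resumeProducing() each time its
            # write buffer empties. A push producer gets no such call and
            # is expected to start writing on its own.
            producer.resumeProducing()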

On Fri, 18 Apr 2008 09:57:35 +0200, Gabriel Rossetti <mailing_lists@evotex.ch> wrote:
So, this first example (the push producer) is a streaming producer. It starts producing right away, without having resumeProducing called on it.
The second example (the pull producer) is a non-streaming producer. It doesn't do anything until something calls resumeProducing on it.
As you say, they behave differently when there is a large amount of data. However, this is more about the source of the data than where it ends up. For example, if you have a large string and you want to produce it to a transport, you probably want a pull producer, because there are no events which will signal that you can send some more of the string *except* for the reactor deciding that it is ready for some more.

So that's how you should decide which of these you want to write: if the consumer is the only event source involved, as in the large string case, then you want a pull producer (streaming = False); if the producer itself is event-driven in its ability to provide data, then you want a push producer.

Jean-Paul
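To make the large-string case concrete, a minimal pull-producer sketch along the lines Jean-Paul describes might look like this (the class names, chunk size, and ten-megabyte payload are all illustrative, not from the thread):

    from zope.interface import implementer
    from twisted.internet import protocol
    from twisted.internet.interfaces import IPullProducer

    @implementer(IPullProducer)
    class StringPullProducer:
        """Feeds a large in-memory byte string to a transport, one chunk
        per resumeProducing() call from the consumer."""

        def __init__(self, transport, data, chunkSize=2 ** 14):
            self.transport = transport
            self.data = data
            self.chunkSize = chunkSize

        def resumeProducing(self):
            # The consumer's buffer has drained; it is asking for more.
            chunk = self.data[:self.chunkSize]
            self.data = self.data[self.chunkSize:]
            self.transport.write(chunk)
            if not self.data:
                # Everything is handed off; detach and close. The
                # transport flushes its buffer before the close happens.
                self.transport.unregisterProducer()
                self.transport.loseConnection()

        def stopProducing(self):
            # The connection went away; drop whatever is left.
            self.data = b""

    class LargeStringProtocol(protocol.Protocol):
        def connectionMade(self):
            producer = StringPullProducer(self.transport, b"x" * 10 ** 7)
            # streaming=False marks this as a pull producer: the transport
            # calls resumeProducing() once now, then again each time its
            # write buffer empties.
            self.transport.registerProducer(producer, False)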

Jean-Paul Calderone wrote:
"there's no events which will signal that you can send some more of the string *except* for the reactor deciding that it is ready for some more"? When I looked at Twisted's code, the difference that I saw was that if a push producer is used, and if the data to be sent is bigger than a certain length, it calls producer.pauseProducing() the data, maybe I don't get what you mean by "event source". parts when the consumer is ready. Is this not correct? Gabriel

On Mon, 21 Apr 2008 09:52:56 +0200, Gabriel Rossetti <mailing_lists@evotex.ch> wrote:
This is true. Let's back up for a moment, though.

A pull producer is one which only produces data when it is asked for data. The ask-for-data API is resumeProducing. This means that a consumer which is given a pull producer must ask it for data repeatedly until there is none left. The consumer is free to do this at its own pace, and a typical efficient way to do this is to ask for more data each time the application buffer is empty.

A push producer produces data all the time, until it is asked to stop. It does this at whatever pace it wishes; it might produce a byte each second, or it might produce a chunk of bytes each time a user interacts with a UI somehow, or it might produce whatever it reads out of some socket whenever it happens to do that. The consumer is free to ask it to stop at any time, though. The API for that is pauseProducing, and in this circumstance resumeProducing delivers the opposite message: it tells the producer that it can go back to whatever it was doing.

Does it make sense why only the push producer case has a pauseProducing call in it?
For example, if the consumer is a socket, then there are at least two events which it can generate which are potentially interesting: application-level buffer empty and application-level buffer full. These are good indicators that more data should be produced and that no more data should be produced (for a while), respectively.
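Putting the last two paragraphs together, a hedged sketch of a push producer that honors those two consumer events might look like this (my own illustration; eventArrived() is an invented stand-in for whatever event source really feeds the producer):

    from zope.interface import implementer
    from twisted.internet.interfaces import IPushProducer

    @implementer(IPushProducer)
    class EventDrivenProducer:
        """Writes data to a transport as some external event source
        delivers it, backing off when the consumer says its buffer is
        full."""

        def __init__(self, transport):
            self.transport = transport
            self.paused = False
            self.backlog = []
            # streaming=True: the transport never calls resumeProducing()
            # to start us; it only calls pauseProducing()/resumeProducing()
            # as its buffer fills and drains.
            transport.registerProducer(self, True)

        def eventArrived(self, data):
            # Called by the event source (a socket, a UI, a timer, ...).
            if self.paused:
                self.backlog.append(data)
            else:
                self.transport.write(data)

        def pauseProducing(self):
            # Consumer's buffer is full: stop writing for a while.
            self.paused = True

        def resumeProducing(self):
            # Consumer's buffer drained: flush the backlog and carry on.
            # Note: a write below may re-trigger pauseProducing(), hence
            # the check on every iteration.
            self.paused = False
            while self.backlog and not self.paused:
                self.transport.write(self.backlog.pop(0))

        def stopProducing(self):
            # Connection is gone; discard anything still queued.
            self.backlog = []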
It's often the case that a producer doesn't have all of the data it is going to produce when it is first registered with the consumer. In these cases, it is less a matter of splitting up the data and more a matter of knowing whether to keep trying to gather more data to give to the consumer. If the consumer has indicated that it wants no more data (via pauseProducing), then the producer can chill out for a while. Only when the consumer issues the resumeProducing call does the producer need to start getting data again. For TCP connections, this is a pretty good reflection of what goes on at a lower level: if you stop reading from a TCP socket, the remote side has to stop sending shortly afterwards. This is more efficient than letting an unbounded amount of data pile up in memory.

If you _do_ already have all of the data that is going to be produced (that is, in memory and as a Python string or other byte buffer object which can be used with socket.send), then the only reasons to use a producer are that some object you want to give the data to only supports the producer/consumer API, so you have no choice but to use a producer, or that you want to know when the data has been cleared out of the application-level buffer (not necessarily sent over the network, and certainly not necessarily received by the peer, but at least no longer buffered in your userspace process). If neither of these apply, you may as well just write the one string to the transport all at once. Since you already had all the data in memory, you already paid the resource allocation penalty, so there's not really much lost by ignoring P/C.

Hope this helps,
Jean-Paul
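For comparison, the "just write it all at once" alternative from the previous paragraph is as simple as it sounds (a sketch, assuming the whole payload is already a bytes object; the class name is invented):

    from twisted.internet import protocol

    class OneShotSender(protocol.Protocol):
        def __init__(self, payload):
            self.payload = payload

        def connectionMade(self):
            # The whole payload is already in memory, so a single write is
            # fine: the transport buffers the bytes in userspace and
            # drains them to the socket as it becomes writable, then
            # closes once the buffer is empty.
            self.transport.write(self.payload)
            self.transport.loseConnection()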

Jean-Paul Calderone wrote:
Thank you Jean-Paul, yes it helps a lot.

In my application, I send XML strings through a server; some may have rather large data embedded in them, so the idea behind using the producer/consumer paradigm was to avoid congesting the server, which acts as a proxy, if you wish. I thought that if I did that, then other clients could send data through it while the producer pauses. The server and the clients both use server factories (see http://twistedmatrix.com/pipermail/twisted-python/2008-February/016879.html); since the client-to-client communication isn't direct, the server needs to be able to connect to the end/destination client. To send data, I use single-use clients, as described in the Twisted documentation. In this case, my producer was supposed to be the single-use client, and the consumer the transport (TCP/IP) of the server factory's protocol instance (whether on the server or the clients).

I guess the problem is that, like you said, I already have all the data in the source client, and thus there is no need to use the p/c paradigm. I must ask, though: when I do transfer large amounts of data, if I understood correctly, the reactor is busy doing that, and thus no other clients can send data until it is done, correct? How must one correctly deal with this problem? What happens to the other clients' data that they try to send?

Thank you,
Gabriel
