
Jean-Paul Calderone <exarkun@divmod.com> writes:

Hi,

Thanks for the answer. I'm also with the VIFF project and I would like to explain a bit more about the background for the hack by Marcel.
On Fri, 27 Feb 2009 15:26:43 +0100, Marcel Keller <mkeller@cs.au.dk> wrote:
Hi,
I am working on the VIFF project (viff.dk) which uses Twisted. I found out that our code is sometimes inefficient because we are generating many deferreds (maybe about 10000) in a callback. While doing that, no network communication is performed. Therefore, I investigated the possibility of adding a function to the reactor which is called every iteration and from which the iteration could be called safely. Then, we could generate all deferreds in that function and activate the reactor from time to time. See the attached patch for details.
So you're doing a ton of work all at once now and you want to split up that ton of work into smaller pieces and do it a little at a time?
Sort of. We have overloaded the arithmetic operators in our library, so people will expect to be able to write

  # xs and ys are big lists of our objects
  dot_product = 0
  for (x, y) in zip(xs, ys):
      dot_product += x * y

Here the multiplications involve network traffic and return Deferreds. We would like the network traffic for the first multiplication to begin immediately, *before* the remaining multiplications are done. Doing all the multiplications up front makes the code block the reactor and uses an awful lot of RAM.

If we let each multiplication trigger the sending of its data immediately, and if we process incoming messages along the way, memory can be reclaimed for the earlier multiplications and the above calculation should run in constant memory. Sending and processing data in a more even flow makes our benchmark results better and more consistent from one run to the next.
If that's the case, then you don't need to modify the reactor, you just need to split up the work your code is doing. There are a lot of techniques for doing this. coiterate and inlineCallbacks are two solutions which are closest to "cookie cutter" (i.e., you have the least flexibility in deciding how to use them).
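To make the "split up the work" idea concrete, here is a minimal sketch that uses only the standard library. It is not Twisted's implementation: `dot_product_steps` and `run_cooperatively` are hypothetical names, and the toy scheduler merely stands in for the reactor. In Twisted itself, a generator like this would be handed to `twisted.internet.task.coiterate`, which returns a Deferred that fires once the iterator is exhausted.

```python
from collections import deque


def dot_product_steps(xs, ys, result):
    """Do one multiplication per step, yielding control after each one.

    Every `yield` marks a point where a scheduler may do other work
    (in a real reactor: network I/O) before resuming this task.
    """
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
        yield
    result.append(total)


def run_cooperatively(tasks, between_steps=None):
    """Hypothetical round-robin scheduler (a stand-in for the reactor)."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # advance the task by exactly one step
        except StopIteration:
            continue  # task finished, do not reschedule it
        if between_steps is not None:
            between_steps()  # e.g. process pending I/O here
        queue.append(task)


result = []
ticks = []
run_cooperatively(
    [dot_product_steps([1, 2, 3], [4, 5, 6], result)],
    between_steps=lambda: ticks.append(1),
)
print(result[0], len(ticks))
```

The point of the design is that the long loop never runs to completion in one go: the scheduler regains control after every multiplication, so other work can be interleaved without re-entering the event loop.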
Right -- we might be able to use these techniques. I haven't looked at coiterate yet. With inlineCallbacks I guess the code would look something like this:

  # xs and ys are big lists of our objects
  dot_product = 0
  for (x, y) in zip(xs, ys):
      dot_product += (yield x * y)

which is not so bad, except that it destroys the nice illusion that x and y behave like normal integers even though the multiplication involves network traffic.
You have a very long, steep, uphill battle to convince me that adding support for re-entrant iteration is a good idea.
One problem I can think of is the memory usage associated with a very deep recursion. Since there is no such thing as tail call optimization in Python, each level in the recursion will hold onto any local variables even though they might not be needed any more.

Are there other general problems with having a re-entrant reactor?

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
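As an illustration of the recursion concern raised above, here is a small standard-library sketch. The function names and the `scratch` list are made up for the example, and it says nothing about how a re-entrant reactor would actually be implemented. It only shows that in CPython every nested call keeps its frame and locals alive until it returns, and that the depth is bounded by the interpreter's recursion limit, whereas a loop doing the same number of steps is not.

```python
import sys


def recurse(remaining):
    scratch = [0] * 100  # stays alive until this frame returns
    if remaining == 0:
        return 0
    # Not a tail call as far as CPython is concerned: the frame is kept.
    return 1 + recurse(remaining - 1)


def iterate(count):
    depth = 0
    for _ in range(count):
        scratch = [0] * 100  # rebound on each pass, the old list can be freed
        depth += 1
    return depth


limit = sys.getrecursionlimit()
try:
    recurse(limit + 100)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed, iterate(limit + 100) == limit + 100)
```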