[Twisted-Python] Re: Reentrant reactor iteration

March 7, 2009

      Jean-Paul Calderone <exarkun@divmod.com> writes:

Hi,

Thanks for the answer. I'm also with the VIFF project and I would like
to explain a bit more about the background for the hack by Marcel.
...
On Fri, 27 Feb 2009 15:26:43 +0100, Marcel Keller <mkeller@cs.au.dk> wrote:
...
Hi,
I am working on the VIFF project (viff.dk) which uses Twisted. I
found out that our code is sometimes inefficient because we are
generating many deferreds (maybe about 10000) in a callback. While
doing that, no network communication is performed. Therefore, I
investigated the possibility of adding a function to the reactor
which is called every iteration and from which the iteration could
be called safely. Then, we could generate all deferreds in that
function and activate the reactor from time to time. See the attached
patch for details.
So you're doing a ton of work all at once now and you want to split up
that ton of work into smaller pieces and do it a little at a time?
Sort of. We have overloaded the arithmetic operators in our library, so
people will expect to be able to write

  # xs and ys are big lists of our objects
  dot_product = 0
  for (x, y) in zip(xs, ys):
    dot_product += x * y

Here the multiplications involve network traffic and return Deferreds.
We would like the network traffic for the first multiplication to begin
immediately, *before* the remaining multiplications are done.

Doing all the multiplications up front makes the code block the reactor
and uses an awful lot of RAM. If we let each multiplication trigger the
sending of its data immediately, and if we process incoming messages
along the way, memory can be reclaimed for the earlier multiplications
and the above calculation should run in constant memory.

Sending and processing data in a more even flow makes our benchmark
results better and more consistent from one run to the next.
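
To make the memory argument concrete, here is a minimal sketch in plain
Python. It is not VIFF code: Product is a hypothetical stand-in for the
result of one multiplication, and the live/peak counters rely on
CPython's reference counting to release objects promptly. The sketch only
contrasts building every product up front with folding each product into
the running total as soon as it exists.

```python
class Product:
    """Hypothetical stand-in for the result of one multiplication."""
    live = 0   # Product objects currently alive
    peak = 0   # highest value `live` has reached

    def __init__(self, value):
        self.value = value
        Product.live += 1
        Product.peak = max(Product.peak, Product.live)

    def __del__(self):
        # Assumes CPython, where this runs as soon as the last reference goes.
        Product.live -= 1


def reset_counters():
    Product.live = 0
    Product.peak = 0


def dot_upfront(xs, ys):
    # All products exist at the same time before any of them is consumed.
    products = [Product(x * y) for x, y in zip(xs, ys)]
    return sum(p.value for p in products)


def dot_streaming(xs, ys):
    # Each product is consumed and dropped before the next one is created.
    total = 0
    for x, y in zip(xs, ys):
        total += Product(x * y).value
    return total
```

With n pairs the first version keeps n products alive at its peak, the
second keeps one, which is the constant-memory behaviour described above.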
...
If that's the case, then you don't need to modify the reactor, you
just need to split up the work your code is doing. There are a lot of
techniques for doing this. coiterate and inlineCallbacks are two
solutions which are closest to "cookie cutter" (i.e., you have the least
flexibility in deciding how to use them).
Right -- we might be able to use these techniques. I haven't looked at
coiterate yet. With inlineCallbacks I guess the code would look
something like this:

  # xs and ys are big lists of our objects
  dot_product = 0
  for (x, y) in zip(xs, ys):
    dot_product += (yield x * y)

which is not so bad, except that it destroys the nice illusion that x
and y behave like normal integers even though the multiplication
involves network traffic.
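
For readers who have not seen the generator-based style, the idea of
yielding between multiplications can be sketched without Twisted at all.
The scheduler below (run_cooperatively) is a hypothetical toy written for
this illustration, not Twisted's coiterate or inlineCallbacks, and the
network generator merely stands in for whatever else the event loop has
to do. The point is only that a yield after each product lets other work
run in between.

```python
from collections import deque


def run_cooperatively(tasks):
    """Toy round-robin scheduler: advance each generator one step per turn."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue           # task finished, drop it
        queue.append(task)     # not finished, give it another turn later


def dot_product(xs, ys, result, log):
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
        log.append("multiply")
        yield                  # hand control back before the next product
    result.append(total)


def network(steps, log):
    # Stand-in for the I/O the loop would otherwise be starved of.
    for _ in range(steps):
        log.append("network")
        yield


log, result = [], []
run_cooperatively([
    dot_product([1, 2, 3], [4, 5, 6], result, log),
    network(3, log),
])
```

After the run, log alternates between "multiply" and "network" entries
instead of showing all multiplications first.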
...
You have a very long, steep, uphill battle to convince me that adding
support for re-entrant iteration is a good idea.
One problem I can think of is the memory usage associated with a very
deep recursion. Since there is no such thing as tail call optimization
in Python, each level in the recursion will hold onto any local
variables even though they might not be needed any more.
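
A small sketch of that recursion problem, using a hypothetical ToyReactor
that has nothing to do with Twisted's real reactor: when a callback
re-enters iterate(), its frame and its locals stay on the stack until the
nested call returns, so the nesting depth grows with the number of
chained callbacks. Scheduling the next step and returning keeps the depth
at one.

```python
class ToyReactor:
    """Hypothetical toy event loop used only to count nesting depth."""

    def __init__(self):
        self.pending = []
        self.depth = 0
        self.max_depth = 0

    def call_later(self, fn):
        self.pending.append(fn)

    def iterate(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            while self.pending:
                self.pending.pop(0)(self)
        finally:
            self.depth -= 1


def reentrant_job(n):
    def job(reactor):
        big_local = [0] * 1000   # stays alive while the nested iterate() runs
        if n > 0:
            reactor.call_later(reentrant_job(n - 1))
            reactor.iterate()    # re-enter the loop from inside a callback
        return len(big_local)
    return job


def flat_job(n):
    def job(reactor):
        if n > 0:
            reactor.call_later(flat_job(n - 1))  # schedule, then return
    return job


def max_nesting(job):
    reactor = ToyReactor()
    reactor.call_later(job)
    reactor.iterate()
    return reactor.max_depth
```

Here max_nesting(reentrant_job(50)) reports a depth of 51, with 50 copies
of big_local held on the stack at the deepest point, while
max_nesting(flat_job(50)) stays at 1.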

Are there other general problems with having a re-entrant reactor?

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.