
On Sat, 07 Mar 2009 19:38:46 +0100, Martin Geisler <mg@daimi.au.dk> wrote:
Jean-Paul Calderone <exarkun@divmod.com> writes:
Hi,
Thanks for the answer. I'm also with the VIFF project and I would like to explain a bit more about the background for the hack by Marcel.
[snip]
So you're doing a ton of work all at once now and you want to split up that ton of work into smaller pieces and do it a little at a time?
Sort of. We have overloaded the arithmetic operators in our library, so people will expect to be able to write
# xs and ys are big lists of our objects dot_product for (x, y) in zip(xs, ys): dot_product += x * y
Here the multiplications involves network traffic and return Deferreds. We would like the network traffic for the first multiplication to begin immediately, *before* the remaining multiplications are done.
Doing all the multiplications up front makes the code block the reactor and uses an awful lot of RAM. If we let each multiplication trigger the sending of its data immediately, and if we process incoming messages along the way, memory can be reclaimed for the earlier multiplications and the above calculation should run in constant memory.
Hm. I would bet the constant would be pretty high, though. Things will only balance out once the network is keeping up with the local for loop. Actually, I'm not sure why it would be constant at all. Won't the local for loop always run much faster than any network operations can happen? In this case, memory usage will go towards K(local) * set size - K(remote) * set size, or just O(set size); that is to say, linear.
Sending and processing data in a more even flow makes our benchmark results better and more consistent from one run to the next.
It seems like what you'll really benefit from most is pipelining with a maximum pipeline depth that's not too big (so as to conserve memory) but not too small (so as to avoid wasting time on network round trip time).
If that's the case, then you don't need to modify the reactor, you just need to split up the work your code is going. There are a lot of techniques for doing this. coiterate and inlineCallbacks are two solutions which are closest to "cookie cutter" (ie, you have the least flexibility in deciding how to use them).
Right -- we might be able to use these techniques. I haven't looked at coiterate yet. With inlineCallbacks I guess the code would look something like this:
# xs and ys are big lists of our objects dot_product for (x, y) in zip(xs, ys): dot_product += (yield x * y)
which is not so bad, expect that it destroys the nice illusion that x and y behave like normal integers even though the multiplication involves network traffic.
Perhaps with coiterate this might look something like... def op(xs, ys): dot_product = 0 for (x, y) in zip(xs, ys): dot_product += x * y yield yield dot_product d = coiterate(op(xs, ys)) This is flawed, but maybe it can be fixed. What's good about it is that it doesn't try to drive the reactor re-entrantly. :) It also avoids the yield in the += and * operations, which somewhat preserves your illusion of normalcy (I'm not saying I like that illusion, but I'll leave that aside for now). What's bad about it is that it still lets the local loop run arbitrarily far ahead of the results from the network, giving you unbounded memory usage. To fix that, it should yield a Deferred every once in a while. The reason I leave it flawed here is that it's a little tricky to figure out which Deferred to yield. What would be great would be to yield the Deferred for an operation which preceeds the most recently executed operation by some number of operations. The number of operations by which it preceeds the most recent will be your pipeline depth (roughly). The effect this has on coiterate is to cause your local loop to cease to execute until that Deferred has fired. By making it a Deferred for an operation some time /before/ the most recent operation, you keep the network busy and avoid wasting time on round trip times. Once the Deferred fires, your loop gets run a few more times which has the effect of putting more work into the pipeline - until you've got enough extra work to keep things from stalling again, and then coiterate puts you to sleep for a while again.
You have a very long, steep, uphill battle to convince me that adding support for re-entrant iteration is a good idea.
One problem I can think of is the memory usage associated with a very deep recursion. Since there is no such thing as tail call optimization in Python, each level in the recursion will hold onto any local variables even though they might not be needed any more.
Are there other general problems with having a re-entrant reactor?
One general problem is simply that the reactor is not written with this in mind. So with the current implementation, even including the patch Marcel contributed, it doesn't work, period. When I say "doesn't work", I mean that there are cases which will simply result in incorrect behavior, though there may be other cases where everything does work out as you expect. Going along with this, it's quite a bit harder to test that things work when re-entrancy is allowed or expected than if it isn't, so even if all of the current reactor implementations were made re-entrant, there would likely be a higher rate of defects related to this in the future, simply because it's harder. This isn't limited solely to the reactor, either. A re-entrant reactor almost certainly means that application code will be invoked re-entrantly. This is actually the case already, unfortunately, and it is a bit of an open question as to what should be done about it. A very common bug I find (and write! :() in Twisted applications is failure to handle various forms of re-entrancy correctly. This is an existing issue with Twisted, not related to this proposed change, but making this change would only make the problem worse and essentially destroy and hope for ever making it better. Actually, that general problem is so general that I can't really think of any others to discuss, so I'll leave it at that for now. :) If you want, I can probably share some specific examples of how re-entrancy leads to surprising bugs in unexpected places (probably from real programs, too :/). Jean-Paul