[Twisted-Python] Re-entrancy policy for the reactor

I'm going to merge #5063 next time I have a few minutes when I'm more awake, which will mean my yak stack[1] is empty and I can go back to working on abortConnection(). As explained in http://twistedmatrix.com/trac/ticket/78, abortConnection() is like loseConnection(), except that it doesn't wait until buffers are written out; it closes the connection immediately. The question is, how immediately?
My current implementation, a bit like one of the half-close code paths, ends up calling connectionLost directly. That means you can have a call stack that looks like this:
    MyProtocol.dataReceived
      TCPConnection.abortConnection
        ...
          MyProtocol.connectionLost
This can lead to re-entrancy bugs. I am tempted to suggest a policy of No Reactor Reentrancy, but that is currently violated by one of the TCP half-close code paths, and producers if I'm not mistaken.
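Concretely, the hazard looks something like this (a minimal sketch assuming the abortConnection() API proposed in ticket #78; the protocol's fatal condition and buffer handling are invented for illustration):

    from twisted.internet.protocol import Protocol

    class MyProtocol(Protocol):
        def connectionMade(self):
            self.buffer = b""

        def dataReceived(self, data):
            if b"\x00" in data:                  # some fatal condition
                self.transport.abortConnection()
                # With a re-entrant implementation, connectionLost() has
                # already run by the time abortConnection() returns, so
                # self.buffer is now None and the next line raises.
            self.buffer += data

        def connectionLost(self, reason):
            self.buffer = None                   # cleanup dataReceived doesn't expect yet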
Some options:
(a) Leave abortConnection() reentrant.
(b) Make abortConnection() non-reentrant; don't change any current APIs.
(c) Make all reactor APIs non-reentrant.
(Since there's a ticket for documenting the reentrancy policy, you will notice I've added another yak to my stack. I'll finish #78 someday!)
Thoughts?
References: [1] https://secure.wikimedia.org/wiktionary/en/wiki/yak_shaving

On Sun, 2011-07-24 at 22:51 -0400, Itamar Turner-Trauring wrote:
> This can lead to re-entrancy bugs. I am tempted to suggest a policy of No Reactor Reentrancy, but that is currently violated by one of the TCP half-close code paths, and producers if I'm not mistaken.
Instead of "No Reactor Re-entrancy", I meant "No Protocol Re-entrancy", i.e. the reactor only ever calls into a protocol once in a given call stack.

On Jul 24, 2011, at 10:51 PM, Itamar Turner-Trauring wrote:
> As explained in http://twistedmatrix.com/trac/ticket/78, abortConnection() is like loseConnection(), except that it doesn't wait until buffers are written out; it closes the connection immediately. The question is, how immediately?
> [...]
> (a) Leave abortConnection() reentrant.
> (b) Make abortConnection() non-reentrant; don't change any current APIs.
> (c) Make all reactor APIs non-reentrant.
> Thoughts?
My main thought here is that protocol reentrancy is bad, and nobody really expects it even if they think it should be fine.
However, I do believe it would be best (easier to test, in particular) to immediately call connectionLost within doRead (or doWrite) after dataReceived exits, rather than callLater(0)-ing it or otherwise placing it into a global call queue.
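Roughly, the shape being suggested might look like this (a sketch only, not Twisted's actual transport code; _readFromSocket() and _closeSocket() are stand-in helpers):

    from twisted.python.failure import Failure
    from twisted.internet.error import ConnectionLost

    class Connection:
        _aborting = False
        _dispatching = False

        def abortConnection(self):
            self._aborting = True
            if not self._dispatching:
                # No protocol code on the stack, so this is not re-entrant.
                self._deliverConnectionLost()

        def doRead(self):
            self._dispatching = True
            try:
                self.protocol.dataReceived(self._readFromSocket())
            finally:
                self._dispatching = False
            if self._aborting:
                # Delivered after dataReceived() exits, but still
                # synchronously within the same doRead() call.
                self._deliverConnectionLost()

        def _deliverConnectionLost(self):
            self._closeSocket()
            self.protocol.connectionLost(Failure(ConnectionLost("connection aborted")))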
-glyph

On 09:10 pm, glyph@twistedmatrix.com wrote:
> My main thought here is that protocol reentrancy is bad, and nobody really expects it even if they think it should be fine.
> However, I do believe it would be best (easier to test, in particular) to immediately call connectionLost within doRead (or doWrite) after dataReceived exits, rather than callLater(0)-ing it or otherwise placing it into a global call queue.
Probably not easier to test, nor easier to get right. A single generalized solution to avoid re-entrancy is probably the only way to avoid a perpetual sequence of accidental re-entrant calls in obscure untested cases in each different reactor implementation.
Perhaps the solution doesn't need to be `reactor.callLater(0, ...)`, but I think various implementations will be better if they put tasks into some kind of queue, perhaps checked at the end of the iteration instead of in the next iteration, rather than special-casing each possible re-entrant event in each possible event dispatcher.
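In other words, something like the following (all names invented; this is the general shape, not any particular reactor implementation):

    from collections import deque

    class ReactorCore:
        def __init__(self):
            self._pendingNotifications = deque()

        def _deferNotification(self, f, *args):
            # Event dispatchers call this instead of calling into the
            # protocol directly, so protocol code is never re-entered.
            self._pendingNotifications.append((f, args))

        def iterate(self, timeout=None):
            self._dispatchIOEvents(timeout)   # stand-in for doRead()/doWrite() dispatch
            # Drained at the end of this iteration rather than in the next,
            # so notifications aren't delayed by a full select() timeout.
            while self._pendingNotifications:
                f, args = self._pendingNotifications.popleft()
                f(*args)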
Jean-Paul

On Jul 27, 2011, at 5:41 PM, exarkun@twistedmatrix.com wrote:
>> My main thought here is that protocol reentrancy is bad, and nobody really expects it even if they think it should be fine.
>> However, I do believe it would be best (easier to test, in particular) to immediately call connectionLost within doRead (or doWrite) after dataReceived exits, rather than callLater(0)-ing it or otherwise placing it into a global call queue.
> Probably not easier to test, nor easier to get right. A single generalized solution to avoid re-entrancy is probably the only way to avoid a perpetual sequence of accidental re-entrant calls in obscure untested cases in each different reactor implementation.
Fair point; reflecting on all the previous ad-hoc non-reentrancy code I've seen in the reactor, I retract my suggestion. For testing, it's easy enough to plug a Clock() in there and iterate it once.
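For example, something along these lines (a sketch assuming abortConnection() schedules connectionLost via callLater(0); AbortableStringTransport is an invented test double that uses the Clock as its reactor):

    from twisted.internet.task import Clock
    from twisted.internet.protocol import Protocol

    class RecordingProtocol(Protocol):
        lost = False

        def connectionLost(self, reason):
            self.lost = True

    def test_abortConnection(self):
        clock = Clock()
        transport = AbortableStringTransport(reactor=clock)  # hypothetical
        protocol = RecordingProtocol()
        protocol.makeConnection(transport)
        transport.abortConnection()
        self.assertFalse(protocol.lost)   # nothing happened re-entrantly
        clock.advance(0)                  # run the pending callLater(0)
        self.assertTrue(protocol.lost)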
> Perhaps the solution doesn't need to be `reactor.callLater(0, ...)`, but I think various implementations will be better if they put tasks into some kind of queue, perhaps checked at the end of the iteration instead of in the next iteration, rather than special-casing each possible re-entrant event in each possible event dispatcher.
What would be the point of adding another task queue to the reactor? This sounds like a solution to a non-problem. (Although I might have suggested the same thing myself if I hadn't thought about it too much first.)
The answer that comes to mind when I think about this question is that an end-of-iteration queue would avoid unnecessary additional invocations of whatever multiplexing system call the reactor needs to make. But there are several points on the down side:
- it's more code to maintain
- it's more interfaces for users, or at least maintainers, to learn
- it's an optimization, and so we should produce a benchmark indicating that the optimization makes sense and is effective before doing it
- the optimization could be applied to callLater as it stands; any callLater(0) calls could be executed before the next reactor tick
Mostly though, an additional queue for non-reentrancy would be an ad-hoc special-case solution to one tiny part of the more general problem that reactor event ordering is a complete accident with unforeseen consequences. If there were a more general scheduling mechanism, we could schedule events in a better order overall.
-glyph

On Wed, 2011-07-27 at 18:10 -0400, Glyph Lefkowitz wrote:
> Mostly though, an additional queue for non-reentrancy would be an ad-hoc special-case solution to one tiny part of the more general problem that reactor event ordering is a complete accident with unforeseen consequences. If there were a more general scheduling mechanism, we could schedule events in a better order overall.
We already have a queue for reactor.callFromThread. We could add, say, reactor.runInABit, and reimplement reactor.callFromThread so that it just adds a waker notification after calling that.
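A sketch of this idea (runInABit is the strawman name from above; the waker and queue-draining details are simplified):

    class ReactorMixin:
        def __init__(self):
            self._callQueue = []
            self.waker = _Waker()   # stand-in: something that interrupts select()

        def runInABit(self, f, *args, **kw):
            # Same-thread deferral: f runs when the queue is next drained.
            self._callQueue.append((f, args, kw))

        def callFromThread(self, f, *args, **kw):
            # list.append is atomic under the GIL, so this is thread-safe;
            # the waker makes a blocked select() return promptly.
            self.runInABit(f, *args, **kw)
            self.waker.wakeUp()

        def _drainCallQueue(self):
            # Called once per reactor iteration.
            while self._callQueue:
                f, args, kw = self._callQueue.pop(0)
                f(*args, **kw)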