Re: [Twisted-Python] A pseudo-deferred class that can be canceled
"Glyph" == Glyph Lefkowitz <glyph@twistedmatrix.com> writes: Glyph> On Jan 5, 2010, at 4:53 PM, Terry Jones wrote:
Glyph> I understand what you're saying: you're interested in a subset of
Glyph> what I'm interested in, here.

Yes.

Glyph> The point I'm trying to make is that once you've gone to the trouble
Glyph> of providing an API for *clients* of an operation to declare that
Glyph> they are no longer interested in its results, then it's wasteful for
Glyph> the underlying engine to continue executing the operation only to
Glyph> discard its result.

Yes, agreed. But it's not at all clear to me how one should go about stopping things in general. That's a much harder task than ignoring a result or triggering a call/errback result yourself. My class just takes a couple of small client-side pieces out of the way. I know you know all this, I'm just trying to be clear for others / for the record, etc.

Glyph> I think that coming up with a good API and semantics for "I no
Glyph> longer care about the result here" has a huge amount of overlap with
Glyph> this anyway.

Hmmmm. I'm not sure about that. I agree if you s/huge amount/tiny/ :-)

Naively perhaps, it seems like the client declaring they're no longer interested in a result is just a single bit of information. My class sits between the chain of steps (callbacks) leading to the production of a result and the chain of steps (callbacks) involved client-side in post-processing the result. It's a nice/simple way for the client to snip the overall chain.

If I were going to try to push it further, I would start thinking about how to (figuratively) propagate this bit of information ("your result is no longer relevant / needed") back up the chain of as-yet unfired callbacks. If callbacks had access to that information, they could act accordingly. I don't think the running of the callback chain should be interrupted. As a small example, one callback function might do some setup operation (let's say open a file, or print an opening HTML tag) that a later one makes sure is wrapped up, using addBoth.

It seems like there are two things that it would be nice to have: a way for a callback to know that its result will ultimately be ignored, and a mechanism for a callback to register a function to be called if an operation is canceled while it is running. In the case that an operation was canceled, its current callback would have its cancel function (if any) called, and subsequent callbacks would all find the resultWillBeIgnored to be True. Something like that.

I'm happy to go into that discussion too, if you want. I do have a use for it as well (see below).

Glyph> I grant that it may well be easier to implement without worrying
Glyph> about the underlying operation though, and the semantics you've
Glyph> defined by explicitly ignoring the actually-stopping case are much
Glyph> simpler.

Yes, agreed. I like the fact that the class is simple and that it deals with the client-side issues, allowing ignoring, timing out, early firing, etc. As you say, the much harder problem remains. But the harder problem is a bit less messy now (at least in my mind): it's "just" cancellation. Responsibilities are cleanly divided by my class - the client takes care of itself, and cancellation has *only* to deal with callbacks placed on a deferred that was generated by what the client called.

Looked at from this POV, an approach to cancellation would be for code that is able to cancel operations it has begun to also provide a cancel method. One way to think about doing this would be to have the cancel method take a deferred as an argument.

Something like my class could then hand the deferred back, effectively saying "my client is no longer interested in this deferred. You can call/errback it, or not, it makes no difference to us". If you've done that once, you can do it multiple times - by which I mean that I might write code that's a client of getPage, and getPage is a client of XXX, and XXX is a client of YYY, etc.
Each could in turn pass the deferred it got back to the thing that created it.

If there's no cancel method, then that's as far as can be gone with canceling. At that point the result is no longer passed because the first ControllableDeferred instance that's involved will effectively snip the link (or send an early result) in the sequence of steps that would originally have been done.

Deferred producing code that's capable of cancellation might simply keep a dictionary of outstanding deferreds, and itself use a callback to remove things from its dictionary once a result is calculated and about to be passed on. My resizable dispatch queue code uses that approach. Or it could be more sophisticated and be a state machine, that transitions to new states as the callbacks that it put onto the deferred are called.

Done this way, the business of cancellation is not handled by the Deferred class at all, which I think is a good thing because Deferreds are simple and don't have any idea of the operational connections between the functions in their call/errback chains. But the code that obtained the deferred and put callbacks onto it does.

So, concretely, you could imagine something like this:

- a getPageFactory class with a getPage method that returns a deferred, and a cancel method that accepts a deferred.

- I'd add a cancelFunc argument (default None) to my ControllableDeferred class. If not None, that gets called with a deferred from getPage in case its client decides to ignore the operation or get an early result.

That's pretty simple, I think. And it keeps all code for doing cancellation out of the Deferred class. Any client code that wants to be able to control the deferreds it's getting can use a ControllableDeferred. And any code that produces deferreds and wants to offer the possibility of cancellation of underway operations can provide / advertise some kind of cancel function.
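To make the split of responsibilities above a little more concrete, here is a rough sketch in plain Python. It deliberately does not import Twisted: TinyDeferred, PageFetcher and ControllableStub are hypothetical stand-ins invented for illustration (for a Deferred, for the proposed getPageFactory, and for a much-reduced ControllableDeferred that only supports "ignore"). It is one possible reading of the idea, not the actual class under discussion.

```python
# Hypothetical sketch: none of these names are Twisted APIs.


class TinyDeferred:
    """Minimal stand-in for a Deferred: callbacks run in order once fired."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, func):
        self._callbacks.append(func)
        return self

    def callback(self, result):
        for func in self._callbacks:
            result = func(result)
        return result


class PageFetcher:
    """Stand-in for the proposed getPageFactory: getPage() plus cancel(d)."""

    def __init__(self):
        self.outstanding = {}  # deferred -> url of work still underway
        self.cancelled = []    # urls whose work was stopped early

    def getPage(self, url):
        d = TinyDeferred()
        self.outstanding[d] = url
        # Producer-side callback: forget the operation once a result passes.
        d.addCallback(lambda result: self._finished(d, result))
        return d

    def _finished(self, d, result):
        self.outstanding.pop(d, None)
        return result

    def cancel(self, d):
        """Stop the work behind d if it is still underway."""
        url = self.outstanding.pop(d, None)
        if url is None:
            return False
        # A real producer would drop its connection here (assumption).
        self.cancelled.append(url)
        return True


class ControllableStub:
    """Client-side wrapper reduced to 'ignore', with an optional cancelFunc."""

    def __init__(self, d, cancelFunc=None):
        self._d = d
        self._cancelFunc = cancelFunc
        self._ignored = False
        self.results = []  # what the client actually received
        d.addCallback(self._forward)

    def _forward(self, result):
        if not self._ignored:
            self.results.append(result)
        return result

    def ignore(self):
        self._ignored = True
        if self._cancelFunc is not None:
            # Hand the deferred back to whatever produced it.
            self._cancelFunc(self._d)


fetcher = PageFetcher()
d_a = fetcher.getPage("http://example.com/a")
d_b = fetcher.getPage("http://example.com/b")
kept = ControllableStub(d_a, cancelFunc=fetcher.cancel)
dropped = ControllableStub(d_b, cancelFunc=fetcher.cancel)

dropped.ignore()        # client gives up; producer is told via cancelFunc
d_a.callback("page a")  # normal completion
d_b.callback("late")    # a late result is silently discarded client-side
```

Here the client never sees "late", and the producer's dictionary is empty at the end: one entry left through the normal callback, the other through cancel.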
That keeps cancellation split into logical pieces, wherein each piece doesn't know or care where its deferred came from and safely knows that its result will be ignored. The important piece of the overall operation that's provided by ControllableDeferred is that it keeps all these pieces separate, only passing on results when they're ready. We'd generalize some of this, use an interface, etc.

Glyph> But that also means that you still have to go to the trouble of
Glyph> figuring out when you're no longer interested in the result any
Glyph> more, but after going to the trouble ... what's the benefit?

It's partly as you said in an earlier reply: releasing resources. It's also so that the client code can move on (see below).

Glyph> I know what the use-cases are for stopping the underlying operation
Glyph> (notifying the peer that you're not going to accept it, reclaiming
Glyph> the resources); but if you're just going to let the operation
Glyph> complete eventually anyway, why wouldn't you want to just finish
Glyph> processing the result when it arrives regardless?

Because it may not arrive at all, or it may arrive at a point when I'm no longer able to deal with a result. And because the non-arrival might be holding up some other part of a system (thereby consuming a resource).

In the case I'm dealing with right now, I'm using my resizable dispatch queue code to process jobs that are submitted via a web form. To be more concrete, a user of my app can enter a Twitter user name and my code then goes and does a bunch of work. The processing of that Twitter user is a "job" here, and it may involve thousands of network API calls, to Twitter, to FluidDB, to other services. Because processing a job can create a lot of traffic, I use my dispatch queue class to keep a limit on the number of jobs that are being done in parallel. So there are a finite number of slots available for processing jobs.

I know when a job is done because a deferred fires, and one of the callbacks in that deferred takes the job out of the list of currently outstanding jobs. It all works nicely.

But the queue code just fires jobs and doesn't have any idea what they do. If they never fire, it continues to think the job is underway. This is generally true for any client that gets given a deferred - your continued processing is in the hands of the thing that made the deferred. In my case, a slot in a finite table is occupied. If I want to give up on it, I need to do some clean-up and may therefore not want the callbacks I added to the deferred to ever be called (e.g., data structs that could have processed the result may be changed or gone). I.e., I want my client to have more control over how / when / if its callbacks are called. That's totally not under my control if I call an unknown black-box deferred-producing code that may or may not call me back.

In my particular case, it seems like I sometimes make a call to the Twitter API and that call never completes. I don't know what's going on, I suspect the connection is made and then held open and never processed by Twitter. So a deferred, or multiple deferreds, simply don't fire. I want to give up on them, clean up, and move on. From the client POV I don't know or care what went wrong with the call, I just know that it's time to do something else.

I'm also interested in the wider problem (cancellation), because I'd like to tell a job to cancel itself. In my case this could in part take the form of a test inside a generator that was yielding deferreds to process the Twitter user. That would be a simple way to cancel the outstanding work for a job, even if I still couldn't reach out to the code that gave me a deferred to tell it to lose a connection etc. I hope that makes sense.

OK, sorry for so many words. I hope this seems like it's heading in a useful direction. It does to me.

Terry
On Jan 6, 2010, at 7:09 AM, Terry Jones wrote:
"Glyph" == Glyph Lefkowitz <glyph@twistedmatrix.com> writes: Glyph> On Jan 5, 2010, at 4:53 PM, Terry Jones wrote:
Glyph> I understand what you're saying: you're interested in a subset of
Glyph> what I'm interested in, here.
Yes.
Glyph> I think that coming up with a good API and semantics for "I no
Glyph> longer care about the result here" has a huge amount of overlap with
Glyph> this anyway.
Hmmmm. I'm not sure about that. I agree if you s/huge amount/tiny/ :-)
Okay, "a huge amount" was not usefully descriptive :). What I mean is, there are a lot of weird little edge-cases in how multiple layers of the stack interact when they're dealing with a shared Deferred, and if we're

However, upon further inspection I think that the key distinction between what you've proposed and what I'm talking about is the distinction between cancelling *one* layer of the callback chain and cancelling *all* layers of the callback chain.

Your description (elided for brevity's sake) was very helpful. You've got resources which your callbacks are consuming by way of being "currently outstanding", and you want to be able to free *those* resources, without necessarily worrying about
Yes, agreed. I like the fact that the class is simple and that it deals with the client-side issues, allowing ignoring, timing out, early firing, etc. As you say, the much harder problem remains. But the harder problem is a bit less messy now (at least in my mind): it's "just" cancellation. Responsibilities are cleanly divided by my class - the client takes care of itself, and cancellation has *only* to deal with callbacks placed on a deferred that was generated by what the client called.
I don't think that you can completely separate the problems. You seem to have a reasonable solution to the problem of one layer of the Deferred stack, but once you're trying to deal with multiple layers of the stack at once, interactions occur which can be difficult to reconcile with the same API, many of which are already documented in the ticket's discussion.
Looked at from this POV, an approach to cancellation would be for code that is able to cancel operations it has begun to also provide a cancel method. One way to think about doing this would be to have the cancel method take a deferred as an argument.
This is a *very* interesting idea, although I don't like the API that you propose for it. By separating the cancel method from the Deferred itself, you remove the ability for a trivial client of that Deferred to say "forget about it" without also maintaining a reference to the thing that gave it the Deferred in the first place. That means that you need a new 'operation' API, and your code needs to take twice as many parameters, and it generally gets ugly.
Something like my class could then hand the deferred back, effectively saying "my client is no longer interested in this deferred. You can call/errback it, or not, it makes no difference to us". If you've done that once, you can do it multiple times - by which I mean that I might write code that's a client of getPage, and getPage is a client of XXX, and XXX is a client of YYY, etc. Each could in turn pass the deferred it got back to the thing that created it.
This implies, to me, that the cancellation callback would be better passed to addCallbacks(): effectively creating a third callback chain going from invoker to responder rather than the other way 'round as callbacks and errbacks do. I have stumbled in the direction of this thought a few times already but this is the first time I've had a really clear grasp of how it would work. Now I can see that each layer of the stack may have its own resources that it might want to clean up... previously I thought this could be done entirely with errbacks, but in this version, it doesn't matter if the base deferred doesn't know how to kick off the errback chain: all the resources on the *rest* of the callback chain can be cleaned up. I'm going to need to figure out some good values for XXX and YYY here in order to truly dispel the fog, though. The examples you provided are good but I still don't have a good feel for what might be a good general description of what resources could be used in different parts of the callback chain.
If there's no cancel method, then that's as far as can be gone with canceling.
This is one of the really tricky issues that has faced this feature all along: what happens when some part of the chain involved doesn't know what to do with a canceller? And your solution here seems like it may be a very elegant hack: do exactly the same thing as other parts of the callback chain. What I mean is: currently, if a particular callback pair doesn't have a callback or an errback, the behavior is to do nothing and pass the result through. Cancellation could do exactly the same thing!
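As a rough illustration of that pass-through idea, here is a toy sketch in plain Python. ChainDeferred is a hypothetical class written for this example only, and the canceller argument to addCallbacks is an assumption taken from the proposal in this thread, not an existing Twisted signature. Result delivery is left out; only the third chain is modelled.

```python
# Hypothetical sketch: a toy class, not twisted.internet.defer.Deferred.


class ChainDeferred:
    """Toy deferred in which each layer may also register a canceller."""

    def __init__(self):
        # (callback, errback, canceller) tuples in the order they were added.
        self._layers = []

    def addCallbacks(self, callback, errback=None, canceller=None):
        self._layers.append((callback, errback, canceller))
        return self

    def cancel(self):
        # Walk from the invoker's end of the chain back toward the source.
        # A layer without a canceller is skipped, mirroring how a missing
        # callback or errback passes the result through untouched.
        for _callback, _errback, canceller in reversed(self._layers):
            if canceller is not None:
                canceller()


log = []
d = ChainDeferred()
# Layer nearest the source: owns a network connection.
d.addCallbacks(lambda r: r, canceller=lambda: log.append("close connection"))
# Middle layer that knows nothing about cancellation.
d.addCallbacks(lambda r: r)
# Layer nearest the invoker: owns a slot in a job queue.
d.addCallbacks(lambda r: r, canceller=lambda: log.append("free queue slot"))

d.cancel()
# log is now ["free queue slot", "close connection"]
```

The middle layer does not stop the cancellation from reaching the layer that owns the connection, so every layer that holds a resource still gets a chance to release it.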
At that point the result is no longer passed because the first ControllableDeferred instance that's involved will effectively snip the link (or send an early result) in the sequence of steps that would originally have been done.
Severing the link seems like a problem though; if we do that, then introducing any non-cancellation-aware Deferred - or callback, for that matter - into a cancellation-aware pipeline will prevent cancellations from propagating further up, and there should be no reason to do that.
And it keeps all code for doing cancellation out of the Deferred class.
Why is it that you want to keep the cancellation code out of Deferred? It seems very useful to me to have one object that you can say "stop" to, without necessarily knowing what's going on above it or where it came from.
OK, sorry for so many words. I hope this seems like it's heading in a useful direction. It does to me.
Yes, this has been very useful. I hope we can distill this into some useful conclusions soon. :)
participants (2)
- Glyph Lefkowitz
- Terry Jones