"Glyph" == Glyph Lefkowitz <glyph@twistedmatrix.com> writes: Glyph> On Jan 5, 2010, at 4:53 PM, Terry Jones wrote:
Glyph> I understand what you're saying: you're interested in a subset of
Glyph> what I'm interested in, here.

Yes.

Glyph> The point I'm trying to make is that once you've gone to the trouble
Glyph> of providing an API for *clients* of an operation to declare that
Glyph> they are no longer interested in its results, then it's wasteful for
Glyph> the underlying engine to continue executing the operation only to
Glyph> discard its result.

Yes, agreed. But it's not at all clear to me how one should go about
stopping things in general. That's a much harder task than ignoring a
result or triggering a call/errback result yourself. My class just takes a
couple of small client-side pieces out of the way. I know you know all
this; I'm just trying to be clear for others, for the record, etc.

Glyph> I think that coming up with a good API and semantics for "I no
Glyph> longer care about the result here" has a huge amount of overlap with
Glyph> this anyway.

Hmmm. I'm not sure about that. I agree if you s/huge amount/tiny/ :-)
Naively perhaps, the client declaring that it is no longer interested in a
result is just a single bit of information. My class sits between the
chain of steps (callbacks) leading to the production of a result and the
chain of steps (callbacks) involved client-side in post-processing the
result. It's a nice, simple way for the client to snip the overall chain.

If I were going to push it further, I would start thinking about how to
(figuratively) propagate that bit of information ("your result is no
longer relevant / needed") back up the chain of as-yet unfired callbacks.
If callbacks had access to that information, they could act accordingly. I
don't think the running of the callback chain should be interrupted,
though. As a small example, one callback function might do some setup
operation (say, open a file, or print an opening HTML tag) that a later
one, added via addBoth, makes sure is wrapped up.

So it seems there are two things it would be nice to have: a way for a
callback to know that its result will ultimately be ignored, and a
mechanism for a callback to register a function to be called if the
operation is canceled while that callback is running. If an operation were
canceled, its currently running callback would have its cancel function
(if any) called, and subsequent callbacks would all find
resultWillBeIgnored to be True. Something like that. I'm happy to go into
that discussion too, if you want; I have a use for it as well (see below).

Glyph> I grant that it may well be easier to implement without worrying
Glyph> about the underlying operation though, and the semantics you've
Glyph> defined by explicitly ignoring the actually-stopping case are much
Glyph> simpler.

Yes, agreed. I like the fact that the class is simple and that it deals
with the client-side issues, allowing ignoring, timing out, early firing,
etc. As you say, the much harder problem remains. But the harder problem
is a bit less messy now (at least in my mind): it's "just" cancellation.
Responsibilities are cleanly divided by my class - the client takes care
of itself, and cancellation has *only* to deal with callbacks placed on
the deferred that was generated by whatever the client called.
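To make the client-side half concrete, here's a much-simplified sketch of
the kind of wrapper I mean. This is not the actual class; the ignore and
fireEarly method names are just for illustration:

    from twisted.internet import defer
    from twisted.python import failure

    class ControllableDeferred(object):
        """Simplified sketch: sits between the deferred a producer hands
        back and the callbacks the client wants run on the result."""

        def __init__(self, underlyingDeferred):
            self._ignored = False
            # The client adds its post-processing callbacks to this one.
            self.deferred = defer.Deferred()
            underlyingDeferred.addBoth(self._passResult)

        def _passResult(self, result):
            # Forward the producer's result (a value or a Failure) only
            # if the client still wants it; otherwise swallow it quietly.
            if not self._ignored and not self.deferred.called:
                if isinstance(result, failure.Failure):
                    self.deferred.errback(result)
                else:
                    self.deferred.callback(result)
            return None

        def ignore(self):
            # The single bit of client-side information: snip the chain.
            self._ignored = True

        def fireEarly(self, result):
            # Hand the client a result now; the real one, if it ever
            # arrives, will be dropped.
            self._ignored = True
            if not self.deferred.called:
                self.deferred.callback(result)

A client would wrap a producer's deferred (e.g.,
cd = ControllableDeferred(getPage(url))), add its callbacks to
cd.deferred, and later call cd.ignore() or cd.fireEarly(...) if it stops
caring about the real result.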
Looked at from this point of view, an approach to cancellation would be
for code that is able to cancel operations it has begun to also provide a
cancel method. One way to think about doing this would be to have the
cancel method take a deferred as an argument. Something like my class
could then hand the deferred back, effectively saying "my client is no
longer interested in this deferred. You can call/errback it, or not; it
makes no difference to us."

If you've done that once, you can do it multiple times - by which I mean
that I might write code that's a client of getPage, and getPage is a
client of XXX, and XXX is a client of YYY, etc. Each could in turn pass
the deferred it got back to the thing that created it. If there's no
cancel method at some level, then that's as far as cancellation can go. At
that point the result is no longer passed on, because the first
ControllableDeferred instance involved will effectively snip the link (or
send an early result) in the sequence of steps that would originally have
been run.

Deferred-producing code that's capable of cancellation might simply keep a
dictionary of outstanding deferreds, and itself use a callback to remove
things from the dictionary once a result is calculated and about to be
passed on. My resizable dispatch queue code uses that approach. Or it
could be more sophisticated and be a state machine that transitions to new
states as the callbacks it put onto the deferred are called. Done this
way, the business of cancellation is not handled by the Deferred class at
all, which I think is a good thing: Deferreds are simple and don't have
any idea of the operational connections between the functions in their
call/errback chains. But the code that obtained the deferred and put
callbacks onto it does.

So, concretely, you could imagine something like this:

- a getPageFactory class with a getPage method that returns a deferred,
  and a cancel method that accepts a deferred.

- a cancelFunc argument (default None) added to my ControllableDeferred
  class. If not None, it gets called with the deferred from getPage in
  case the client decides to ignore the operation or take an early result.

That's pretty simple, I think (a rough sketch follows below). And it keeps
all code for doing cancellation out of the Deferred class. Any client code
that wants to be able to control the deferreds it's getting can use a
ControllableDeferred. And any code that produces deferreds and wants to
offer the possibility of canceling underway operations can provide /
advertise some kind of cancel function. That keeps cancellation split into
logical pieces, wherein each piece doesn't know or care where its deferred
came from, and can safely be told that its result will be ignored. The
important piece provided by ControllableDeferred is that it keeps all
these pieces separate, only passing on results when they're ready. We'd
generalize some of this, use an interface, etc.
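Here's a minimal sketch of what that producer side might look like. The
dictionary-of-outstanding-deferreds bookkeeping follows the description
above, but the details are illustrative only:

    from twisted.web import client

    class getPageFactory(object):
        """Sketch of a deferred producer that offers cancellation by
        keeping a dictionary of its outstanding deferreds."""

        def __init__(self):
            self._outstanding = {}

        def getPage(self, url):
            d = client.getPage(url)
            self._outstanding[d] = url
            # Remove the deferred from the dictionary once its result
            # is calculated and about to be passed on.
            d.addBoth(self._finished, d)
            return d

        def _finished(self, result, d):
            self._outstanding.pop(d, None)
            return result

        def cancel(self, d):
            # The client no longer cares about this deferred. A fuller
            # implementation would also drop the connection or otherwise
            # reclaim resources here.
            self._outstanding.pop(d, None)

With the cancelFunc argument added to the ControllableDeferred sketch
above, ignore() would simply also invoke cancelFunc with the producer's
deferred, so factory.cancel(d) runs exactly when the client loses
interest.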
Glyph> But that also means that you still have to go to the trouble of
Glyph> figuring out when you're no longer interested in the result any
Glyph> more, but after going to the trouble ... what's the benefit?

It's partly as you said in an earlier reply: releasing resources. It's
also so that the client code can move on (see below).

Glyph> I know what the use-cases are for stopping the underlying operation
Glyph> (notifying the peer that you're not going to accept it, reclaiming
Glyph> the resources); but if you're just going to let the operation
Glyph> complete eventually anyway, why wouldn't you want to just finish
Glyph> processing the result when it arrives regardless?

Because it may not arrive at all, or it may arrive at a point when I'm no
longer able to deal with a result. And because the non-arrival might be
holding up some other part of a system (thereby consuming a resource).

In the case I'm dealing with right now, I'm using my resizable dispatch
queue code to process jobs that are submitted via a web form. To be more
concrete, a user of my app can enter a Twitter user name, and my code then
goes and does a bunch of work. The processing of that Twitter user is a
"job" here, and it may involve thousands of network API calls: to Twitter,
to FluidDB, to other services. Because processing a job can create a lot
of traffic, I use my dispatch queue class to keep a limit on the number of
jobs being done in parallel. So there are a finite number of slots
available for processing jobs. I know when a job is done because a
deferred fires, and one of the callbacks on that deferred takes the job
out of the list of currently outstanding jobs. It all works nicely.

But the queue code just fires jobs and doesn't have any idea what they do.
If a job's deferred never fires, the queue continues to think the job is
underway. This is generally true for any client that is given a deferred:
your continued processing is in the hands of the thing that made the
deferred. In my case, a slot in a finite table is occupied. If I want to
give up on the job, I need to do some clean-up, and I may therefore not
want the callbacks I added to the deferred ever to be called (e.g., the
data structures that could have processed the result may have changed or
be gone). I.e., I want my client to have more control over how / when /
whether its callbacks are called. That's totally not under my control if I
call unknown black-box deferred-producing code that may or may not call me
back.

In my particular case, it seems that I sometimes make a call to the
Twitter API and that call never completes. I don't know what's going on; I
suspect the connection is made and then held open and never processed by
Twitter. So a deferred, or multiple deferreds, simply never fire. I want
to give up on them, clean up, and move on. From the client's point of view
I don't know or care what went wrong with the call; I just know that it's
time to do something else.

I'm also interested in the wider problem (cancellation), because I'd like
to be able to tell a job to cancel itself. In my case this could in part
take the form of a test inside the generator that yields deferreds to
process the Twitter user. That would be a simple way to cancel the
outstanding work for a job, even if I still couldn't reach out to the code
that gave me a deferred and tell it to lose a connection, etc.
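For illustration, such a check might look something like this with
inlineCallbacks. The Job class and the apiCalls iterable are made-up names
standing in for my real code:

    from twisted.internet import defer

    class Job(object):
        """Hypothetical job record held by the dispatch queue."""
        def __init__(self):
            self.cancelled = False

    @defer.inlineCallbacks
    def processTwitterUser(username, job, apiCalls):
        # apiCalls: an iterable of no-argument functions, each returning
        # a deferred for one network API call (Twitter, FluidDB, ...).
        for makeCall in apiCalls:
            if job.cancelled:
                # Someone told the job to cancel itself. Stop issuing
                # new API calls; anything already in flight is out of
                # our reach, but the job's slot can now be released.
                return
            yield makeCall()

I hope that makes sense. OK, sorry for so many words. I hope this seems
like it's heading in a useful direction. It does to me.

Terry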