[Twisted-Python] deferToThread - supported alternative to the (deprecated) setTimeout method
I have a program where I think that what I want to use is deferToThread and setTimeout. However, the setTimeout method is marked as deprecated - but I can't find a good supported alternative. Can anyone help?

The situation I have is: I need to do a large-ish number (50-100) of blocking calls in parallel, and collect the results or any errors. In itself, this seems like a suitable use for deferToThread (defer each call, and collect the results in the deferred callback/errback).

The problem is that in rare cases, the blocking call can block indefinitely. In this case, I need to make the call time out. However, the underlying API offers no way to time out the call, so I have to do this externally.

A quick prototype seems to work OK, using setTimeout to force a timeout on the deferred, but it generates deprecation warnings for the setTimeout call.

What should I be using to achieve this effect?

Thanks, Paul.

-- I remember being impressed with Ada because you could write an infinite loop without a faked up condition. The idea being that in Ada the typical infinite loop would normally be terminated by detonation. -- Larry Wall
Hi Paul, On Tue, 23 May 2006 15:49:23 -0500, Paul Moore <pf_moore@yahoo.co.uk> wrote:
I have a program where I think that what I want to use is deferToThread and setTimeout. However, the setTimeout method is marked as deprecated - but I can't find a good supported alternative. Can anyone help?
I don't know for a certainty that I can help, but I'll try :)
The situation I have is:
I need to do a large-ish number (50-100) of blocking calls in parallel, and collect the results or any errors. In itself, this seems like a suitable use for deferToThread (defer each call, and collect the results in the deferred callback/errback).
Agreed, this seems to be a perfectly reasonable use-case for deferToThread.
The problem is that in rare cases, the blocking call can block indefinitely. In this case, I need to make the call time out. However, the underlying API offers no way to time out the call, so I have to do this externally.
Alas, when dealing with APIs that do not allow for timeouts, there is very little that Twisted can do to help. The use of setTimeout has been debated to death, and no magic bullet solution is in sight. That said, please read the following discussion threads to see what your options are; perhaps one of them will be acceptable for your particular situation. These discussions cover your issue in reasonable depth.

http://twistedmatrix.com/pipermail/twisted-python/2004-April/007531.html
http://twistedmatrix.com/trac/ticket/178
http://twistedmatrix.com/pipermail/twisted-python/2005-March/009716.html

Basically, the best you can do in your circumstances is use reactor.callLater to invoke some function/method that will take action appropriate to your situation, be that cancelling the Deferred, killing the thread, etc.

Hope this helps, L. Daniel Burr
On Wed, 24 May 2006 06:49:23 +1000, Paul Moore <pf_moore@yahoo.co.uk> wrote:
I have a program where I think that what I want to use is deferToThread and setTimeout. However, the setTimeout method is marked as deprecated - but I can't find a good supported alternative. Can anyone help?
You can just use reactor.callLater(...) to run whatever timeout code you have after the given number of seconds. callLater returns a DelayedCall instance which you can .cancel() if you get a result before the timeout.
The situation I have is:
I need to do a large-ish number (50-100) of blocking calls in parallel, and collect the results or any errors. In itself, this seems like a suitable use for deferToThread (defer each call, and collect the results in the deferred callback/errback).
The problem is that in rare cases, the blocking call can block indefinitely. In this case, I need to make the call time out. However, the underlying API offers no way to time out the call, so I have to do this externally.
Well, this is a bit of a problem. You can't just kill a thread, so it will eventually return a result, and Twisted will .callback() its deferred. If you've already stepped in, due to a timeout, and .callback()'ed the same deferred, you'll get an AlreadyCalledError in your log. So don't do that: don't fire that deferred. When you time out, just keep some state so that you can ignore the result if and when it comes.
A quick prototype seems to work OK, using setTimeout to force a timeout on the deferred, but it generates deprecation warnings for the setTimeout call.
What should I be using to achieve this effect?
Thanks, Paul.
Hope that helps. -- Eric Mangold - Twisted/Win32 Maintainer http://twistedmatrix.com/trac/wiki/Windows
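The approach Eric describes (a reactor.callLater timer, plus some state so a late result is ignored) might be sketched as below. This is only a sketch under assumptions: TimeoutGuard and defer_to_thread_with_timeout are hypothetical names, not part of Twisted, and the Deferred returned by deferToThread is never fired by hand, so no AlreadyCalledError can arise. A timed-out call still occupies its thread until it returns.

```python
class TimeoutGuard:
    """Pass on a result or a timeout, whichever comes first, exactly once."""

    def __init__(self, call_later, seconds, on_result, on_timeout):
        self._finished = False
        self._on_result = on_result
        self._on_timeout = on_timeout
        # call_later(seconds, fn) must return an object with active() and
        # cancel(), which is what reactor.callLater returns (a DelayedCall).
        self._timer = call_later(seconds, self._expire)

    def _expire(self):
        if not self._finished:
            self._finished = True
            self._on_timeout()

    def deliver(self, result):
        """Attach with addBoth() to the Deferred from deferToThread."""
        if self._finished:
            return  # the call already timed out; drop the late result
        self._finished = True
        if self._timer.active():
            self._timer.cancel()
        self._on_result(result)


def defer_to_thread_with_timeout(seconds, func, *args, **kwargs):
    """Hypothetical helper: like deferToThread, but errbacks after `seconds`."""
    # Imported here so that TimeoutGuard itself does not depend on Twisted.
    from twisted.internet import defer, reactor, threads
    from twisted.python import failure

    outer = defer.Deferred()

    def forward(result):
        # Route thread errors to errbacks and normal results to callbacks.
        if isinstance(result, failure.Failure):
            outer.errback(result)
        else:
            outer.callback(result)

    def timed_out():
        outer.errback(defer.TimeoutError("blocking call timed out"))

    guard = TimeoutGuard(reactor.callLater, seconds, forward, timed_out)
    # Only `outer` is fired by this code; the inner Deferred is left alone.
    threads.deferToThread(func, *args, **kwargs).addBoth(guard.deliver)
    return outer
```

The 50-100 Deferreds returned by such a helper could then be collected in the usual way, with each one firing either with the call's result or with a timeout failure.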
On Tue, May 23, 2006 at 09:49:23PM +0100, Paul Moore wrote: [...]
The problem is that in rare cases, the blocking call can block indefinitely. In this case, I need to make the call time out. However, the underlying API offers no way to time out the call, so I have to do this externally.
There is a problem here Twisted cannot help you with -- you will leak threads that you cannot interrupt, because they are blocked indefinitely. setTimeout or other solutions will allow you to paper over this fact, but you need to be aware of it, because it will eventually stop your process from running. There's no way something like defer.setTimeout can magically cancel the underlying operation for you (even though it unblocks the deferred waiting on that operation), and part of the reason setTimeout is deprecated is to force people to be aware of that.

Also, deferToThread by default isn't going to give you 50-100 parallel threads. The default threadpool size is 4. You'll need to use reactor.suggestThreadPoolSize to change that -- but realise that the default DNS resolver in current Twisted releases uses that threadpool too, and perhaps so will other libraries, and you could starve those callers by swamping the threadpool with your blocking calls. So it may be better to use your own threadpool, as twisted.enterprise.adbapi does, for example. See twisted.python.threadpool.

I don't suppose there's a non-blocking way to do what you want?

Oh -- and remember that while you can't kill threads, you *can* kill processes. Consider using subprocesses to do your blocking work.

-Andrew.
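To illustrate the last point, here is a minimal sketch of moving the blocking work into a child process that can be killed. It uses Python's standard subprocess module rather than Twisted's own process support, and run_in_child is a hypothetical name; the child command is assumed to be a script that makes the blocking call and prints its result.

```python
import subprocess


def run_in_child(argv, seconds):
    """Run argv in a child process; return its stdout, or None on timeout."""
    try:
        finished = subprocess.run(
            argv, capture_output=True, timeout=seconds, check=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and waited for the child here.
        return None
    return finished.stdout
```

Because run_in_child itself blocks for at most roughly `seconds`, it could be handed to deferToThread without the thread-leak problem described above: the thread always comes back, and the hung work dies with the child process.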
On Tue, 23 May 2006 21:49:23 +0100, Paul Moore <pf_moore@yahoo.co.uk> wrote:
I have a program where I think that what I want to use is deferToThread and setTimeout. However, the setTimeout method is marked as deprecated - but I can't find a good supported alternative. Can anyone help?
The situation I have is:
I need to do a large-ish number (50-100) of blocking calls in parallel, and collect the results or any errors. In itself, this seems like a suitable use for deferToThread (defer each call, and collect the results in the deferred callback/errback).
The problem is that in rare cases, the blocking call can block indefinitely. In this case, I need to make the call time out. However, the underlying API offers no way to time out the call, so I have to do this externally.
A quick prototype seems to work OK, using setTimeout to force a timeout on the deferred, but it generates deprecation warnings for the setTimeout call.
What should I be using to achieve this effect?
There's some code along these lines here: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L201 It might make a good example to work from. Note that any call that blocks indefinitely will hold on to a thread indefinitely, and thus reduce your effective thread pool size by one. If this happens enough, you'll end up with no free threads in your threadpool, and no other threaded tasks will ever be able to complete. Jean-Paul
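The starvation effect described above can be shown with a small Twisted-free demonstration. It uses concurrent.futures from the standard library as a stand-in for the reactor's threadpool (an assumption made so the example runs on its own); the pool sizes and function names are made up for illustration.

```python
import concurrent.futures
import threading

release = threading.Event()  # lets the demo unblock its "hung" calls at the end


def hung_call():
    release.wait()  # stands in for a blocking call that does not return
    return "finally"


def quick_call():
    return "quick"


shared = concurrent.futures.ThreadPoolExecutor(max_workers=2)
hung = [shared.submit(hung_call) for _ in range(2)]  # both workers now stuck
starved = shared.submit(quick_call)                  # queued behind them

try:
    starved.result(timeout=0.5)
    starved_finished = True
except concurrent.futures.TimeoutError:
    starved_finished = False  # no free worker, so the quick call never ran

# A separate pool for other work is unaffected by the stuck workers.
other = concurrent.futures.ThreadPoolExecutor(max_workers=2)
unaffected = other.submit(quick_call).result(timeout=5)

release.set()  # unblock everything so the demo can exit cleanly
shared.shutdown(wait=True)
other.shutdown(wait=True)
```

The same reasoning is why a dedicated pool for the risky calls is attractive: hung calls then exhaust only their own pool, not the one other code depends on.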
Many thanks to all for the helpful responses. They've given me a lot of options to think about. I'll see where it takes me. Some specific comments:

On Tue, 23 May 2006 22:43:35 -0400, Jean-Paul Calderone <exarkun@divmod.com> wrote:
There's some code along these lines here:
http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L201
Thanks, that was useful.
It might make a good example to work from. Note that any call that blocks indefinitely will hold on to a thread indefinitely, and thus reduce your effective thread pool size by one. If this happens enough, you'll end up with no free threads in your threadpool, and no other threaded tasks will ever be able to complete.
That's a good point. My initial thought was that I'm not writing a long-running process here (my process runs all the threads, does a bit of housekeeping when they complete, and then stops) and the blocking thread condition is rare, so it shouldn't be a problem. But that's just swapping one rare problem for another, somewhat rarer case. So maybe I'm better fixing it properly.

On Wed, 24 May 2006 09:03:29 +1000, "Eric Mangold" <teratorn@twistedmatrix.com> wrote:
Well this is a bit of a problem. You can't just kill a thread - so it will eventually return a result, and Twisted will .callback() its deferred. If you've already stepped in, due to a timeout, and .callback()'ed the same deferred you'll get an AlreadyCalledError in your log.
Ah. That clarifies why setTimeout is deprecated, and why it's not appropriate for me. Thanks!

On Wed, 24 May 2006 12:36:37 +1000, Andrew Bennetts <andrew-twisted@puzzling.org> wrote:
I don't suppose there's a non-blocking way to do what you want?
Sadly, no. It's a horribly broken API in this respect. (It's the Oracle database connection API, if you want to know - no timeout capability, and a rare but definite chance of a complete hang, no return *ever*).
Oh -- and remember that while you can't kill threads, you *can* kill processes. Consider using subprocesses to do your blocking work.
That's possible, but I'm running on Windows and subprocess management is not as manageable on that platform (even under Python/twisted). But it is a thought, and I'll keep it in mind. Paul. -- The trouble with being punctual is that nobody's there to appreciate it. -- Franklin P. Jones
On Fri, 26 May 2006 20:46:27 +0100, Paul Moore <pf_moore@yahoo.co.uk> wrote:
Consider using subprocesses to do your blocking work.
That's possible, but I'm running on Windows and subprocess management is not as manageable on that platform (even under Python/twisted). But it is a thought, and I'll keep it in mind.
Twisted uses some rather gross APIs, and polling, to get subprocess management working on win32, so it's not very high performance. But it _does_ work, performs reasonably under average load, and is supported, especially in the most recent 2.4 release. (Sockets are generally higher performance than stdin/stdout on all platforms, so if you have really large volumes of data to send to your subprocess you might want to use one of those anyway.) There are no win32 installers for that release yet; you'll have to build it yourself from the tarball. Although the Python APIs differ substantially, the Twisted APIs for managing processes and communicating with them should be identical on Windows.
participants (6)
- Andrew Bennetts
- Eric Mangold
- glyph@divmod.com
- Jean-Paul Calderone
- L. Daniel Burr
- Paul Moore