On Tue, Oct 6, 2009 at 10:40 PM, Steve Steiner (listsin) <listsin@integrateddevcorp.com> wrote:
 
       Should I limit the number of "in-flight" pages?

I'm not going to comment on that, because I don't know what your app is doing or why it appears to be dying.  As you said, you didn't post code :).

However, you can experiment with it pretty easily using DeferredSemaphore: http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.defer.DeferredSemaphore.html
 
       Currently, I'm scanning sites that have upwards of 5000 pages and it
seems that, when I get too many Deferreds in flight, the app
*appears* to crash.

       I'm not sure whether it's actually going out to lunch or just appears
that way and, before I go instrumenting the app to death, can anyone
tell me whether there is some sort of practical limit to how many "in-
flight" deferreds might start to cause issues, just due to the sheer
number?

If your app is doing something strange that you don't understand, you should instrument it until you understand it.  Regardless of any practical advice you may receive as a temporary stopgap, there's always a chance that something else is going wrong, and by reducing the number of concurrent requests you're just decreasing its likelihood rather than properly fixing it.

It's highly unlikely that it's actually the number of Deferreds.  A Deferred is just a Python object, so if you've got the RAM to store them and their associated callbacks, you should be fine.  It's more likely that it has something to do with long callback chains, or hitting some kind of file-descriptor limit (what version of Twisted are you using?) or perhaps that 5000 pages is just a lot of pages to request and you might need to wait a while.
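
If you want to rule the file-descriptor theory in or out, one quick check is to ask the OS what your process is allowed to open. This is only a sketch: it assumes a Unix-like platform (the standard-library resource module isn't available on Windows), and fd_limits is a name I made up for illustration:

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
print("open file limit: soft=%s hard=%s" % (soft, hard))
```

If the soft limit is in the same ballpark as the number of connections you have open at once, that is worth looking at before anything else.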

Good luck,

-Glyph