Your limit will usually be the number of file descriptors available to the process, which can usually be changed via ulimit or your system's equivalent.  On Linux I believe it defaults to 1024, so out of the box you should be able to handle roughly 1024 simultaneous connections.
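
For reference, here's a quick way to check that per-process limit from inside Python, and to raise the soft limit as far as the hard limit allows, using the standard resource module.  This is just a sketch, not anything from your app:

    import resource

    # current per-process file-descriptor limits
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("fd limit: soft=%d, hard=%d" % (soft, hard))

    # an unprivileged process may raise its soft limit up to the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))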

One thing of note: you say you have the concurrency issues handled -- but with asynchronous I/O there are no concurrency issues, because there is no concurrency (at least not at the application level).  This is confusing at first, but it's important to understand.

All that said, you probably want to maintain a queue of URLs and some sort of graph representation of your data for the purpose of finding loops (e.g. A links to B, B links to C, C links to A).  You can then set an upper limit on the number of concurrent connections (say 1000) and track the number of deferreds in flight simply by noting when you start connections and when they finish (via callbacks).  Your initial seed can start one URL, its callback can hit all the linked pages, and so on.
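
Here's a minimal sketch of that scheme, assuming twisted.web.client.getPage as the fetcher.  The names (MAX_CONCURRENT, crawl, extract_links, the naive regex) are mine and not anything from your code; a DeferredSemaphore does the "at most N in flight" bookkeeping for you:

    import re

    from twisted.internet import defer, reactor
    from twisted.python import log
    from twisted.web.client import getPage

    MAX_CONCURRENT = 1000                  # upper bound on in-flight fetches
    sem = defer.DeferredSemaphore(MAX_CONCURRENT)
    seen = set()                           # URLs already fetched or in flight

    def crawl(url):
        if url in seen:                    # breaks cycles (A -> B -> C -> A)
            return None
        seen.add(url)
        d = sem.run(getPage, url)          # waits for a free slot, then fetches
        d.addCallback(handle_page, url)
        d.addErrback(log.err)
        return d

    def handle_page(body, url):
        for link in extract_links(body):
            crawl(link)

    def extract_links(body):
        # naive extraction; a real spider would parse the HTML and resolve
        # relative URLs
        return re.findall(r'href="(http[^"]+)"', body)

    if __name__ == "__main__":
        crawl("http://www.example.com/")   # hypothetical seed URL
        reactor.run()                      # nothing stops the reactor here

A plain "seen" set is enough to keep cycles from re-fetching pages; keep the full link graph separately if you also want to report the loops themselves.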

You might be hitting a cycle in the page-traversal graph, and that could be causing you all sorts of problems in terms of recursion depth or running out of file descriptors.  Without seeing your code or your target site, though, it's impossible to say.

Have you considered using another library for the web spidering?  I believe Scrapy (http://scrapy.org) is a good spidering tool, and it might be easier to use a decent existing library than to roll your own.
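
For a sense of scale, a basic Scrapy spider for this kind of crawl is only a few lines.  This is a rough sketch against Scrapy's Spider API in recent releases (the class name and seed URL are made up); Scrapy's built-in duplicate filter and concurrency settings handle the cycle and in-flight-limit problems for you:

    import scrapy

    class SiteSpider(scrapy.Spider):
        name = "site"
        start_urls = ["http://www.example.com/"]   # hypothetical seed

        def parse(self, response):
            # do whatever you need with response.url / response.body here,
            # then follow every link; duplicates are filtered automatically
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)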


  - Matt



On Tue, Oct 6, 2009 at 10:40 PM, Steve Steiner (listsin) <listsin@integrateddevcorp.com> wrote:
So, I have a situation...

       I have an application whose basic function is, in simplified form:

        def main():
            get_web_page(main_page_from_params)

        def get_web_page(page_name):
            set up a page-getter deferred; one of its callbacks pulls the
            links out of the page and sends them to get_them()

        def get_them(links):
            for l in links:
                if l is not already being gotten and hasn't been got:
                    deferred = get_web_page(l)

       In other words, I feed in the top level page, then recursively feed
in any pages linked to by the current page, and they feed in all their
links, until all pages are gotten.

        I understand the concurrency issues with multiple deferreds trying
to add pages to the "get list" -- that's properly handled in the code
(as far as I can tell, so far).

       So, here's the question...

        I have a "pages" list containing all of the pages.

       They are set to either gotten or in-flight.

       In-flight means I have a deferred that's going to go get it (in
get_web_page()).

       IOW, right now, if I don't already have the page, and I have a link
to it, I just start a deferred to go get it.

       Should I limit the number of "in-flight" pages?

        Currently, I'm scanning sites that have upwards of 5000 pages and
it seems that, when I get too many deferreds in flight, the app
*appears* to crash.

        I'm not sure whether it's actually gone out to lunch or just
appears that way.  Before I go instrumenting the app to death, can
anyone tell me whether there is some practical limit where too many
"in-flight" deferreds start to cause issues, just due to the sheer
number?

       Thanks for any insight on this that anyone might offer.

        I expect the usual flurry of "you must post your exact code or we
can't help you at all, moron" posts, but...

        In spite of my not having posted specific code, could someone with
some actual experience in this please give me a clue, within an order
of magnitude, as to how many deferreds might start to cause real
trouble?

Thanks,

S



