[Twisted-Python] Design pattern for multi-stage web searches

Hello,
I am a Google Summer of Code student working with the Zope 3/Grok community this year.
I wonder if there is an established design pattern or examples that use Twisted to fetch data via HTTP when the process can take a variable number of requests to be completed.
For instance, if you search for book records online, some sites allow you to do an ISBN search which gives back the full book details, while others will present an intermediate results page from which links to the book details page can be obtained (many sites use an internal id for their products, including books). Still others can return an intermediate page containing a link to a different edition (with a different ISBN) when the given ISBN refers to an out-of-print edition.
So for each different metadata source I'd like to provide a chain of callbacks to be processed. Does anyone know of references that may help me? I browsed the GoF patterns but none of them fits this use case.
BTW, the resulting code will be open source under the ZPL and hosted at http://svn.zope.org.
Thanks for a great piece of software!
Regards,
Luciano
PS. I've already implemented, using Twisted, a prototype of the collector for Amazon.com and its affiliates in Europe and Japan. But I am Brazilian and I know their catalog is not very complete outside of the markets where they operate. To be really international, a book metadata collector must be pluggable to allow localized searches depending on the ISBN prefix.

On Sun, 5 Aug 2007 16:39:00 -0300, Luciano Ramalho <luciano@ramalho.org> wrote:
Hello,
I am a Google Summer of Code student working with the Zope 3/Grok community this year.
I wonder if there is an established design pattern or examples that use Twisted to fetch data via HTTP when the process can take a variable number of requests to be completed.
This is usually done by "chaining" Deferreds together. When a callback or errback function on Deferred A returns Deferred B, B is chained to A:
B.chainDeferred(A)
or
B.addCallbacks(A.callback, A.errback)
So if you fetch a page and it turns out not to be the one you ultimately want, a callback on that Deferred can initiate another fetch and return the Deferred for that operation. This is transparent to the caller of the original function: their callbacks are only called once all of the earlier callbacks have returned a non-Deferred value and all returned Deferreds have fired.
I doubt I've explained this particularly well, so you might want to take a look at http://twistedmatrix.com/projects/core/documentation/howto/defer.html to really get an understanding.
Jean-Paul

Thank you very much for the reply, Jean-Paul. I'll study it and probably make changes to my code. It seems the chaining mechanism you explain can improve the generality of my code and at the same time simplify the implementation.
If anyone starting with Twisted wants to see an example of using several deferreds, the code I wrote is here:
http://svn.zope.org/Sandbox/luciano/kirbi/kirbifetch/src/kirbifetch/
The file which uses the deferreds is fetch.py. Note that the code above is a work in progress: tests are not automated yet, and the style of chaining Jean-Paul suggested is not yet implemented. Currently each callback explicitly initiates another deferred, without chaining.
Anyway, I hope my code is useful as an example.
Regards,
Luciano
On 8/6/07, Jean-Paul Calderone <exarkun@divmod.com> wrote:
This is usually done by "chaining" Deferreds together. When a callback or errback function on Deferred A returns Deferred B, B is chained to A:
B.chainDeferred(A)
or
B.addCallbacks(A.callback, A.errback)
So if you fetch a page and it turns out not to be the one you ultimately want, a callback on that Deferred can initiate another fetch and return the Deferred for that operation. This is transparent to the caller of the original function: their callbacks are only called once all of the earlier callbacks have returned a non-Deferred value and all returned Deferreds have fired.
I doubt I've explained this particularly well, so you might want to take a look at http://twistedmatrix.com/projects/core/documentation/howto/defer.html to really get an understanding.
Jean-Paul