
Hi,

Turtl is an HTTP proxy that throttles connections to specific hostnames so that clients do not break the terms of use of API providers such as del.icio.us, Technorati and so on.

At the core of Turtl is a throttling deferred that works much like DeferredSemaphore, except that it also enforces a rate (N calls every M seconds) at which the deferreds added to it are fired.

Over the past few weeks it has been improved, and a couple of obscure bugs have been ironed out. It has been running as a proxy for a couple of years, and recently we started using it as a crawler rate limiter.

Source code lives on Bitbucket:

https://bitbucket.org/adroll/turtl/overview

Here's a small example of its usage:

import time

from twisted.internet import reactor, defer
from twisted.protocols.policies import WrappingFactory
from twisted.web import client, server, resource

from turtl import engine

# At most 1 call in flight, and at most 2 calls fired every 1 second.
throttle = engine.ThrottlingDeferred(concurrency=1, calls=2, interval=1)


class FakeResource(resource.Resource):
    """Answers every request with "hello"."""
    isLeaf = True

    def render(self, request):
        return "hello"


def setupServer():
    # Listen on an ephemeral port on the loopback interface; return its number.
    site = server.Site(FakeResource())
    wrapper = WrappingFactory(site)
    port = reactor.listenTCP(0, wrapper, interface="127.0.0.1")
    return port.getHost().port


def stop(_):
    return reactor.stop()


def makeUrl(port):
    # Use the same address the server is bound to.
    return "http://127.0.0.1:%s/" % (port,)


def prinl(page):
    # Timestamp each response so the throttling rate is visible.
    print time.time(), page


port = setupServer()
url = makeUrl(port)
# Queue 1000 requests through the throttle; stop once every one has fired.
defer.DeferredList([throttle.run(client.getPage, url).addBoth(prinl)
                    for i in xrange(1000)]).addBoth(stop)
reactor.run()

-- 
Valentino Volonghi
http://www.adroll.com
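P.S. A throttling deferred as described above combines two limits: a semaphore that caps how many calls are in flight (the DeferredSemaphore part), and a sliding window that caps how many calls may start within any `interval` seconds. The following is a minimal sketch of that idea, assuming Python 3 and asyncio instead of Twisted so that it runs on the standard library alone. `ThrottledRunner`, `run()`, `demo()` and `fake_request()` are hypothetical names chosen for illustration; this is not Turtl's implementation or API.

```python
import asyncio
import collections


class ThrottledRunner:
    """Hypothetical asyncio analogue of a throttling deferred (not Turtl's API).

    At most `concurrency` calls run at the same time, and at most `calls`
    calls may start within any sliding window of `interval` seconds.
    Create it inside a running event loop.
    """

    def __init__(self, concurrency, calls, interval):
        self._sem = asyncio.Semaphore(concurrency)  # caps calls in flight
        self._lock = asyncio.Lock()  # serialises access to the rate window
        self._starts = collections.deque()  # start times still inside the window
        self._calls = calls
        self._interval = interval

    async def _wait_for_rate_slot(self):
        loop = asyncio.get_running_loop()
        async with self._lock:
            while True:
                now = loop.time()
                # Drop start times that have slid out of the window.
                while self._starts and now - self._starts[0] >= self._interval:
                    self._starts.popleft()
                if len(self._starts) < self._calls:
                    self._starts.append(now)
                    return
                # Window is full: sleep until its oldest entry expires.
                await asyncio.sleep(self._interval - (now - self._starts[0]))

    async def run(self, func, *args, **kwargs):
        """Await func(*args, **kwargs) once both limits allow it."""
        async with self._sem:
            await self._wait_for_rate_slot()
            return await func(*args, **kwargs)


async def demo(requests=6, concurrency=1, calls=2, interval=0.2):
    """Push fake requests through the runner.

    Returns a list of (request index, start offset in seconds) in request
    order, plus the peak number of requests observed in flight at once.
    """
    loop = asyncio.get_running_loop()
    throttle = ThrottledRunner(concurrency, calls, interval)
    t0 = loop.time()
    active = 0
    peak = 0

    async def fake_request(i):
        nonlocal active, peak
        offset = loop.time() - t0
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for network I/O
        active -= 1
        return i, offset

    results = await asyncio.gather(
        *(throttle.run(fake_request, i) for i in range(requests)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    for i, offset in results:
        print("request %d started at %.2fs" % (i, offset))
    print("peak concurrency:", peak)
```

In this sketch the semaphore is acquired before the rate slot, so a call that is waiting on the rate limit still counts against `concurrency`. That is one reasonable ordering, not necessarily the one Turtl uses.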