On 01:20 pm, descentspb@gmail.com wrote:
Hello!
I am a newbie in Twisted, sorry if my question sounds awkward.
I have written a pretty simple recursive page downloader, which parses an HTML page, extracts all the needed links from it, and starts downloading them. The links point to video files, so they are pretty large. The problem is that the downloader works TOO FAST :) I want to set something like a global bandwidth limit or a maximum number of concurrently downloading files.
I am using twisted.web.client.downloadPage to download the files and using the Deferred that it returns. I can't understand how to make it still return a Deferred corresponding to that file, yet not start downloading right away, and instead start downloading on some kind of event (that is, make a manager-like wrapper for that function).
So I want the code to still look simple like this:
    for link in links:
        d = downloadPage_limited(link, filename)
And the wrapper (the function downloadPage_limited) will manage the number of concurrent downloads, and will still return the Deferred that twisted.web.client.downloadPage would return.
Is my idea about a "wrapper" practical, and what's the general way to write it? On which event is it best to decrement the counter of currently downloading files?
Yes, that's a good idea. You might be able to use twisted.internet.defer.DeferredSemaphore to handle all of the counting for you. For example,

    from twisted.internet.defer import DeferredSemaphore
    from twisted.web.client import downloadPage

    class LimitedDownloader:
        def __init__(self, howMany):
            self._semaphore = DeferredSemaphore(howMany)

        def downloadPage(self, *a, **kw):
            return self._semaphore.run(downloadPage, *a, **kw)

    downloader = LimitedDownloader(3)
    downloader.downloadPage(...)

In this example, DeferredSemaphore.run will only let 3 downloadPage calls run concurrently. If a 4th is attempted before any earlier ones finish, it won't actually be called until one of the earlier ones does finish, and then it will be called.

Jean-Paul