Justin Mazzola Paluska wrote:
Hi,
I'm using PB in a distributed application that has suddenly grown the requirement to copy directories of files between the servers.
From lurking on the mailing list archives, it seems that the best way to move large amounts of data between Twisted servers is to use a twisted.spread.util.Pager sub-class to pipe the data. Between that information and the "How to use twisted pb pager" [1] document, I'm probably good to go on how to transfer large amounts of data.
In my experience, sending big files over PB takes far too long, because of the serialization and deserialization involved. Paging avoids blocking, which is good, but the transfer still takes much longer than sending the files as-is.
At the very least, optimize serialization by enabling cBanana: uncomment lines 311-318 in twisted.spread.banana.py. Why are they commented out?
http://twistedmatrix.com/pipermail/twisted-python/2004-December/009158.html
- Finally, should I be doing something completely different? Normally, outside of my application, I'd just use rsync, scp, or some such. However, the users of this application don't know how to use these tools. I can't spawn these programs without getting into authentication issues between the machines. Doing this within Twisted seems like a good idea because the machines are already authenticated to each other through PB, but I could be wrong.
You could send the files over an HTTP connection, avoiding the serialization overhead. Setting up HTTP clients and servers is very easy in Twisted, as you surely know.
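To illustrate the idea, here is a minimal sketch of moving a file's raw bytes over HTTP with no per-object serialization step. It deliberately uses only the Python standard library (a threaded http.server plus urllib) so that it runs on its own; it is not Twisted code. In a Twisted application the server role would more naturally be played by twisted.web.static.File wrapped in twisted.web.server.Site. The names serve_directory and fetch_file are hypothetical helpers made up for this example, and authentication is left out entirely.

```python
import functools
import http.server
import shutil
import threading
import urllib.request


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP from a background thread.

    port=0 asks the OS for a free port; read it back from
    server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_file(url, dest_path, chunk_size=64 * 1024):
    """Stream the response body to disk in fixed-size chunks."""
    # Ignore any proxy settings: the servers talk to each other directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response, open(dest_path, "wb") as out:
        # copyfileobj copies chunk by chunk, so the whole file is
        # never held in memory at once.
        shutil.copyfileobj(response, out, chunk_size)
```

The sending machine would call serve_directory on the directory to share, and the receiving machine would call fetch_file once per file. The file contents travel as the HTTP response body, so nothing is encoded or decoded along the way beyond the HTTP headers.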
[1] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/457670