[Twisted-Python] Questions about the very nice AMP protocol
Hi,
I currently use Perspective Broker for a number of projects. As time has gone by I have really come to appreciate having a full two-way network protocol. But, my company has lots of Java programmers and they do lots of "serious" (read, pain in the ass) web services and grid services stuff. I would like to be able to get the many Python things I have playing nicely with the many Java things floating around here.
Thus, AMP is extremely attractive. There is one problem that I have though. We do high performance scientific computing and deal with extremely large tera/peta-byte data sets. Thus we need network protocols that can send large amounts of data around. AMP's focus on small messages thus presents a problem. There are really two use cases that I have in mind:
1. Sending larger (maybe hundreds of MB) objects around that do fit in memory. These can be serialized easily (w/o creating a big pickle), but I need to make sure that Twisted doesn't make extra copies of them during the transfer.
2. Sending even bigger things that don't fit into memory.
Any thoughts on the best way to address these questions using AMP? Here are my thoughts:
1. Use a multi-connection approach like FTP does. Use AMP for control and the other connection for the binary data. It would be easy to use producers/consumers in this channel to handle the large data problems above. I don't like this because I often need to ssh tunnel the protocol through firewalls - two connections is unpleasant.
2. Use AMP's inner protocol to run two protocols simultaneously. My understanding is that AMP doesn't support switching back and forth between AMP and its inner protocol. Would it be crazy to try this approach?
3. Try to modify AMP itself to handle the large objects itself by registering Producers with the underlying transport.
It may sound like I just want something like FTP, but I also need to send lots of application-specific control messages as well - and these really need to be two way.
Any thoughts would be greatly appreciated.
Brian
On Thu, 16 Nov 2006 23:23:43 -0700, Brian Granger <ellisonbg.net@gmail.com> wrote:
Hi,
I currently use Perspective Broker for a number of projects. As time has gone by I have really come to appreciate having a full two-way network protocol. But, my company has lots of Java programmers and they do lots of "serious" (read, pain in the ass) web services and grid services stuff. I would like to be able to get the many Python things I have playing nicely with the many Java things floating around here.
Thus, AMP is extremely attractive. There is one problem that I have though. We do high performance scientific computing and deal with extremely large tera/peta-byte data sets. Thus we need network protocols that can send large amounts of data around. AMP's focus on small messages thus presents a problem. There are really two use cases that I have in mind:
1. Sending larger (maybe hundreds of MB) objects around that do fit in memory. These can be serialized easily (w/o creating a big pickle), but I need to make sure that Twisted doesn't make extra copies of them during the transfer.
2. Sending even bigger things that don't fit into memory.
Any thoughts on the best way to address these questions using AMP? Here are my thoughts:
1. Use a multi-connection approach like FTP does. Use AMP for control and the other connection for the binary data. It would be easy to use producers/consumers in this channel to handle the large data problems above. I don't like this because I often need to ssh tunnel the protocol through firewalls - two connections is unpleasant.
2. Use AMP's inner protocol to run two protocols simultaneously. My understanding is that AMP doesn't support switching back and forth between AMP and its inner protocol. Would it be crazy to try this approach?
3. Try to modify AMP itself to handle the large objects itself by registering Producers with the underlying transport.
This is the planned direction for AMP, if that makes any difference to your plans.
It may sound like I just want something like FTP, but I also need to send lots of application-specific control messages as well - and these really need to be two way.
Any thoughts would be greatly appreciated.
Brian
Jean-Paul
On Fri, 17 Nov 2006 08:32:26 -0500, Jean-Paul Calderone <exarkun@divmod.com> wrote:
On Thu, 16 Nov 2006 23:23:43 -0700, Brian Granger <ellisonbg.net@gmail.com> wrote:
1. Use a multi-connection approach like FTP does. Use AMP for control and the other connection for the binary data. It would be easy to use producers/consumers in this channel to handle the large data problems above. I don't like this because I often need to ssh tunnel the protocol through firewalls - two connections is unpleasant.
HTTP is a protocol quite suited to this. I believe we already have someone using AMP in a grid computing environment where, instead of the large amount of data itself, an http:// URL pointing to the data is sent. I think the best way to do this is to describe a list of URLs (in the case where you have multiple mirrors), a checksum and perhaps an identifier. The client then knows it can fire up a twisted.web.client.downloadPage() and have the data when the Deferred fires. If you need to serve the data dynamically, then you can use twisted.web / twisted.web2 quite easily, or you can just put the files in a directory tree available to your favourite HTTP server.
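To make the shape of that idea concrete, here is a rough, standard-library-only sketch of the client side: given a list of mirror URLs and a checksum (as they might arrive in a small control message), it streams the data to disk in chunks and verifies it. The function name, the choice of SHA-256 and the chunk size are my own assumptions for illustration; this is blocking code and is not Twisted's API, so a real Twisted client would use its asynchronous web client instead.

```python
import hashlib
import urllib.request


def fetch_with_mirrors(urls, expected_sha256, dest_path, chunk_size=64 * 1024):
    """Try each mirror in turn, streaming to dest_path and checking the hash.

    Hypothetical helper: returns the URL that yielded a valid copy, or
    raises RuntimeError if every mirror failed or mismatched.
    """
    last_error = None
    for url in urls:
        digest = hashlib.sha256()
        try:
            # Read in fixed-size chunks so the whole payload never has to
            # sit in memory at once.
            with urllib.request.urlopen(url) as response, \
                    open(dest_path, "wb") as out:
                while True:
                    chunk = response.read(chunk_size)
                    if not chunk:
                        break
                    digest.update(chunk)
                    out.write(chunk)
        except OSError as exc:  # URLError is a subclass of OSError
            last_error = exc
            continue
        if digest.hexdigest() == expected_sha256:
            return url
        # A bad copy is left at dest_path; the next attempt overwrites it.
        last_error = ValueError("checksum mismatch for %s" % url)
    raise RuntimeError("no mirror produced a valid copy") from last_error
```

The control message then only has to carry a few short strings, which keeps it well within what a small-message protocol is comfortable with.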
2. Use AMP's inner protocol to run two protocols simultaneously. My understanding is that AMP doesn't support switching back and forth between AMP and its inner protocol. Would it be crazy to try this approach?
Not a wonderful idea. I've seen this approach implemented, and it ends up limiting much of the freedom that AMP provides.
3. Try to modify AMP itself to handle the large objects itself by registering Producers with the underlying transport.
This is the planned direction for AMP, if that makes any difference to your plans.
I would love to see this implemented in a sane fashion. I've talked to Glyph about it, and it seems like the structure of AMP will quite easily accommodate it. We've even discussed things like caching AMP proxies. No code yet as far as I've seen.
Stephen.
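For what it's worth, any scheme along the lines of option 3 has to cut a large in-memory object into pieces that fit in an AMP value, which is length-prefixed with 16 bits and so tops out at 65535 bytes. The sketch below only illustrates that slicing step without copying the payload; the names are hypothetical, it is not part of AMP or Twisted, and it says nothing about how the pieces would be framed or paced on the wire.

```python
MAX_AMP_VALUE = 65535  # largest value a 16-bit length prefix can describe


def iter_chunks(payload, chunk_size=MAX_AMP_VALUE):
    """Yield slices of a bytes or bytearray payload, each <= chunk_size long.

    Hypothetical helper: slicing a memoryview does not copy the underlying
    buffer, so the large object is not duplicated while it is cut up.
    """
    if not 0 < chunk_size <= MAX_AMP_VALUE:
        raise ValueError("chunk_size must be between 1 and %d" % MAX_AMP_VALUE)
    view = memoryview(payload)
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]
```

A sender could hand these slices to the transport one at a time as it is asked for more data, which is roughly the job a producer would do.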
participants (3)
- Brian Granger
- Jean-Paul Calderone
- Stephen Thorne