[Twisted-Python] FTP without the protocol
Hi, all, I have many users that need to send data to me for archival: large data sets in many files (10,000s of files, 100s of GB per file set). Up to this point the users have used an FTP client wrapper that uploads data to my FTP server. This has become too problematic with firewalls etc., and I now have the time to write a new client/server of my own: an FTP server for uploads only. I'm looking into using Twisted, but am very open to other suggestions/solutions. For such a use case I'm surprised not to find an example. I think I just need a push in the right direction. Is producers/consumers the right approach? Thanks in advance, Lloyd
Lloyd Carothers <lloyd@passcal.nmt.edu> writes:
For such a use case I'm surprised not to find an example. I think I just need a push in the right direction. Is producers/consumers the right approach?
It's a bit dated at this point, but maybe this might spark some ideas: http://twistedmatrix.com/pipermail/twisted-python/2007-July/015738.html It's producer/consumer, which yes, is very efficient for streaming transfers. The code the post was based on is actually still in active use, but against an older twisted 2.5.0 installation, so I'm not sure how much tweaking it may need to adjust to the latest Twisted version. In practice this is paired with a separate set of code that implements a PB-based control channel over which the files to upload are negotiated. The binary transfer itself just has a small header in front of the data containing some authentication and size information, so it's derived from LineReceiver and switches to raw mode for the transfer. While I tend to prefer a separate control channel (though it certainly doesn't need to be PB based), you could also in-line the control information (so it becomes more like an HTTP transfer) if you wished. BTW, passive FTP should be quite firewall friendly, unless you're talking about really locked down locations where the only thing allowed out is HTTP or something. But if you're not using passive mode yet, that might also be a quicker fix to your existing code base. -- David
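The header-then-raw-data framing described above can be sketched without Twisted. This is a minimal, framework-agnostic illustration, not the code from the linked post: the class name UploadParser, the "token size" header layout, and the CRLF delimiter are all assumptions made for the example. In Twisted itself, the equivalent switch is made by calling setRawMode() on a LineReceiver, after which bytes arrive in rawDataReceived().

```python
class UploadParser:
    """Hypothetical parser: one header line, then `size` raw bytes."""

    DELIMITER = b"\r\n"  # assumed line ending for the header

    def __init__(self):
        self._buffer = b""
        self.raw_mode = False       # False while waiting for the header line
        self.token = None           # assumed authentication field
        self.expected = None        # payload size announced in the header
        self.payload = bytearray()  # kept in memory only to keep the sketch short

    def data_received(self, data):
        """Feed bytes in whatever chunk sizes the network delivers."""
        if self.raw_mode:
            self._raw_data_received(data)
            return
        self._buffer += data
        if self.DELIMITER in self._buffer:
            line, rest = self._buffer.split(self.DELIMITER, 1)
            self._buffer = b""
            self._header_received(line)
            if rest:
                # Bytes after the header already belong to the payload.
                self._raw_data_received(rest)

    def _header_received(self, line):
        token, size = line.split(b" ", 1)
        self.token = token.decode("ascii")
        self.expected = int(size)
        self.raw_mode = True  # from here on, data is treated as raw bytes

    def _raw_data_received(self, data):
        remaining = self.expected - len(self.payload)
        self.payload += data[:remaining]

    @property
    def complete(self):
        return self.raw_mode and len(self.payload) == self.expected
```

A real receiver would write each raw chunk to disk rather than accumulate it, and would check the token before accepting any payload.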
On 10/23/13 7:30 PM, David Bolen wrote:
Lloyd Carothers <lloyd@passcal.nmt.edu> writes:
For such a use case I'm surprised not to find an example. I think I just need a push in the right direction. Is producers/consumers the right approach?
It's a bit dated at this point, but maybe this might spark some ideas: http://twistedmatrix.com/pipermail/twisted-python/2007-July/015738.html
It's producer/consumer, which yes, is very efficient for streaming transfers. The code the post was based on is actually still in active use, but against an older twisted 2.5.0 installation, so I'm not sure how much tweaking it may need to adjust to the latest Twisted version.
This looks pretty close to what I need and definitely a good starting point for me. I can make use of session specific information too, which will be nice. As you've been using it for a while, have you had any issues? Is it robust/stable?
In practice this is paired with a separate set of code that implements a PB-based control channel over which the files to upload are negotiated.
Would you be willing to share this as well?
The binary transfer itself just has a small header in front of the data containing some authentication and size information, so it's derived from LineReceiver and switches to raw mode for the transfer. While I tend to prefer a separate control channel (though it certainly doesn't need to be PB based), you could also in-line the control information (so it becomes more like an HTTP transfer) if you wished.
Gotcha. Maybe this is new since you wrote the above, but is FileSender the producer to use here? Perhaps it's not fully developed, but should there not also be a complementary FileReceiver to consume the file and write it out?
BTW, passive FTP should be quite firewall friendly, unless you're talking about really locked down locations where the only thing allowed out is HTTP or something. But if you're not using passive mode yet, that might also be a quicker fix to your existing code base.
Indeed, I use passive mode exclusively as clients often come from NATed nets. Generally FTP works OK, but some organizations' firewalls do strange things with that traffic, and even the good connections seem to have sporadic drops which often aren't handled well, at least with proftp.
-- David
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Lloyd Carothers <lloyd@passcal.nmt.edu> writes:
This looks pretty close to what I need and definitely a good starting point for me. I can make use of session specific information too, which will be nice. As you've been using it for a while, have you had any issues? Is it robust/stable?
Absolutely robust/stable. Haven't touched the code since 2007 (well, aside from one tweak to deal with an application level race condition under long RTT connections unrelated to the file transfer itself). Looks like there have been around 60000 media file transfers totaling about 650GB of data so far.
Would you be willing to share this as well?
It's part of a larger system that I can't really share in total, but I could probably snip an extract of the PB part related to file I/O. It's not really anything special - just a typical PB published object with some remote methods. You'd have to be willing to deal with it in an "as-is" state though. If interested, drop me a note directly.
Gotcha. Maybe this is new since you wrote the above, but is FileSender the producer to use here?
That's the one I use on the client side, yes. On the receiver the raw data receipt is just part of the server side protocol definition.
Perhaps it's not fully developed, but should there not also be a complementary FileReceiver to consume the file and write it out?
Maybe. I don't think such a helper exists in the version of Twisted I was using, but it may also have been simple enough that I didn't look too hard (plus I wanted a running crc calculation). I think it's also a little trickier on the receiving side than sending. On the sending side it's easier to delegate transmission of the data, while on the receiving side the data (and connection status) arrives as part of your protocol handler. So at the least something like FileReceiver would probably need to be a mix-in (taking over dataReceived in some way) rather than an independent function. The receiving side is also fairly trivial - in my case, it's not much more than the few lines in dataReceived.
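The receiving side described above (write each chunk as it arrives while keeping a running CRC) might look roughly like the following. It's only a sketch under assumptions: ChunkWriter is a made-up name, and zlib.crc32 stands in for whatever checksum the original code used, which the post doesn't say. In a Twisted protocol, write_chunk() would be called from dataReceived (or rawDataReceived on a LineReceiver).

```python
import io
import zlib


class ChunkWriter:
    """Hypothetical helper: writes chunks out and tracks size and CRC."""

    def __init__(self, fileobj, expected_size):
        self.fileobj = fileobj
        self.expected_size = expected_size  # size announced by the sender
        self.received = 0
        self.crc = 0  # starting value for an incremental CRC-32

    def write_chunk(self, data):
        self.fileobj.write(data)
        self.received += len(data)
        # Passing the previous value makes this a running CRC over all chunks.
        self.crc = zlib.crc32(data, self.crc)

    def finished_ok(self, expected_crc):
        """True if both the byte count and the checksum match."""
        return self.received == self.expected_size and self.crc == expected_crc


# Usage: feed a payload in 4096-byte pieces, as a network might deliver it.
payload = b"some media bytes" * 1000
out = io.BytesIO()
writer = ChunkWriter(out, len(payload))
for start in range(0, len(payload), 4096):
    writer.write_chunk(payload[start:start + 4096])
```

Because the CRC is updated per chunk, the whole file never has to be held in memory or re-read to verify it.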
Generally FTP works OK, but some organizations' firewalls do strange things with that traffic, and even the good connections seem to have sporadic drops which often aren't handled well, at least with proftp.
Yeah, it's an ugly network world out there :-) Note that with an FTP replacement, you'll still be responsible for handling disconnects and/or resuming uploads. In my current system all uploads go to a dedicated upload folder by job, which is moved to its final location during the job commit, or completely discarded if something goes wrong. But I have the luxury of having tight control of the client (who is an audio post-production house publishing jobs for clients to review), so failures between the client and the server in my case are actually reasonably rare. -- David
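The staging scheme described above (upload into a per-job folder, then move it into place on commit or throw it away on failure) can be sketched with the standard library. The function names and directory layout here are assumptions for illustration, not the code from the system being described.

```python
import os
import shutil
import tempfile


def begin_job(staging_root, job_id):
    """Create (or reuse) the staging folder for one upload job."""
    path = os.path.join(staging_root, job_id)
    os.makedirs(path, exist_ok=True)
    return path


def commit_job(staging_root, final_root, job_id):
    """Move a fully uploaded job to its final location."""
    src = os.path.join(staging_root, job_id)
    dst = os.path.join(final_root, job_id)
    # os.rename only works (and is only atomic) when both folders
    # live on the same filesystem; that is assumed here.
    os.rename(src, dst)
    return dst


def discard_job(staging_root, job_id):
    """Throw away everything uploaded for a failed job."""
    shutil.rmtree(os.path.join(staging_root, job_id), ignore_errors=True)


# Usage with throwaway directories.
root = tempfile.mkdtemp()
staging = os.path.join(root, "staging")
final = os.path.join(root, "final")
os.makedirs(staging)
os.makedirs(final)

job_dir = begin_job(staging, "job-1")
with open(os.path.join(job_dir, "a.wav"), "wb") as f:
    f.write(b"data")
committed = commit_job(staging, final, "job-1")

begin_job(staging, "job-2")
discard_job(staging, "job-2")
```

Nothing in the final location is ever half-written, so a dropped connection only costs the contents of one staging folder.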
Participants (2): David Bolen, Lloyd Carothers