[Twisted-Python] troubles with FTP server and async file writing

Howdy all.. who's interested in the VFS or FTP code these days? I'm in the process of writing an FTP frontend for Tahoe[1], and the STOR command is causing me problems. Tahoe is a virtual filesystem that spreads files and directories over a grid of backend servers to achieve high reliability and availability. Once the STOR command finishes transferring the file (over the separate DTP connection), the Tahoe node springs into action: encrypting, hashing, erasure-coding, uploading, and finally modifying a parent directory (which is also distributed) to reference the new child. This whole process takes time, maybe a second or two for a small file, potentially minutes if you're uploading a large file over a slow network. Naturally, the Tahoe upload() method returns a Deferred, that fires when the whole process is complete. The trouble is, it looks like Twisted's FTP server is not designed to handle asynchronous uploads. The specific problem is in the way that twisted.protocols.ftp.FTP.ftp_STOR() interacts with the IConsumer that it gets.. the sequence goes like this: 1: ftp_STOR fetches an IWriteFile instance (via Deferred) 2: cbOpened() calls IWriteFile.receive to get an IConsumer (via Deferred) 3: cbConsumer() hands it to dtpInstance.registerConsumer (via Deferred) 4: FTP.registerConsumer() sends it a registerProducer() (synchronous) it returns the self._onConnLost Deferred 5: data is sent over the DTP connection, written to the consumer 6: the DTP connection is closed, _onConnLost is fired 7: DTP._unregConsumer sends unregisterProducer(), and ignores the result 8: FTP.cbConsumer()'s chain regains control, runs cbSent (via Deferred) 9: cbSent returns 226 Transfer Complete 10: the Deferred chain finally unwinds, firing the Deferred that was returned from ftp_STOR, which sends the 226 back to the client Most of these steps use Deferreds, so the opening of the file can take as long as we need, but since DTP._unregConsumer ignores the result of consumer.unregisterProducer in step #7, there's no way for the backend to stall the delivery of the 226 until the file has actually been uploaded. We need something to address this, because FTP clients are correctly assuming that once they get the 226 Transfer Complete, the file will be in place. I'm not sure how to best fix this. It seems like something needs to interact with either the IConsumer or the IWriteFile to ask "are you done uploading yet". If it wouldn't be so horribly wrong I'd suggest redefining IConsumer's unregisterProducer() to return a Deferred, and then modify the FTP client (in DTP._unregConsumer) to pass that Deferred back to the chain, something like: --- twisted/protocols/ftp.py (revision 24956) +++ twisted/protocols/ftp.py (working copy) @@ -451,10 +451,10 @@ self._buffer.append(bytes) def _unregConsumer(self, ignored): - self._cons.unregisterProducer() + d = defer.maybeDeferred(self._cons.unregisterProducer) self._cons = None del self._onConnLost - return ignored + return d But, as far as I can tell, IConsumers aren't meant to work this way: detaching a producer doesn't really mean that there won't be more data at some point in the future, just that this particular producer is no longer useful. IFinishableConsumer, though, has a 'finish' method, which is actually used in a couple of other places in twisted/protocols/ftp.py . So Plan B would be: * declare that IWriteFile.receive() must return an IFinishableConsumer instead of merely an IConsumer * document that IFinishableConsumer.finish() will be called after unregisterProducer, and document what happens in the case of exceptions * redefine IFinishableConsumer.finish() to be allowed to return a Deferred * modify FTP.ftp_STOR.cbConsumer to wait for this Deferred before sending the 226 response, something like: --- twisted/protocols/ftp.py (revision 24956) +++ twisted/protocols/ftp.py (working copy) @@ -1049,6 +1049,7 @@ cons = ASCIIConsumerWrapper(cons) d = self.dtpInstance.registerConsumer(cons) + d.addCallback(lambda res: cons.finish()) d.addCallbacks(cbSent, ebSent) # Tell them what to doooo But it doesn't look like IFinishableConsumer.finish() is meant to be used this way either: the docstring has no mention of return value, or if/when it gets called, and the FTP client is the only code in all of Twisted that uses it (and all that does is a synchronous transport.loseConnection). I guess Plan C would be to involve the IWriteFile instance, adding a finish() method of some sort, making this an FTP-specific solution. So, any thoughts? Something like this just at the VFS level would allow backends to use asynchronous-writes, which feels like a significant gap in the current functionality. It might be better to address it at the IConsumer level, but that would touch more (and better-established) code. cheers, -Brian [1]: http://allmydata.org/

On Fri, 3 Oct 2008 18:26:18 -0700, Brian Warner <warner@lothar.com> wrote:
Yep. You're right that the FTP server protocol implementation currently assumes that once it has written the last chunk to the consumer that the file is successfully uploaded. It would be a good enhancement to server to allow application code to insert one more Deferred into processing of uploads after the last write and before the success notification. Unfortunately, there's no way to signal this with `IConsumer`. Likewise (as you pointed out) `IFinishableConsumer.finish` doesn't have very well defined semantics. In particular, there's no way to infer from its docs that a Deferred might be returned.
Trying to solve this in VFS before it has been solved for any specific use case would be premature. Similarly, making a change to `IConsumer` before implementing the idea even once should be avoided. I would suggest adding a new (optional) interface to the FTP server implementation which supports the close hook that's needed here. Once something is working, then we can try to generalize it. Jean-Paul

On Mon, 6 Oct 2008 16:40:36 -0400 Jean-Paul Calderone <exarkun@divmod.com> wrote:
Ok, I've created #3462 and attached a patch for your consideration: http://twistedmatrix.com/trac/ticket/3462 With this patch, I'm able to make Tahoe do what I need it to do. thanks, -Brian

On Fri, 3 Oct 2008 18:26:18 -0700, Brian Warner <warner@lothar.com> wrote:
Yep. You're right that the FTP server protocol implementation currently assumes that once it has written the last chunk to the consumer that the file is successfully uploaded. It would be a good enhancement to server to allow application code to insert one more Deferred into processing of uploads after the last write and before the success notification. Unfortunately, there's no way to signal this with `IConsumer`. Likewise (as you pointed out) `IFinishableConsumer.finish` doesn't have very well defined semantics. In particular, there's no way to infer from its docs that a Deferred might be returned.
Trying to solve this in VFS before it has been solved for any specific use case would be premature. Similarly, making a change to `IConsumer` before implementing the idea even once should be avoided. I would suggest adding a new (optional) interface to the FTP server implementation which supports the close hook that's needed here. Once something is working, then we can try to generalize it. Jean-Paul

On Mon, 6 Oct 2008 16:40:36 -0400 Jean-Paul Calderone <exarkun@divmod.com> wrote:
Ok, I've created #3462 and attached a patch for your consideration: http://twistedmatrix.com/trac/ticket/3462 With this patch, I'm able to make Tahoe do what I need it to do. thanks, -Brian
participants (2)
-
Brian Warner
-
Jean-Paul Calderone