[Twisted-Python] Bloody Twisted VFS
Hey guys, I'm really keen to get twisted.vfs to a stable point where it's acceptably releasable. There are 6 tickets to knock off to get it to this state:

This ticket will give an easily runnable vfs server, which'll be useful for experimenting with vfs as it is worked on:

* http://twistedmatrix.com/trac/ticket/2821

This ticket ports the backend vfs interface to be, firstly, async, and also not quite so lame:

* http://twistedmatrix.com/trac/ticket/2815

Then we can port the 3 currently supported protocols over to use the new backend interface:

* http://twistedmatrix.com/trac/ticket/2816
* http://twistedmatrix.com/trac/ticket/2817
* http://twistedmatrix.com/trac/ticket/2818

And finally, once everything is ported, there can be a good clean-up:

* http://twistedmatrix.com/trac/ticket/2819

Exarkun also has some good suggestions for #2815. I've gone ahead and fixed the ones mentioned, but this is defining an interface, which isn't something to lock into lightly. As he says:

'What these interfaces should actually say is still an open question, I think. This would probably benefit from some real-time conversation with various interested parties involved. You've probably already noticed IFTPShell which is one take on this interface (it's not exactly the same shape, but many of the ideas apply here).'

That's very reasonable. Just before the first .au sprint I put a bit of time into this, and conch's IConchUser interface seemed to be a better base for vfs than IFTPShell, but this is something that could use more discussion.

I've quit my job in LA that's sucked up all my life for the last 2 or so years, and I'm now on a road trip around the US. I've holed up in Vermont for the week to catch up on a bunch of stuff; top of the list is to try and finally get vfs up to scratch. I'm planning to be in the Massachusetts area next week, again for around a week. If anyone is available to meet up and hash this out, that'd be awesome.
It feels like if we can hit #2815 on the head, the rest will fall into place. Therve has been awesome at chugging through previous vfs stuff; porting protocols to use the agreed-upon backend interface should be straightforward. cheers, Andy.
On 10:26 pm, andy@thecablelounge.com wrote:
YES. PLEASE. I don't know the rest of the Cambridge Cabal's level of interest but I'd definitely like to sort this stuff out. (I promise to take minutes and file tickets for those of you not present. And we should have an IRC conference as well.) Good to see you again, Andy. "Welcome back to the stage of history."
On Sat, Jun 28, 2008 at 6:58 PM, Jonathan Lange <jml@mumak.net> wrote:
That'd be great. We've stopped using twisted.vfs at work, but I'm still keen to see this stuff improve.
Oh man, I think that means *no one* uses twisted.vfs. What are you guys doing for SFTP access to Launchpad? Andy.
On Sun, Jun 29, 2008 at 9:09 AM, Andy Gayton <andy@thecablelounge.com> wrote:
Direct implementations of ISFTPServer and ISFTPFile. Since hacking on that, I've become too familiar with the creepy world of SFTP protocol drafts. Would love to talk about it with y'all. If you can arrange to chat at a time during my weekend, I'll adjust my sleep patterns accordingly. jml
On Sat, Jun 28, 2008 at 7:19 PM, Jonathan Lange <jml@mumak.net> wrote:
I've got a place in Vermont until next Thursday. Then it's July 4th, which is likely to be rowdy. We could go for a face-to-face + IRC meeting next weekend? We could also virtually bash around ideas before then, and I'd have the week to code up whatever is discussed. In the meantime, here's a list of interfaces to draw on:

* Some thoughts on async file IO for Twisted have been documented here: http://twistedmatrix.com/trac/wiki/Specification/AsynchronousFileInputOutput
* FTP's IFTPShell, IReadFile and IWriteFile: http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...
* Conch's ISFTPServer and ISFTPFile: http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...
* The heavily thought-out (particularly from a security point of view) but synchronous FilePath: http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...
* The currently proposed vfs interface, IVFSNode: http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...

There's already an implementation of IVFSNode, FilePathNode, which takes a FilePath (or ZipPath) and makes it available over all protocols that have adapters for IVFSNode. cheers, Andy.
On Sat, Jun 28, 2008 at 6:33 PM, <glyph@divmod.com> wrote:
Exarkun is off for 10 days from Thursday arvo. What do you think of a pitted cage death match on Wednesday (July 2nd) evening? No one leaves until only one person lives, or there's consensus on an interface :) Jml, I'm guessing this'd be your Thursday morning. If this is OK, what's the best place to meet? cheers, Andy.
On Mon, Jun 30, 2008 at 9:06 PM, Jonathan Lange <jml@mumak.net> wrote:
Yeah, that would be a good time for me, as long as it's after 2200 UTC.
Hey Jono, sorry again that we didn't get the conversation online into IRC. Here's a quick recap of what came out of the discussions:

* The immediate goal is to settle on just enough of the core interface to be certain it can be expanded, in backwards-compatible ways, in the directions that will be needed, so we can do a release.
* open should explicitly state the flags and permissions it takes. It shouldn't use POSIX constants. At first we'll just support a small subset that makes sense for all platforms.
* createFile should be subsumed by open; the exclusive flag should just be a part of open's flags.
* open should return a separate IO object, instead of the node handling IO itself.
* The primitive interface for IO should be producers/consumers, replacing readChunk and writeChunk. This interface is primitive enough to express all the other interfaces, while still providing the opportunity to optimize streaming performance. The producer/consumer interface will need to take an offset so that readChunk and writeChunk can be implemented on top of it.
* We're still postponing the handling of symlinks.
* We're still using getMetadata and setMetadata. It's likely we want a layer on top of arbitrary key/value dicts for metadata, but this can be introduced in a backwards-compatible way.
* We still need to decide whether path resolution should be moved to a separate interface, instead of being part of the node's interface.
* There's concern over the package name. twisted.tree has considerable support :)

I'll try and make these changes in the next week or so. If you are interested in shaping how this goes, you can track what's going on in http://twistedmatrix.com/trac/ticket/2815; just weigh in once the ticket goes back to review. cheers, Andy.
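To make the first few recap items concrete, here is a minimal sketch of platform-neutral open() flags, assuming Python's `enum.Flag`. The names `OpenFlags`, `toPosixFlags`, and `createFile` are hypothetical, not the interface that eventually landed on #2815. The translation to `os.O_*` constants is meant to live only inside a local-disk backend, so callers never see POSIX values, and `createFile` survives only as a thin convenience over `open` with the exclusive flag.

```python
import enum
import os


class OpenFlags(enum.Flag):
    """Hypothetical platform-neutral flags for a VFS open() call."""
    READ = enum.auto()
    WRITE = enum.auto()
    CREATE = enum.auto()
    EXCLUSIVE = enum.auto()  # with CREATE: fail if the file already exists
    TRUNCATE = enum.auto()
    APPEND = enum.auto()


def toPosixFlags(flags):
    """Translate OpenFlags into os.O_* values; only a local-disk backend needs this."""
    if OpenFlags.EXCLUSIVE in flags and OpenFlags.CREATE not in flags:
        raise ValueError("EXCLUSIVE only makes sense together with CREATE")
    if OpenFlags.READ in flags and OpenFlags.WRITE in flags:
        result = os.O_RDWR
    elif OpenFlags.WRITE in flags:
        result = os.O_WRONLY
    elif OpenFlags.READ in flags:
        result = os.O_RDONLY
    else:
        raise ValueError("open() needs READ, WRITE, or both")
    extras = [
        (OpenFlags.CREATE, os.O_CREAT),
        (OpenFlags.EXCLUSIVE, os.O_EXCL),
        (OpenFlags.TRUNCATE, os.O_TRUNC),
        (OpenFlags.APPEND, os.O_APPEND),
    ]
    for flag, posixValue in extras:
        if flag in flags:
            result |= posixValue
    return result


def createFile(backend, path, permissions=0o644):
    """Convenience wrapper: 'create' is just open() with CREATE | EXCLUSIVE."""
    flags = OpenFlags.WRITE | OpenFlags.CREATE | OpenFlags.EXCLUSIVE
    return backend.open(path, flags, permissions)
```

Starting with a small flag set like this and adding members later is one way to keep the "expand in backwards-compatible ways" goal: existing callers never pass flags they don't know about.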
On Sat, Jul 5, 2008 at 3:27 AM, Andy Gayton <andy@thecablelounge.com> wrote:
That's ok. Thanks for bringing it to the mailing list :)
Good idea.
+1
* createFile should be subsumed by open; the exclusive flag should just be a part of open's flags.
Having a convenience method for creating a file seems like a good idea.
It would be nice to have things so that readChunks and writeChunks (plural) could be implemented, to avoid potato programming.
This reminds me, it would be good for VFS to have an exception for "this operation isn't supported" (say with symlinks on fat32) and another exception for "supportable, but not actually implemented yet".
* we still need to decide whether path resolution should be moved to a separate interface, instead of being part of the node's interface.
I'm not 100% sure what this means? Does this relate to possibly combining with FilePath?
* there's concern over the package name. twisted.tree has considerable support :)
I kind of like that. I'm not sure what the concern is with 'vfs' though.
Here's some random stuff that I wanted to at least mention:

- Error translation. This should translate the exception types, but it should also translate values, so the error contains the virtual path.
- Deferreds. You don't mention them at all, but the lack of asynchronous interfaces was one of the biggest problems we had with twisted.vfs.
- URL escaping. I got bitten by this recently. It's obviously not a general VFS problem, but it's an issue with enough of them that it should be considered when defining interfaces.
- "Decorators" like "read-only" and "chroot" could prove useful. Is there room in the design for such things?

jml
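A minimal sketch of the error-translation point, assuming a local-disk backend rooted at some real directory and POSIX-style paths. The exception *type* is mapped from `errno`, and the real filename carried by the `OSError` is rewritten into the virtual path, so the server's on-disk layout never leaks to the client. `VFSError`, `NotFound`, `PermissionDenied`, and `translateOSError` are hypothetical names.

```python
import errno
import os
import posixpath


class VFSError(Exception):
    """Hypothetical backend-neutral error that only ever mentions virtual paths."""

    def __init__(self, virtualPath, message):
        Exception.__init__(self, "%s: %s" % (virtualPath, message))
        self.virtualPath = virtualPath


class NotFound(VFSError):
    pass


class PermissionDenied(VFSError):
    pass


# Translate the exception *type*: errno values map onto VFS error classes.
_ERRNO_TO_ERROR = {
    errno.ENOENT: NotFound,
    errno.EACCES: PermissionDenied,
    errno.EPERM: PermissionDenied,
}


def translateOSError(error, realRoot, virtualRoot="/"):
    """Turn an OSError about a real path into a VFSError about the virtual path."""
    realPath = error.filename if error.filename is not None else realRoot
    # Translate the *value*: rebuild the path relative to the virtual root.
    relative = os.path.relpath(realPath, realRoot)
    segments = [] if relative == os.curdir else relative.split(os.sep)
    virtualPath = posixpath.join(virtualRoot, *segments)
    errorClass = _ERRNO_TO_ERROR.get(error.errno, VFSError)
    return errorClass(virtualPath, error.strerror or str(error))
```

A backend method would wrap each local call in `try: ... except OSError as e: raise translateOSError(e, self.root) from None`, where `from None` also keeps the original (real-path) exception out of the chained traceback.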
On 07:32 am, jml@mumak.net wrote:
I don't think this is actually going to be a practical consideration, if I correctly understand what you mean. For one thing, the producer/consumer interface is going to be something (very vaguely) like this:

```
remoteFile.writeFrom(producer[, offset, length])
remoteFile.readInto(consumer[, offset, length])
```

This means that if you've got a really giant file, the implementation could pretty trivially optimize delivering it to you in the most efficient possible way, keeping all the relevant buffers full at every opportunity. Given that stream-based I/O is somewhat inherently serial, it's difficult to get less potato-y than that. writeChunks, if I understand it, would be pretty trivially implementable by saying

```
remoteFile.writeFrom(MultiChunkProducer(chunks))
```

Mapping 'readChunks' and 'writeChunks' to readv and writev in my head, I'm not really sure what a 'readChunks' would actually do, since we copy memory every time we sneeze in Python anyway. We're not going to have preallocated buffers to read into.
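These signatures can be fleshed out into a toy, synchronous, in-memory version that shows how `offset` and `length` let readChunk/writeChunk-style access sit on top of a streaming primitive. This is a simplified sketch: real Twisted producers and consumers also pause and resume (`IPushProducer`/`IConsumer`), which is omitted here. `InMemoryFile`, `produceTo`, `BytesConsumer`, and this `MultiChunkProducer` are hypothetical, and `writeFrom` here takes only an offset, not a length.

```python
class BytesConsumer:
    """Minimal consumer: anything with write(data) can receive a stream."""

    def __init__(self):
        self.received = []

    def write(self, data):
        self.received.append(data)

    def value(self):
        return b"".join(self.received)


class MultiChunkProducer:
    """Producer that pushes a fixed list of chunks into a consumer."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def produceTo(self, consumer):
        for chunk in self.chunks:
            consumer.write(chunk)


class _OffsetWriter:
    """Consumer that writes sequentially into a bytearray starting at an offset."""

    def __init__(self, buffer, offset):
        self.buffer = buffer
        self.position = offset

    def write(self, chunk):
        if self.position > len(self.buffer):
            # Writing past EOF: pad the gap with zero bytes, like a sparse file.
            self.buffer.extend(b"\0" * (self.position - len(self.buffer)))
        self.buffer[self.position:self.position + len(chunk)] = chunk
        self.position += len(chunk)


class InMemoryFile:
    """Toy file exposing hypothetical readInto/writeFrom methods."""

    def __init__(self, data=b"", chunkSize=4):
        self.data = bytearray(data)
        self.chunkSize = chunkSize  # tiny on purpose, to show chunked delivery

    def readInto(self, consumer, offset=0, length=None):
        end = len(self.data)
        if length is not None:
            end = min(end, offset + length)
        position = offset
        while position < end:
            chunk = bytes(self.data[position:min(position + self.chunkSize, end)])
            consumer.write(chunk)
            position += len(chunk)

    def writeFrom(self, producer, offset=0):
        writer = _OffsetWriter(self.data, offset)
        producer.produceTo(writer)
        return writer.position - offset  # number of bytes written
```

A compatibility `readChunk(offset, length)` then falls out as `readInto(BytesConsumer(), offset, length)` followed by `value()`.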
Hmm. I don't remember agreeing to layering anything on top of "arbitrary key/value dicts"; I'd really like to see a completely different layer that specifically separates out optional features (xattrs, symlinks, posix ACLs(?)) into separate interfaces with specific methods that don't necessarily need to retrieve all the metadata at once, which is sort of an inherent property of having a key/value dict. I'm OK with "still using getMetadata and setMetadata", though, since as you say, it can be introduced in a backwards-compatible way. I do think that we should keep that discussion open (for later, after the rest of this work has been completed).
I don't think it's useful to distinguish between these two types of exception at *runtime*. The use-case I can see for distinguishing is letting a programmer know that they should figure out something that might be tricky to implement and write some wrappers or submit some patches. Perhaps a separate error message, rather than a separate exception type? Do you have a different use-case?

One related thing that we spoke about in person was pushing this negotiation of file-system features backwards to the initialization step, so that applications which needed unusual filesystem attributes could fail quickly with a clear error message if they weren't supported by the underlying platform. ("WebDAV requires extended filesystem attributes, and your backend, SFTP, does not provide that feature.", "txGnuStow requires symbolic links, and your backend, the Microsoft Windows filesystem, does not provide that feature.")

The nice thing about this is that the default interface to the backend would be the one that masked everything but the most common subset of filesystem features, so that you couldn't *accidentally* depend on a feature that wasn't present everywhere, without specifically requesting it. In order to get more obscure features you'd have to specify a longer list of interfaces.
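A minimal sketch of pushing feature negotiation to start-up, assuming each backend can advertise a set of optional feature names. `Backend`, `COMMON_FEATURES`, `negotiate`, and `FeatureNotProvided` are hypothetical. An application gets only the common subset unless it explicitly asks for more, and an unsupported request fails immediately with a message naming the application, the feature, and the backend.

```python
class FeatureNotProvided(Exception):
    """Raised at start-up when a backend lacks a feature the application asked for."""


class Backend:
    """Hypothetical backend description: a display name plus optional features."""

    def __init__(self, name, features):
        self.name = name
        self.features = frozenset(features)


# Hypothetical lowest-common-denominator feature set every backend must offer.
COMMON_FEATURES = frozenset(["read", "write", "listDirectory"])


def negotiate(application, backend, extraFeatures=()):
    """Return the features `application` may use, or fail fast with a clear message.

    Anything outside COMMON_FEATURES is masked unless explicitly requested,
    so an application cannot depend on an unusual feature by accident.
    """
    wanted = COMMON_FEATURES | frozenset(extraFeatures)
    missing = sorted(wanted - backend.features)
    if missing:
        raise FeatureNotProvided(
            "%s requires %s, and your backend, %s, does not provide %s." % (
                application, ", ".join(missing), backend.name,
                "that feature" if len(missing) == 1 else "those features"))
    return wanted
```

Returning the negotiated set (rather than just `True`) gives the rest of the application something to check against, which is where masking the uncommon features would actually be enforced.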
* we still need to decide whether path resolution should be moved to a separate interface, instead of being part of the node's interface.
I'm not 100% sure what this means? Does this relate to possibly combining with FilePath?
The tongue-in-cheek name that radix gave to this interface was 'filepath.pathdelta'. It's related to filepath in the sense that FilePath, ZipPath, et al. could benefit from using the same interface to talk about relative pathnames rather than manipulating lists of strings. One can, after all, abstractly do operations like "child()" and "parent()" without knowing a lot about the base implementation of the filesystem in question.
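One possible shape for such a relative-path ("pathdelta") interface, sketched under the assumption that a delta is just an immutable tuple of validated segments, and that anything with a `child(name)` method (FilePath and ZipPath both have one) can be walked by it. `PathDelta`, `applyTo`, and `asPosix` are hypothetical names.

```python
class PathDelta:
    """Hypothetical relative path: an immutable tuple of validated segments."""

    def __init__(self, segments=()):
        segments = tuple(segments)
        for segment in segments:
            # Refuse anything that could climb out of, or silently alias, a base path.
            if segment in ("", ".", "..") or "/" in segment or "\0" in segment:
                raise ValueError("invalid path segment: %r" % (segment,))
        self.segments = segments

    def child(self, name):
        return PathDelta(self.segments + (name,))

    def parent(self):
        if not self.segments:
            raise ValueError("the empty delta has no parent")
        return PathDelta(self.segments[:-1])

    def applyTo(self, base):
        """Walk any object with a child(name) method, such as a FilePath or ZipPath."""
        node = base
        for segment in self.segments:
            node = node.child(segment)
        return node

    def asPosix(self):
        return "/".join(self.segments)

    def __eq__(self, other):
        if not isinstance(other, PathDelta):
            return NotImplemented
        return self.segments == other.segments

    def __hash__(self):
        return hash(self.segments)

    def __repr__(self):
        return "PathDelta(%r)" % (list(self.segments),)
```

Because `applyTo` only relies on `child()`, the same delta could be resolved against a local FilePath, a ZipPath, or an async VFS node wrapper, which is the sense in which the interface is independent of the filesystem underneath.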
* there's concern over the package name. twisted.tree has considerable support :)
I kind of like that. I'm not sure what the concern is with 'vfs' though.
"twisted.vfs" sounds incredibly boring and unpronounceable. It would be the first twisted.<acronym> package, and it's not really related to any other technology ambiguously named "vfs".

However, this reminds me about another concern which I did not remember to raise while Andy was here. Should this really be twisted.<anything> at all? I'd like twisted <x> "dot products" to generally be an application which does something <x>-ish. I'm aware that not every package follows this rule, but the ones that don't are either (A) unmaintained and slated for removal, or (B) part of the core, not independent subprojects, as "vfs" seems slated to be.

Put a different way: what should 'twistd tree' do? My suggestion would be a simple multi-protocol file server: HTTP, FTP (although probably disabled by default), SFTP, maybe a "native" protocol for providing a generalized backend for any Twisted application that uses the 'tree' API, so that we can write a proxy that exposes every arbitrary combination of features from the protocols it's talking to.

If everyone agrees with this, then great. However, if we never intend for this to go beyond providing an API that other systems hook into, maybe it should go somewhere subordinate to another project; twisted.internet.files perhaps?

To be clear: I don't mind doing a release that does not include this tool; I don't think anything should block on it. I just want it to be in the cards eventually if this is the way we're going to release it.
This sounds like a specific enough thing that you could file a ticket that described the exact behavior that you wanted. It doesn't sound contentious at all to me, so unless you think there's some hidden confusion there... go ahead?
I believe that the consensus on asynchronicity is that all of the synchronous stuff should be FilePath's job. In the glorious future of twisted.tree, everything will be async. As discussed above, this doesn't always mean Deferreds, it also means producers and consumers. One thing we didn't talk about in person: handling extremely large directories. We had spoken about children() returning a Deferred of a list; I think it would be nice if it actually had a producer/consumer API of its own. Maybe this is too much of a corner case to worry about in average applications (i.e. we could provide a give-me-a-deferred convenience API) but it would be nice if it were *possible* to implement things that were efficient against really big networked directories.
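A minimal sketch of a batched directory listing, assuming a hypothetical consumer interface with `writeEntries(batch)` and `finish()`. It is synchronous and has no pause/resume, unlike a real Twisted producer. `DirectoryListingProducer` and `listChildren` are hypothetical; `listChildren` stands in for the give-me-a-deferred convenience API and simply collects every batch into one list.

```python
import itertools


class DirectoryListingProducer:
    """Hypothetical producer that hands directory entries to a consumer in bounded batches."""

    def __init__(self, entries, batchSize=100):
        self._entries = iter(entries)  # any iterable; consumed lazily
        self.batchSize = batchSize

    def produceTo(self, consumer):
        while True:
            batch = list(itertools.islice(self._entries, self.batchSize))
            if not batch:
                break
            consumer.writeEntries(batch)
        consumer.finish()


class _CollectingConsumer:
    def __init__(self):
        self.entries = []
        self.finished = False

    def writeEntries(self, batch):
        self.entries.extend(batch)

    def finish(self):
        self.finished = True


def listChildren(producer):
    """Convenience API: gather every batch into one list (what a Deferred-of-a-list would fire with)."""
    consumer = _CollectingConsumer()
    producer.produceTo(consumer)
    return consumer.entries
```

Because the producer accepts any iterable, a local backend could feed it `(entry.name for entry in os.scandir(path))`, which `os.scandir` yields lazily, so a huge directory is only ever materialized as one list when a caller opts into the convenience function.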
I *think* that this should be pretty easily dealt with in a pretty generic way by having a clearly-defined set of string escaping rules depending on which protocol you're using. It's a general VFS issue in the sense that there are escaping issues with "/" on regular filesystems, after all. Or at least, there are error-reporting issues with characters like "/", ";", and ":" on certain FSes.
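A sketch of per-protocol escaping rules, assuming only two illustrative targets: HTTP/WebDAV-style URLs, where each path segment is percent-encoded with `urllib.parse.quote(segment, safe="")` so that a literal "/" or ";" cannot be mistaken for syntax, and a POSIX-like filesystem, where "/" and NUL simply cannot appear in a name and should produce a clear error. `encodePath`, `decodeURLPath`, `checkPosixSegment`, and `InvalidSegment` are hypothetical.

```python
from urllib.parse import quote, unquote


class InvalidSegment(ValueError):
    """A file name that cannot be represented on the target backend."""


def escapeURLSegment(segment):
    # safe="" means "/" is escaped too, so it can never be mistaken for a separator.
    return quote(segment, safe="")


def checkPosixSegment(segment):
    # POSIX names cannot contain "/" or NUL; report that clearly instead of guessing.
    for forbidden in ("/", "\0"):
        if forbidden in segment:
            raise InvalidSegment("%r cannot appear in a file name: %r" % (forbidden, segment))
    if segment in ("", ".", ".."):
        raise InvalidSegment("reserved file name: %r" % (segment,))
    return segment


# One rule per protocol; the VFS core only ever deals with lists of raw segments.
ESCAPING_RULES = {
    "http": escapeURLSegment,
    "posix": checkPosixSegment,
}


def encodePath(protocol, segments):
    escape = ESCAPING_RULES[protocol]
    return "/".join(escape(segment) for segment in segments)


def decodeURLPath(path):
    # Split first, then unquote, so an escaped %2F stays inside its segment.
    return [unquote(segment) for segment in path.split("/")]
```

The order of operations in `decodeURLPath` is the usual place this goes wrong: unquoting the whole path before splitting turns `%2F` back into a separator.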
- "Decorators" like "read-only" and "chroot" could prove useful. Is there room in the design for such things?
We did discuss having things like this. Specifically, we talked a lot during the metadata discussion about the possibility of 'decorators' like "provide-xattrs-with-dotfiles" and "provide-atime-by-pretending-its-zero". However, we didn't spend too long on it because every alternative that got brought up sounded like it was pretty amenable to a simple delegation approach; there just wasn't a lot of meat there. We'll have to check to make sure that is true in the review process, of course, but this is probably the thing I'm least worried about :).
On Mon, Jul 7, 2008 at 1:12 AM, <glyph@divmod.com> wrote:
Well, one theoretical advantage is that it can avoid roundtrips in cases where the remote file server supports a readv-style operation. I can't think of any servers that do this at the moment (maybe the bzr smart server? does http 1.1 allow this?), so maybe it's not an issue.
The first kind should skip tests, the second kind should fail tests.
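That skip-versus-fail rule can be sketched with the standard `unittest` module. `OperationNotSupported`, `OperationNotImplemented`, `runBackendOperation`, and the toy `FAT32Backend` are hypothetical names used only for illustration; the helper turns the first kind of exception into `unittest.SkipTest` and the second into an ordinary test failure.

```python
import unittest


class OperationNotSupported(Exception):
    """This backend can never do this (e.g. symlinks on FAT32)."""


class OperationNotImplemented(Exception):
    """This backend could do this, but nobody has written the code yet."""


def runBackendOperation(testCase, operation, *args):
    """Call a backend operation, turning 'unsupported' into a skip and 'unimplemented' into a failure."""
    try:
        return operation(*args)
    except OperationNotSupported as e:
        raise unittest.SkipTest("not supported by this backend: %s" % (e,))
    except OperationNotImplemented as e:
        testCase.fail("supportable, but not implemented yet: %s" % (e,))


class FAT32Backend:
    """Toy backend used only to demonstrate the two outcomes."""

    def symlink(self, source, destination):
        raise OperationNotSupported("FAT32 has no symbolic links")

    def setExtendedAttribute(self, path, name, value):
        raise OperationNotImplemented("xattrs could be emulated with dotfiles")


def runDemoSuite():
    # Defined inside a function so test collectors don't pick it up by accident.
    class BackendTests(unittest.TestCase):
        backend = FAT32Backend()

        def test_symlink(self):
            runBackendOperation(self, self.backend.symlink, "a", "b")

        def test_extendedAttribute(self):
            runBackendOperation(self, self.backend.setExtendedAttribute, "a", "user.x", b"1")

    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(BackendTests).run(result)
    return result
```

Running `runDemoSuite()` reports one skip (the symlink test) and one failure (the xattr test), so a shared backend test suite stays green on platforms that genuinely can't do something, while still flagging work that is merely unfinished.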
This sounds like a good idea, provided that there are still clear runtime errors and that you can skip the negotiation. Use cases for this would be a virtual filesystem that's glued together from other virtual filesystems, each of which has different capabilities.
("WebDAV requires extended filesystem attributes, and your backend, SFTP, does not provide that feature.", "txGnuStow requires symbolic
Actually, some versions of SFTP do provide it. I'm not sure that there are any implementations though :)
Well, the world needs a decent one of these.
So, this is the thing that *I'm* least worried about. I think it should just be an API, and that it should be done so that other Twisted components can depend on it. Beyond that, its package location is unimportant.
Good good.
Yes. This would be very nice.
Good. I just wanted to flag it.
On 02:04 am, andrew-twisted@puzzling.org wrote:
OK, I think I can see what you mean. I believe you'll be able to effectively implement "readChunks" by simply pipelining calls to readInto(consumer); different consumers can be read into in parallel.
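A sketch of that pipelining idea, using `asyncio` as a stand-in for Deferreds: every `readInto` call is issued at once, each with its own consumer, and the results are gathered in request order. `SlowRemoteFile` simulates one network round trip per call with `asyncio.sleep`; it, `BufferConsumer`, and `readChunks` are hypothetical. The point is only that N ranges cost roughly one round trip of waiting rather than N.

```python
import asyncio


class BufferConsumer:
    def __init__(self):
        self.parts = []

    def write(self, data):
        self.parts.append(data)

    def value(self):
        return b"".join(self.parts)


class SlowRemoteFile:
    """Toy remote file: every readInto call costs one simulated round trip."""

    def __init__(self, data, latency=0.01):
        self.data = data
        self.latency = latency

    async def readInto(self, consumer, offset, length):
        await asyncio.sleep(self.latency)  # simulated network latency
        consumer.write(self.data[offset:offset + length])


async def readChunks(remoteFile, ranges):
    """Pipeline one readInto per (offset, length) pair, each into its own consumer."""
    consumers = [BufferConsumer() for _ in ranges]
    await asyncio.gather(*[
        remoteFile.readInto(consumer, offset, length)
        for consumer, (offset, length) in zip(consumers, ranges)])
    return [consumer.value() for consumer in consumers]
```

With Deferreds the same shape would be a list of `readInto` Deferreds combined with `DeferredList` or `gatherResults`; whether a server can collapse them into one readv-style request is then an optimization inside the backend.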
On Sun, Jul 6, 2008 at 11:12 AM, <glyph@divmod.com> wrote:
...
I really like these two ideas. I've noted them down to work on. I think we agree, though, that we could get a release out first and then look to add these?
I remember a chat I was having with you back at PyCon 2007 about expanding twistd's command-line interface to be able to multiplex together several different services. Having a collection of backend implementations available to use with this would be cool. But yeah, this could still be provided without using a twisted.<x> package. Here's the vfs plugin so far: http://twistedmatrix.com/trac/browser/branches/vfs-twistd-plugin-2821-2/twis... (Options.longdesc talks about the above idea.) I'm OK either way on this. cheers, Andy.
On 03:02 am, andy@thecablelounge.com wrote:
One thing we didn't talk about in person: handling extremely large directories.
Absolutely. I don't think any ideas I brought up in this email should be blockers.
what should 'twistd tree' do?
Here's the vfs plugin so far:
Awesome, I guess my concern was pre-addressed!
On Sun, Jul 6, 2008 at 3:32 AM, Jonathan Lange <jml@mumak.net> wrote:
It's pretty much Deferreds all the way: http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...
- "Decorators" like "read-only" and "chroot" could prove useful. Is there room in the design for such things?
Yeah, I'm all for flavouring backends with decorators. That's why I keep trying to sneak this guy through ;) http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist... You get chroot for free with the current FilePath based vfs implementation. In the previous version (trunk) of vfs there are force user/force group and umask decorators. I'd like to recreate those once we get the new version merged. Andy.
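For backends that aren't built on FilePath, a chroot decorator has to do the confinement itself. A minimal stdlib-only sketch, assuming POSIX paths: the virtual path is normalized against "/" (so leading ".." segments clamp at the root, as with chroot(2)), joined under the real root, and then checked with `os.path.realpath` so a symlink inside the root cannot point outside it. `resolveInRoot` and `EscapesRoot` are hypothetical; FilePath.child instead raises `InsecurePath` for a segment that would escape, which is the stricter alternative.

```python
import os
import posixpath


class EscapesRoot(Exception):
    """Raised when a virtual path would resolve outside the chroot."""


def resolveInRoot(realRoot, virtualPath):
    """Map a virtual POSIX-style path to a real path confined under realRoot."""
    # Normalizing against "/" clamps leading ".." at the root, like chroot(2).
    normalized = posixpath.normpath("/" + virtualPath)
    segments = [segment for segment in normalized.split("/") if segment]
    candidate = os.path.join(realRoot, *segments)
    # A symlink inside the root could still point elsewhere, so compare resolved paths.
    root = os.path.realpath(realRoot)
    resolved = os.path.realpath(candidate)
    if os.path.commonpath([root, resolved]) != root:
        raise EscapesRoot(virtualPath)
    return candidate
```

Clamping versus rejecting is a design choice: clamping matches what clients of a real chroot expect, while rejecting (as FilePath does) surfaces suspicious requests instead of silently rewriting them.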
On 03:12 am, andy@thecablelounge.com wrote:
On Sun, Jul 6, 2008 at 3:32 AM, Jonathan Lange <jml@mumak.net> wrote:
Eehhhhh... I'm glad that name starts with "_" :). It seems like one of those too-clever-by-half solutions to a relatively simple problem, where the cleverness will bite you later. Something like... I would do. The general concept of decorators is really good though.
On Tue, Jul 8, 2008 at 4:32 PM, <glyph@divmod.com> wrote:
It's a general solution for re-decorating the return results of methods on decorated objects that return new instances of themselves, which is particularly handy for decorating tree-node-like objects that can return new child instances. For example:

```python
from twisted.python.filepath import FilePath

root = FilePathNode(FilePath('/tmp'))
root = ReadOnly(root)
root = ForceUser(root, 'nobody')
# Should return a read-only node for /tmp/foo, which creates new files as user nobody.
root.child('foo')
```

It'd be awesome to provide support for this in a simpler way. cheers, Andy.
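A minimal sketch of the re-decorating idea, assuming decorators delegate unknown attributes via `__getattr__` and that each decorator lists the methods whose results should be re-wrapped. The `factoryMethods` name mirrors the keyword in Andy's bzrlib example below, but this `Decorator` is a hypothetical reimplementation, not `twisted.vfs._decorator`. `MemoryNode` is a toy node; a decorator with extra state, like `ForceUser`, overrides `rewrap` so that state is carried down to children.

```python
class Decorator:
    """Delegate to `target`, re-wrapping results of the methods named in factoryMethods."""

    factoryMethods = ()

    def __init__(self, target):
        self.target = target

    def rewrap(self, newTarget):
        # Decorators with extra constructor arguments must override this.
        return type(self)(newTarget)

    def __getattr__(self, name):
        # Only called for attributes not found on the decorator itself.
        attribute = getattr(self.target, name)
        if name in self.factoryMethods:
            def rewrapping(*args, **kwargs):
                return self.rewrap(attribute(*args, **kwargs))
            return rewrapping
        return attribute


class ReadOnly(Decorator):
    factoryMethods = ("child",)

    def remove(self):
        raise PermissionError("read-only tree")

    def createChild(self, name):
        raise PermissionError("read-only tree")


class ForceUser(Decorator):
    factoryMethods = ("child",)

    def __init__(self, target, user):
        Decorator.__init__(self, target)
        self.user = user

    def rewrap(self, newTarget):
        return ForceUser(newTarget, self.user)  # carry the user down to children

    def createChild(self, name):
        node = self.target.createChild(name)
        node.owner = self.user  # toy stand-in for a real chown
        return self.rewrap(node)


class MemoryNode:
    """Toy tree node standing in for something like FilePathNode."""

    def __init__(self, path, owner="root"):
        self.path = path
        self.owner = owner

    def child(self, name):
        return MemoryNode(self.path.rstrip("/") + "/" + name)

    def createChild(self, name):
        return self.child(name)

    def remove(self):
        return "removed " + self.path
```

With `root = ForceUser(ReadOnly(MemoryNode("/tmp")), "nobody")`, `root.child("foo")` comes back as `ForceUser(ReadOnly(...))` for `/tmp/foo`, so both layers survive the trip down the tree. The explicit `rewrap` hook is one simple answer to the "track extra state" problem that, per Andrew's note, kept ChrootTransport off the generic bzrlib decorator.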
Andy Gayton wrote: [...]
For what it's worth, bzrlib.transport.decorator provides a similar facility for bzrlib.transport. It's used to implement e.g. ReadonlyTransportDecorator. There's also a ChrootTransport, which is essentially a decorator too (although it doesn't use bzrlib.transport.decorator because the generic decorator facility didn't provide a sane way to track what the root of the chroot should be). We also have lots of decorators just for testing, e.g. FakeNFSTransportDecorator, UnlistableTransportDecorator, FakeVFATTransportDecorator, etc. -Andrew.
On Wed, Jul 9, 2008 at 7:34 PM, Andrew Bennetts <andrew-twisted@puzzling.org> wrote:
I'm likely overlooking a few things; I just hacked this together after reading over TransportDecorator for a couple of minutes. It's possible that TransportDecorator could be implemented with vfs's Decorator helper, with something along the lines of:

```python
import twisted.vfs._decorator
from bzrlib.transport import get_transport


class TransportDecorator(twisted.vfs._decorator.Decorator):

    def __init__(self, url, _decorated=None):
        prefix = self._get_url_prefix()
        if not url.startswith(prefix):
            raise ValueError(
                "url %r doesn't start with decorator prefix %r" %
                (url, prefix))
        decorated_url = url[len(prefix):]
        if _decorated is None:
            _decorated = get_transport(decorated_url)
        super(TransportDecorator, self).__init__(
            _decorated, factoryMethods=['clone'])

    def abspath(self, relpath):
        return self._get_url_prefix() + self.target.abspath(relpath)

    def external_url(self):
        return self._get_url_prefix() + self.target.external_url()

    def _get_url_prefix(self):
        raise NotImplementedError(self._get_url_prefix)
```

A key difference between this and the more verbose approach in bzrlib is that the above object isn't an instance of Transport. So far I've handled this with zope.interface: the above decorator could declare that it implements(ITransport).
![](https://secure.gravatar.com/avatar/d6328babd9f9a98ecc905e1ccac2495e.jpg?s=120&d=mm&r=g)
On 10:26 pm, andy@thecablelounge.com wrote:
YES. PLEASE. I don't know the rest of the Cambridge Cabal's level of interest but I'd definitely like to sort this stuff out. (I promise to take minutes and file tickets for those of you not present. And we should have an IRC conference as well.) Good to see you again, Andy. "Welcome back to the stage of history."
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
On Sat, Jun 28, 2008 at 6:58 PM, Jonathan Lange <jml@mumak.net> wrote:
That'd be great. We've stopped using twisted.vfs at work, but I'm still keen to see this stuff improve.
Oh man, I think that means *noone* uses twisted.vfs. What are you guys doing for sftp access to launchpad? Andy.
![](https://secure.gravatar.com/avatar/1327ce755b24b956995d68accae3eab2.jpg?s=120&d=mm&r=g)
On Sun, Jun 29, 2008 at 9:09 AM, Andy Gayton <andy@thecablelounge.com> wrote:
Direct implementations of ISFTPServer and ISFTPFile. Since hacking on that, I've become too familiar with the creepy world of SFTP protocol drafts. Would love to talk about it with y'all. If you can arrange to chat at a time during my weekend, I'll adjust my sleep patterns accordingly. jml
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
On Sat, Jun 28, 2008 at 7:19 PM, Jonathan Lange <jml@mumak.net> wrote:
I've got a place in Vermont until next Thursday. Then its July, 4th, which is likely to be rowdy. We could go for a face to face + IRC meeting next weekend? We could also virtually bash around ideas before then - and I'd have the week to code up whatever is discussed. In the mean time, here's a list of interfaces to draw on: Some thoughts on async file io for twisted have been documented here: http://twistedmatrix.com/trac/wiki/Specification/AsynchronousFileInputOutput Ftp's IFTPShell, IReadFile and IReadFile http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist... Conch's ISFTPServer and ISFTPFile http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist... The heavily thought out, particularly from a security point of view, but synchronous, FilePath. http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist... The currently propsed vfs interface, IVFSNode http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist... There's already an implementation of IVFSNode - FilePathNode, which takes a FilePath (or ZipPath), and would then make it available over all protocols which have adapters for IVFSNode. cheers, Andy.
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
On Sat, Jun 28, 2008 at 6:33 PM, <glyph@divmod.com> wrote:
Exarkun is off for 10 days from Thursday arvo. What do you think of a pitted cage death match Wednesday (July, 2nd) evening? No one leaves until only one person lives, or there's consensus on an interface :) Jml, I'm guessing this'd be your Thursday morning. If this is ok - what's the best place to meet? cheers, Andy.
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
On Mon, Jun 30, 2008 at 9:06 PM, Jonathan Lange <jml@mumak.net> wrote:
Yeah, that would be a good time for me, as long at it's after 2200 UTC.
Hey Jono, sorry again that we didn't get the conversation online into irc. Here's a quick recap of what came out of the discussions: * The immediate goal is to settle on just enough of the core interface to be certain it can be expanded in the directions which will be needed in backwards compatible ways, so we can do a release. * open should explicit state the flags and permission it takes. It shouldn't use posix constants. At first we'll just support a small subset that makes sense for all platforms. * createFile should be subsumed by open - the exclusive flag should just be apart of open's flags. * open should return a separate IO object, instead of self handling IO directly * The primitive interface for IO should be producer/consumers, replacing readChunk, writeChunk. This interface is primitive enough to express all other interfaces, while still providing the opportunity to optimize streaming performance. The producer/consumer interface will need to take an offset to allow readChunk and writeChunk to be implemented. * we're still postponing handling of symlinks * we're still using getMetadata and setMetadata - its likely we want a layer on top of using arbitrary key/value dicts for metadata, but this can be introduced in a backwards compatible way. * we still need to decide whether path resolution should be moved to a separate interface, instead of being part of the node's interface. * there's concern over the package name. twisted.tree has considerable support :) I'll try and make these changes in the next week or so. If you are interested in shaping how this goes, you can track what's going on in http://twistedmatrix.com/trac/ticket/2815 - just weigh in once the ticket goes back to review. cheers, Andy.
![](https://secure.gravatar.com/avatar/1327ce755b24b956995d68accae3eab2.jpg?s=120&d=mm&r=g)
On Sat, Jul 5, 2008 at 3:27 AM, Andy Gayton <andy@thecablelounge.com> wrote:
That's ok. Thanks for bringing it to the mailing list :)
Good idea.
+1
* createFile should be subsumed by open - the exclusive flag should just be apart of open's flags.
Having a convenience method for creating a file seems like a good idea.
It would be nice to have things so that readChunks and writeChunks (plural) could be implemented, to avoid potato programming.
This reminds me, it would be good for VFS to have an exception for "this operation isn't supported" (say with symlinks on fat32) and another exception for "supportable, but not actually implemented yet".
* we still need to decide whether path resolution should be moved to a separate interface, instead of being part of the node's interface.
I'm not 100% sure what this means? Does this relate to possibly combining with FilePath?
* there's concern over the package name. twisted.tree has considerable support :)
I kind of like that. I'm not sure what the concern is with 'vfs' though.
Here's some random stuff that I wanted to at least mention: - Error translation. This should translate the exception types, but it should also translate values, so the error contains the virtual path. - Deferreds. You don't mention them at all, but the lack of asynchronous interfaces was one of the biggest problems we had with twisted.vfs. - URL Escaping. I got bitten by this recently. It's obviously not a general VFS problem, but it's an issue with enough of them that it should be considered when defining interfaces. - "Decorators" like "read-only" and "chroot" could prove useful. Is there room in the design for such things? jml
![](https://secure.gravatar.com/avatar/d6328babd9f9a98ecc905e1ccac2495e.jpg?s=120&d=mm&r=g)
On 07:32 am, jml@mumak.net wrote:
I don't think this is actually going to be a practical consideration, if I correctly understand what you mean. For one thing, the producer/consumer interface is going to be something (very vaguely) like this: remoteFile.writeFrom(producer[, offset, length]) remoteFile.readInto(consumer[, offset, length]) This means that if you've got a really giant file, the implementation could pretty trivially optimize delivering it to you in the most efficient possible way, keeping all the relevant buffers full at every opportunity. Given that stream-based I/O is somewhat inherently serial, it's difficult to get less potato-y than that. writeChunks, if I understand it, would be pretty trivially implementable by saying remoteFile.writeFrom(MultiChunkProducer(chunks)) Mapping 'readChunks' and 'writeChunks' to readv and writev in my head, I'm not really sure what a 'readChunks' would actually do, since we copy memory every time we sneeze in Python anyway. We're not going to have preallocated buffers to read into.
Hmm. I don't remember agreeing to layering anything on top of "arbitrary key/value dicts"; I'd really like to see a completely different layer that specifically separates out optional features (xattrs, symlinks, posix ACLs(?)) into separate interfaces with specific methods that don't necessarily need to retrieve all the metadata at once, which is sort of an inherent property of having a key/value dict. I'm OK with "still using getMetadata and setMetadata", though, since as you say, it can be introduced in a backwards-compatible way. I do think that we should keep that discussion open (for later, after the rest of this work has been completed).
I don't think it's useful to distinguish between these two types of exception at *runtime*. The use-case I can see for distinguishing them is letting a programmer know that they should figure out something that might be tricky to implement and write some wrappers or submit some patches. Perhaps a separate error message, rather than a separate exception type? Do you have a different use-case?

One related thing that we spoke about in person was pushing this negotiation of filesystem features back to the initialization step, so that applications which needed unusual filesystem attributes could fail quickly with a clear error message if they weren't supported by the underlying platform. ("WebDAV requires extended filesystem attributes, and your backend, SFTP, does not provide that feature.", "txGnuStow requires symbolic links, and your backend, the Microsoft Windows filesystem, does not provide that feature.")

The nice thing about this is that the default interface to the backend would be the one that masked everything but the most common subset of filesystem features, so that you couldn't *accidentally* depend on a feature that wasn't present everywhere without specifically requesting it. To get more obscure features, you'd have to specify a longer list of interfaces.
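A minimal sketch of start-up negotiation, assuming features are named by plain strings rather than interfaces: negotiate() fails fast with the kind of message described above, and otherwise returns a view that masks every optional feature the application did not ask for. MissingFeature, OPTIONAL_FEATURES, Masked, negotiate() and the SFTPBackend stand-in are all hypothetical.

```python
class MissingFeature(Exception):
    """Raised at start-up when an application needs a feature the backend lacks."""


# Hypothetical: optional features and the backend methods that belong to them.
# Anything not listed here counts as the always-available common subset.
OPTIONAL_FEATURES = {
    "symbolic links": ("symlink", "readlink"),
    "extended filesystem attributes": ("getxattr", "setxattr"),
}


class Masked:
    """Expose the common subset plus only the features that were requested."""

    def __init__(self, backend, requested):
        self._backend = backend
        self._hidden = set()
        for feature, methods in OPTIONAL_FEATURES.items():
            if feature not in requested:
                self._hidden.update(methods)

    def __getattr__(self, name):
        # Only reached for names not found on the Masked instance itself.
        if name in self._hidden:
            raise AttributeError("%s was not requested at start-up" % (name,))
        return getattr(self._backend, name)


def negotiate(appName, backend, requested=()):
    """Fail fast if the backend lacks a requested feature; else mask the rest."""
    for feature in requested:
        if feature not in backend.features:
            raise MissingFeature(
                "%s requires %s, and your backend, %s, does not provide "
                "that feature." % (appName, feature, backend.name))
    return Masked(backend, set(requested))


class SFTPBackend:
    """Stand-in backend: has symlinks, but no extended attributes."""
    name = "SFTP"
    features = frozenset(["symbolic links"])

    def listdir(self):
        return ["README"]

    def symlink(self, target, linkName):
        return "linked %s -> %s" % (linkName, target)
```

With this shape, `negotiate("someApp", SFTPBackend())` hides `symlink` even though the backend has it, so an application can only use symbolic links after asking for them by name.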
* we still need to decide whether path resolution should be moved to a separate interface, instead of being part of the node's interface.
I'm not 100% sure what this means. Does this relate to possibly combining with FilePath?
The tongue-in-cheek name that radix gave to this interface was 'filepath.pathdelta'. It's related to filepath in the sense that FilePath, ZipPath, et al. could benefit from using the same interface to talk about relative pathnames rather than manipulating lists of strings. One can, after all, abstractly do operations like "child()" and "parent()" without knowing a lot about the base implementation of the filesystem in question.
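To illustrate the 'pathdelta' idea, here is a hypothetical PathDelta class: a relative path stored as a tuple of segments, with child() and parent() that never touch a filesystem, and an applyTo() that resolves it against anything with a child() method (twisted.python.filepath.FilePath has one). The class name, methods and validation rules are only a sketch, not an agreed interface.

```python
class PathDelta:
    """Hypothetical relative path: a tuple of segments, no filesystem attached."""

    def __init__(self, segments=()):
        for s in segments:
            # Reject empty, "." / ".." and separator-containing segments so a
            # delta can never climb out of whatever base it is applied to.
            if not s or s in (".", "..") or "/" in s:
                raise ValueError("invalid path segment: %r" % (s,))
        self.segments = tuple(segments)

    def child(self, name):
        return PathDelta(self.segments + (name,))

    def parent(self):
        if not self.segments:
            raise ValueError("the empty delta has no parent")
        return PathDelta(self.segments[:-1])

    def applyTo(self, base):
        """Resolve against any object with a child(name) method."""
        for s in self.segments:
            base = base.child(s)
        return base

    def __eq__(self, other):
        if not isinstance(other, PathDelta):
            return NotImplemented
        return self.segments == other.segments

    def __hash__(self):
        return hash(self.segments)

    def __repr__(self):
        return "PathDelta(%r)" % (self.segments,)
```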
* there's concern over the package name. twisted.tree has considerable support :)
I kind of like that. I'm not sure what the concern is with 'vfs' though.
"twisted.vfs" sounds incredibly boring and is unpronounceable. It would be the first twisted.<acronym> package, and it's not really related to any other technology ambiguously named "vfs".

However, this reminds me of another concern which I did not remember to raise while Andy was here. Should this really be twisted.<anything> at all? I'd like twisted.<x> "dot products" to generally be an application which does something <x>-ish. I'm aware that not every package follows this rule, but the ones that don't are either (A) unmaintained and slated for removal, or (B) part of the core, not independent subprojects, as "vfs" seems slated to be.

Put a different way: what should 'twistd tree' do? My suggestion would be a simple multi-protocol file server: HTTP, FTP (although probably disabled by default), SFTP, and maybe a "native" protocol providing a generalized backend for any Twisted application that uses the 'tree' API, so that we can write a proxy that exposes every arbitrary combination of features from the protocols it's talking to.

If everyone agrees with this, then great. However, if we never intend for this to go beyond providing an API that other systems hook into, maybe it should go somewhere subordinate to another project; twisted.internet.files perhaps?

To be clear: I don't mind doing a release that does not include this tool; I don't think anything should block on it. I just want it to be in the cards eventually if this is the way we're going to release it.
This sounds like a specific enough thing that you could file a ticket that described the exact behavior that you wanted. It doesn't sound contentious at all to me, so unless you think there's some hidden confusion there... go ahead?
I believe that the consensus on asynchronicity is that all of the synchronous stuff should be FilePath's job. In the glorious future of twisted.tree, everything will be async. As discussed above, this doesn't always mean Deferreds; it also means producers and consumers.

One thing we didn't talk about in person: handling extremely large directories. We had spoken about children() returning a Deferred of a list; I think it would be nice if it actually had a producer/consumer API of its own. Maybe this is too much of a corner case to worry about in average applications (i.e. we could provide a give-me-a-Deferred convenience API), but it would be nice if it were *possible* to implement things that were efficient against really big networked directories.
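A minimal sketch of the layering suggested for large directories: a streaming listing that hands out bounded batches, with the convenient all-at-once list built on top. It is synchronous and uses the standard library's os.scandir() as the backend purely for illustration; in twisted.tree the streaming side would be a producer and the convenience side would return a Deferred. iterChildBatches and listChildren are hypothetical names.

```python
import os
from itertools import islice


def iterChildBatches(path, batchSize=100):
    """Yield lists of at most batchSize child names, reading the directory lazily.

    A synchronous, pull-style stand-in for a producer/consumer children()
    API: the caller asks for the next batch only when it is ready for it.
    """
    with os.scandir(path) as entries:
        names = (entry.name for entry in entries)
        while True:
            batch = list(islice(names, batchSize))
            if not batch:
                return
            yield batch


def listChildren(path):
    """The give-me-everything convenience API, layered on the streaming one."""
    return sorted(name for batch in iterChildBatches(path) for name in batch)
```

A caller that only needs the first page of a huge directory can take one batch and stop, and the generator's `with` block closes the directory handle when it is discarded.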
I *think* that this should be fairly easily dealt with in a generic way by having a clearly-defined set of string escaping rules depending on which protocol you're using. It's a general VFS issue in the sense that there are escaping issues with "/" on regular filesystems, after all. Or at least, there are error-reporting issues with characters like "/", ";", and ":" on certain FSes.
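One way to read "a clearly-defined set of string escaping rules depending on which protocol you're using" is a per-protocol table of functions applied to each path segment. The registry and function names below are hypothetical. The HTTP rule uses urllib.parse.quote with safe="" so that "/" inside a name is percent-encoded rather than treated as a separator; the Windows rule checks only the reserved characters <>:"/\|?* (it ignores control characters and reserved names such as CON).

```python
from urllib.parse import quote, unquote

WINDOWS_RESERVED = set('<>:"/\\|?*')


def escapeForHTTP(segment):
    # Percent-encode everything except letters, digits and "_.-~",
    # including "/", so one name can never be mistaken for two segments.
    return quote(segment, safe="")


def unescapeFromHTTP(segment):
    return unquote(segment)


def checkForPOSIX(segment):
    # POSIX names may contain anything except "/" and NUL.
    if "/" in segment or "\0" in segment:
        raise ValueError("%r is not a valid POSIX file name" % (segment,))
    return segment


def checkForWindows(segment):
    # Report the offending characters up front instead of a confusing OS error.
    bad = WINDOWS_RESERVED.intersection(segment)
    if bad:
        raise ValueError("%r contains %s, which Windows does not allow"
                         % (segment, "".join(sorted(bad))))
    return segment


# Hypothetical registry: one rule per protocol or backend.
SEGMENT_RULES = {
    "http": escapeForHTTP,
    "posix": checkForPOSIX,
    "windows": checkForWindows,
}
```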
- "Decorators" like "read-only" and "chroot" could prove useful. Is there room in the design for such things?
We did discuss having things like this. Specifically, we talked a lot during the metadata discussion about the possibility of 'decorators' like "provide-xattrs-with-dotfiles" and "provide-atime-by-pretending-its-zero". However, we didn't spend too long on it because every alternative that got brought up sounded like it was pretty amenable to a simple delegation approach; there just wasn't a lot of meat there. We'll have to check to make sure that is true in the review process, of course, but this is probably the thing I'm least worried about :).
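As an example of how little a "provide-atime-by-pretending-its-zero" decorator needs, here is a sketch using plain delegation. It assumes a node whose getMetadata() returns a dict synchronously (the async interface would return a Deferred instead); PretendAtimeIsZero and NoAtimeNode are hypothetical names.

```python
class PretendAtimeIsZero:
    """Hypothetical metadata decorator: supply atime=0 when the backend can't."""

    def __init__(self, node):
        self._node = node

    def __getattr__(self, name):
        # Plain delegation for everything this decorator doesn't override.
        return getattr(self._node, name)

    def getMetadata(self):
        metadata = dict(self._node.getMetadata())
        # Keep a real atime if the backend has one; otherwise pretend it's 0.
        metadata.setdefault("atime", 0)
        return metadata


class NoAtimeNode:
    """Stand-in backend node that cannot report access times."""
    name = "foo"

    def getMetadata(self):
        return {"size": 42, "mtime": 1215300000}
```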
On Mon, Jul 7, 2008 at 1:12 AM, <glyph@divmod.com> wrote:
Well, one theoretical advantage is that it can avoid round trips in cases where the remote file server supports a readv-style operation. I can't think of any servers that do this at the moment (maybe the bzr smart server? does HTTP/1.1 allow this?), so maybe it's not an issue.
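On the HTTP question: an HTTP/1.1 Range header may list several byte ranges, and a server that honours it answers with a multipart/byteranges body, so it is at least a candidate for a readv-style operation (servers are also free to ignore Range and send the whole entity). A minimal sketch of turning hypothetical readChunks-style (offset, length) pairs into such a header; rangeHeader is an illustrative name.

```python
def rangeHeader(chunks):
    """Build an HTTP/1.1 Range header value for several (offset, length) reads."""
    parts = []
    for offset, length in chunks:
        if offset < 0 or length <= 0:
            raise ValueError("bad chunk: %r" % ((offset, length),))
        # HTTP byte ranges are inclusive at both ends.
        parts.append("%d-%d" % (offset, offset + length - 1))
    return "bytes=" + ",".join(parts)
```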
The first kind should skip tests, the second kind should fail tests.
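A sketch of how that rule could look in a backend conformance suite, using the standard library's unittest. It assumes the two kinds are "the platform cannot support this" (signalled by a hypothetical FeatureNotSupported) and "nobody has written this yet" (NotImplementedError); makeConformanceTests and LimitedBackend are illustrative names. The TestCase is built inside a factory so the same tests can be stamped out once per backend.

```python
import unittest


class FeatureNotSupported(Exception):
    """Hypothetical: the underlying platform cannot provide this feature."""


def makeConformanceTests(backend):
    """Build a TestCase class that exercises one backend."""

    class BackendConformance(unittest.TestCase):

        def call(self, method, *args):
            try:
                return method(*args)
            except FeatureNotSupported as e:
                # The platform can't do this, so the test doesn't apply.
                self.skipTest("not supported by this backend: %s" % (e,))
            except NotImplementedError as e:
                # Someone still has to write this, so make it visible.
                self.fail("not implemented yet: %s" % (e,))

        def test_symlink(self):
            self.call(backend.symlink, "target", "link")

        def test_setxattr(self):
            self.call(backend.setxattr, "user.comment", b"hello")

    return BackendConformance


class LimitedBackend:
    """Hypothetical backend: no symlinks at all, xattrs not written yet."""

    def symlink(self, target, linkName):
        raise FeatureNotSupported("symbolic links")

    def setxattr(self, name, value):
        raise NotImplementedError("setxattr")
```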
This sounds like a good idea, provided that there are still clear runtime errors and that you can skip the negotiation. A use case for skipping it would be a virtual filesystem that's glued together from other virtual filesystems, each of which has different capabilities.
("WebDAV requires extended filesystem attributes, and your backend, SFTP, does not provide that feature.", "txGnuStow requires symbolic
Actually, some versions of SFTP do provide it. I'm not sure that there are any implementations though :)
Well, the world needs a decent one of these.
So, this is the thing that *I'm* least worried about. I think it should just be an API, and that it should be done so that other Twisted components can depend on it. Beyond that, its package location is unimportant.
Good good.
Yes. This would be very nice.
Good. I just wanted to flag it.
On 02:04 am, andrew-twisted@puzzling.org wrote:
OK, I think I can see what you mean. I believe you'll be able to effectively implement "readChunks" by simply pipelining calls to readInto(consumer); different consumers can be read into in parallel.
On Sun, Jul 6, 2008 at 11:12 AM, <glyph@divmod.com> wrote:
...
I really like these two ideas. I've noted them down to work on. I think we agree, though, that we could get a release out first and then look at adding these?
I remember a chat I was having with you back at PyCon 2007 about expanding twistd's command-line interface to be able to multiplex several different services together. Having a collection of backend implementations available to use with this would be cool. But yeah, this could still be provided without using a twisted.<x> package.

Here's the vfs plugin so far: http://twistedmatrix.com/trac/browser/branches/vfs-twistd-plugin-2821-2/twis... Its Options.longdesc talks about the above idea.

I'm OK either way on this.

cheers,
Andy.
On 03:02 am, andy@thecablelounge.com wrote:
One thing we didn't talk about in person: handling extremely large directories.
Absolutely. I don't think any ideas I brought up in this email should be blockers.
what should 'twistd tree' do?
Here's the vfs plugin so far:
Awesome, I guess my concern was pre-addressed!
On Sun, Jul 6, 2008 at 3:32 AM, Jonathan Lange <jml@mumak.net> wrote:
It's pretty much Deferreds all the way: http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...
- "Decorators" like "read-only" and "chroot" could prove useful. Is there room in the design for such things?
Yeah, I'm all for flavouring backends with decorators. That's why I keep trying to sneak this guy through ;) http://twistedmatrix.com/trac/browser/branches/vfs-async-backends-2815/twist...

You get chroot for free with the current FilePath-based vfs implementation. In the previous version (trunk) of vfs there are force user/force group and umask decorators. I'd like to recreate those once we get the new version merged.

Andy.
On 03:12 am, andy@thecablelounge.com wrote:
On Sun, Jul 6, 2008 at 3:32 AM, Jonathan Lange <jml@mumak.net> wrote:
Eehhhhh... I'm glad that name starts with "_" :). It seems like one of those too-clever-by-half solutions to a relatively simple problem, where the cleverness will bite you later. Something like... I would do. The general concept of decorators is really good though.
On Tue, Jul 8, 2008 at 4:32 PM, <glyph@divmod.com> wrote:
It's a general solution for re-decorating the return values of methods on decorated objects that return new instances of themselves, which is particularly handy for decorating tree-node-like objects that can return new child instances. For example:

    root = FilePathNode(FilePath('/tmp'))
    root = ReadOnly(root)
    root = ForceUser(root, 'nobody')
    # Should return a read-only node for /tmp/foo, which creates new files as user nobody.
    root.child('foo')

It'd be awesome to provide support for this in a simpler way.

cheers,
Andy.
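For comparison, one possibly simpler way to get the behaviour described above without a generic helper: each decorator delegates attribute lookups to the node it wraps, overrides child() to re-wrap the new node, and knows how to rebuild itself through a small _rewrap() hook. This is a sketch only (it is not the _decorator helper from the branch); MemoryNode stands in for FilePathNode, and createFile() is a hypothetical method used to show ReadOnly and ForceUser composing.

```python
class MemoryNode:
    """Hypothetical stand-in for a backend node such as FilePathNode."""

    def __init__(self, path):
        self.path = path

    def child(self, name):
        return MemoryNode(self.path.rstrip("/") + "/" + name)

    def createFile(self, name, owner="root"):
        return "created %s/%s as %s" % (self.path, name, owner)


class NodeDecorator:
    """Delegate everything to the wrapped node, but re-wrap child() results."""

    def __init__(self, node):
        self._node = node

    def __getattr__(self, name):
        # Only called for attributes the decorator itself doesn't define.
        return getattr(self._node, name)

    def _rewrap(self, node):
        # Decorators with extra constructor arguments override this.
        return type(self)(node)

    def child(self, name):
        return self._rewrap(self._node.child(name))


class ReadOnly(NodeDecorator):
    def createFile(self, name, owner="root"):
        raise PermissionError("%s is read-only" % (self.path,))


class ForceUser(NodeDecorator):
    def __init__(self, node, user):
        NodeDecorator.__init__(self, node)
        self._user = user

    def _rewrap(self, node):
        return ForceUser(node, self._user)

    def createFile(self, name, owner="root"):
        # Ignore the requested owner and always use the forced user.
        return self._node.createFile(name, owner=self._user)
```

Because each layer re-wraps its own child, `ForceUser(ReadOnly(MemoryNode('/tmp')), 'nobody').child('foo')` comes back as ForceUser around ReadOnly around the child node, in the same stacking order as the original. The cost is that every decorator with constructor arguments must override `_rewrap()`.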
Andy Gayton wrote: [...]
For what it's worth, bzrlib.transport.decorator provides a similar facility for bzrlib.transport. It's used to implement e.g. ReadonlyTransportDecorator. There's also a ChrootTransport, which is essentially a decorator too (although it doesn't use bzrlib.transport.decorator because the generic decorator facility didn't provide a sane way to track what the root of the chroot should be). We also have lots of decorators just for testing, e.g. FakeNFSTransportDecorator, UnlistableTransportDecorator, FakeVFATTransportDecorator, etc. -Andrew.
On Wed, Jul 9, 2008 at 7:34 PM, Andrew Bennetts <andrew-twisted@puzzling.org> wrote:
I'm likely overlooking a few things; I just hacked this together after reading over TransportDecorator for a couple of minutes. It's possible that TransportDecorator could be implemented with vfs's Decorator helper, with something along the lines of:

    import twisted.vfs._decorator  # Decorator helper from the vfs branch
    from bzrlib.transport import get_transport


    class TransportDecorator(twisted.vfs._decorator.Decorator):

        def __init__(self, url, _decorated=None):
            prefix = self._get_url_prefix()
            if not url.startswith(prefix):
                raise ValueError(
                    "url %r doesn't start with decorator prefix %r" %
                    (url, prefix))
            decorated_url = url[len(prefix):]
            if _decorated is None:
                _decorated = get_transport(decorated_url)
            # Results of clone() get re-decorated automatically.
            super(TransportDecorator, self).__init__(
                _decorated, factoryMethods=['clone'])

        def abspath(self, relpath):
            return self._get_url_prefix() + self.target.abspath(relpath)

        def external_url(self):
            return self._get_url_prefix() + self.target.external_url()

        def _get_url_prefix(self):
            raise NotImplementedError(self._get_url_prefix)

A key difference between this and the verbose approach in bzrlib is that the above object isn't an instance of Transport. So far I've handled this with zope.interface interfaces: the above decorator could declare that it implements(ITransport).
participants (5)

- Andrew Bennetts
- Andy Gayton
- glyph@divmod.com
- Jean-Paul Calderone
- Jonathan Lange