[Twisted-Python] twisted.vfs issues
So, starting to look through twisted.vfs, I'm finding a few things that need work.

1) I see no way of reading from or writing to a file in ivfs.IFileSystemLeaf.

2) createFile is racy -- it requires opening a file by the given name, with default permissions, then immediately closing it. In addition, it doesn't specify whether it's an error if the file already exists.

3) It looks like all operations are blocking? What about a remote vfs? I think every operation in the vfs interface ought to be non-blocking.

4) IFileSystemNode.remove doesn't say whether it's a recursive delete (on a directory), and .rename doesn't specify whether newName can be in a different directory, whether it replaces an existing file, or whether it works on a directory.

5) Errors are coarse-grained. Everything is a VFSError, and the only detailed information is in human-readable text, not in any nice computer-readable form.

6) Need some support in the interface for extended attributes.

That's it for now. James
James Y Knight wrote:
So, starting to look through twisted.vfs, I'm finding a few things that need work.
Hey James, Thanks for the feedback. We need it. Heaps of decisions for the vfs stuff have been put off to see what other use cases would need from the vfs. In particular, permissions and metadata.
1) I see no way of reading from or writing to a file in ivfs.IFileSystemLeaf.
The vfs stuff is still heavily influenced by the interface that conch expects, as sftp has been the main motivation for the current contributors. Reading and writing are done through writeChunk and readChunk - we've always felt this wasn't quite right for a general backend, but after two sprints we still haven't come up with something better. Adding the web2.Stream adaptor seems to have glossed over the issue for protocols that readChunk/writeChunk doesn't work for. Spiv even used streams for the vfs ftp adaptor! I've added read/writeChunk to ivfs.IFileSystemLeaf's interface.
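For readers following along, here is a minimal sketch of what a chunk-oriented leaf can look like. Only the readChunk/writeChunk names come from the thread; the in-memory backend and all its details are hypothetical, not the actual twisted.vfs implementation:

```python
# Hypothetical in-memory sketch of the chunk-oriented file API being
# discussed; only the readChunk/writeChunk names come from the thread.

class InMemoryLeaf:
    """A toy file node backed by a bytearray."""

    def __init__(self, data=b""):
        self._data = bytearray(data)

    def readChunk(self, offset, length):
        # Return up to `length` bytes starting at `offset`; a short or
        # empty result signals end of file.
        return bytes(self._data[offset:offset + length])

    def writeChunk(self, offset, data):
        # Grow the buffer (zero-filled) if the write extends past the end.
        end = offset + len(data)
        if end > len(self._data):
            self._data.extend(b"\0" * (end - len(self._data)))
        self._data[offset:end] = data
        return len(data)
```

The appeal of this primitive is that each call is self-contained: a protocol can read or write any region without holding stream state.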
2) createFile is racy -- it requires opening a file by the given name, with default permissions, then immediately closing it.
:), racy is good right?
In addition, it doesn't specify whether it's an error if the file already exists.
It should, I've added this to the interface.
3) Looks like all operations are blocking? What about a remote vfs? I think every operation in the vfs interface ought to be non-blocking.
The other option is that the vfs interface could return maybe-deferred results. Most protocols are good at handling this (sftp, streams). But given how easy it is to return defer.succeed(result), it's probably simpler to say always non-blocking.
4) IFileSystemNode.remove doesn't say whether it's a recursive delete (on a directory)
Hrm, yeah - should it? Or should this be handled by higher-level utilities (e.g. shutil)? The current os backend uses os.rmdir, so it doesn't do a recursive delete. I've updated the interface to say that it doesn't.
The method is against Node, so it works on directories. This is os.rename's spec:

> Rename the file or directory src to dst. If dst is a directory, OSError will be raised. On Unix, if dst exists and is a file, it will be removed silently if the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). On Windows, if dst already exists, OSError will be raised even if it is a file; there may be no way to implement an atomic rename when dst names an existing file. Availability: Macintosh, Unix, Windows.

Should vfs be aiming to provide consistent behaviour for all operations across all backends? Or should some behaviour be left to the particular backend to decide? For the moment I've updated the interface to read:

> Renames this node to newName. newName can be in a different directory. If the destination is an existing directory, an error will be raised.
yeah :( that needs to be fixed.
6) Need some support in the interface for extended attributes.
There's getMetadata. That lets you return arbitrary attributes. Would that cover what you're thinking? Protocols should try to get by with as little metadata as they can. If a backend doesn't supply a bit of metadata a protocol must have, then it won't be able to be used with that protocol. Andy.
On Thu, Sep 29, 2005 at 05:49:04AM +1000, Andy Gayton wrote:
As a know-nothing bystander with just enough knowledge about metadata, I'm curious. Is there a way to get a list of the kinds of metadata that are available? Is there a name-spacing system so different kinds of metadata can be available under different names? For example, a WebDAV share might (should) expose a 'Content-Type' attribute on every file, so code might be written that exposes the MIME type of the file in the 'Content-Type' attribute. On the other hand, a file on an NTFS file system can have arbitrarily-named bytestreams associated with it. If the NTFS VFS module exposes byte-streams under their arbitrary names, a program using the VFS could try to get the content type of a file and wind up with (several gigabytes of) almost anything. Another arbitrary-metadata system that would be nice to support would be POSIX extended attributes, but I don't know what the name restrictions on those would be.
On Thu, 29 Sep 2005 05:49:04 +1000, Andy Gayton <andy@thecablelounge.com> wrote:
With that in mind.... ;)
twisted.vfs should not import things from or depend upon twisted.web2:

* web2 is unreleased
* web2's APIs are unstable
* vfs is more generally applicable than web2
* web2's stream abstraction is not generally agreed upon

If you like, we can talk more about how this interface should work. However, my first inclination is to say that it should use the existing producer/consumer APIs. While these are not the best APIs, they are used widely throughout Twisted, and therefore this will give the greatest usability to the resulting VFS code. While there are adapters between these APIs and web2 streams, I still recommend against web2 streams for the reasons mentioned above.
I've added read/writeChunk to ivfs.IFileSystemLeaf's interface.
I mentioned these in a separate email, so I won't repeat those points.
I assume you mean that they should always return a Deferred. In this case, I agree. maybeDeferred is intended as a convenience for application-level code. Framework-level code can avoid introducing the need for it at the application-level by simply always using Deferreds.
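The same principle can be illustrated with stdlib asyncio standing in for Twisted's Deferred (the backend class here is hypothetical): when the framework-level API always returns an awaitable, even for results that are available immediately, callers never need a maybe-deferred special case.

```python
import asyncio

class LocalBackend:
    """Hypothetical synchronous backend behind a uniformly async API."""

    def __init__(self, sizes):
        self._sizes = sizes

    async def getSize(self, name):
        # The answer is available right now, but the method is still a
        # coroutine: every caller handles one shape, never two.
        return self._sizes[name]

async def main():
    backend = LocalBackend({"foo.txt": 42})
    return await backend.getSize("foo.txt")
```

In Twisted terms, the synchronous branch would be `defer.succeed(result)`; the point is that the branching happens inside the framework, not in every caller.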
The semantics provided by vfs should be the same across all platforms and all backends. Since os.rename's semantics vary between platforms, this probably eliminates it from (unaided) use in an implementation. .rename() in VFS should work across filesystems, guarantee atomicity (if this is feasible - I think it is. If not, it should explicitly deny atomicity), and have well-defined edge cases (for example, whether an exception is raised because the destination exists already should be defined one way or the other, and that's how it should always work).
There needs to be a convention for the format of this metadata. Protocol implementations should not need to be familiar with the backend they are using, and different backends should provide the same metadata in the same way. It may make sense to expand the example dictionary in getMetadata's docstring, and continue expanding it as new requirements are made (perhaps getMetadata's docstring isn't the best place for this information, either). This still doesn't strike me as ideal, but it's better than nothing.

Going further, I'd like to see pathutils implemented in terms of twisted.python.filepath: there's a lot of code duplication between these two modules.

The code in twisted/vfs/adapters/dav.py is misplaced. Itamar posted to this list about this issue a couple weeks ago, but I'll re-iterate. Third-party package dependencies need to be considered carefully. Most importantly, dependencies *must* not be cyclic. Twisted cannot import from akadav, because akadav imports from Twisted. If akadav can be used to provide VFS functionality, then the adapters to do so belong in akadav, or in some other package: not beneath the Python package "twisted".

As I mentioned above, twisted/vfs/adapters/ftp.py and stream.py shouldn't be importing from twisted.web2. Likewise, twisted/vfs/adapters/sftp.py's dependence on twisted.conch is backwards: twisted.conch should provide code which augments twisted.vfs. These are both great candidates for use of the plugin system. This also lets you take care of the nasty registration-requires-import issues, since gathering plugins will necessarily import the required modules, or if not, will provide a hook so that they can be imported at precisely the right time.
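To make the "convention for metadata" point concrete, a backend-independent convention could look something like the sketch below. The key names, the `metadata()` accessor, and the keys-filtering helper are all illustrative assumptions, not what twisted.vfs actually specifies:

```python
# Illustrative only: one possible backend-independent metadata
# convention. Key names here are hypothetical, not the twisted.vfs spec.

EXAMPLE_METADATA = {
    "size": 2048,                  # bytes, int
    "mtime": 1127942944,           # seconds since the epoch, int
    "permissions": 0o644,          # POSIX-style mode bits, int
    "content-type": "text/plain",  # MIME type, str
}

def getMetadata(node, keys=None):
    """Return the requested metadata keys (all by default), silently
    omitting any key this backend cannot provide."""
    available = node.metadata()
    if keys is None:
        return dict(available)
    return {k: available[k] for k in keys if k in available}
```

Letting callers name the keys they want also addresses the performance concern raised later in the thread: a backend never has to compute metadata nobody asked for.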
Some easy things:

* new code in Twisted should use new-style classes;
* modules should have `test-case-name' declarations;
* zope Interface's convention is to not include "self" in method declarations;
* "type(x) is y" is generally wrong - osfs.py uses it in getMode() - both because isinstance() should really be used, and because type checking generally indicates some weakness in an API (why might the mode be either a string or an integer? pick one and require only that).

I hope this doesn't come off as too critical :) I'm very much looking forward to the day when setting up a dav server against a purely virtual, dynamic filesystem is as easy as implementing a couple interfaces out of ivfs.py. Jp
On Sep 28, 2005, at 8:04 PM, Jp Calderone wrote:
Twisted.vfs should not depend upon a module in twisted.web2 when twisted.vfs gets released. However, it is okay for it to depend upon that stream _code_ if it gets moved into twisted core before vfs is released. The idea all along has been to move t.w2.stream into twisted core when it is stable and useful. So I wouldn't worry about tearing it out of t.vfs quite yet.

Now, my first inclination is that the current block API *is* the right primitive for a file. Also, in particular, making it use the old producer abstraction as a primitive is just asking for trouble. As the producer abstraction lets the producer send data asynchronously at any point, it becomes almost impossible to do a relatively simple operation like reading part of a file. That is why, for web2, I had to drop it and make a new API that has the consumer request the data. I think the same reasoning applies here.

Again, I think that all requests for tearing various adapters and other bits out of twisted.vfs are currently completely premature. At this point in its development, it is critical that adapters for many different systems are created, to make sure that vfs has the appropriate abstractions and APIs to handle all use cases. And given that vfs is itself heavily under development, it makes no sense to request that said adapters be adopted upstream into those other projects yet. James
On Wed, 28 Sep 2005 20:35:40 -0400, James Y Knight <foom@fuhm.net> wrote:
That mildly addresses one of four points. At the very least, the remaining three seem to remain valid.
Now, my first inclination is that the current block API *is* the right primitive for a file.
It precludes writing large amounts of data to a file simply.
The old API is not fantastic. On the other hand, it's entirely serviceable. I don't understand why you think it is almost impossible to read part of a file using it. In fact, I've done just this on several occasions.
They can be removed from twisted.vfs without being removed from the Twisted repository. Or they could be left in twisted.vfs but developed in a branch. That is policy for major feature development, after all. Jp
Jp Calderone wrote:
On Wed, 28 Sep 2005 20:35:40 -0400, James Y Knight <foom@fuhm.net> wrote:
On Sep 28, 2005, at 8:04 PM, Jp Calderone wrote:
(in regards to read/writeChunk)
I think the main reason they've won out till now is that it's incredibly easy for backend implementors to implement, and, being so primitive, extremely easy to compose into higher abstractions (streams, producers/consumers, or a convenience that lets you write large amounts of data to a file simply) through the use of adaptors. The ftp adaptor making use of the stream adaptor is a pretty good example of this.
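As one illustration of that composability, a sequential file-like reader can be layered over nothing more than readChunk. This is a sketch; the ChunkReader class and its default chunk size are mine, not twisted.vfs code:

```python
class ChunkReader:
    """File-like sequential reader built on a node's readChunk method."""

    def __init__(self, node, chunkSize=4096):
        self._node = node
        self._chunkSize = chunkSize
        self._offset = 0

    def read(self, size=None):
        # Pull chunks until `size` bytes (or end of file, signalled by
        # an empty chunk) have been gathered.
        pieces = []
        remaining = size
        while remaining is None or remaining > 0:
            want = (self._chunkSize if remaining is None
                    else min(self._chunkSize, remaining))
            chunk = self._node.readChunk(self._offset, want)
            if not chunk:
                break
            pieces.append(chunk)
            self._offset += len(chunk)
            if remaining is not None:
                remaining -= len(chunk)
        return b"".join(pieces)
```

The adaptor carries the stream state (the current offset), so the backend itself stays stateless and trivial.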
I think the vfs stuff would really benefit from being prodded and exposed to as many use cases as possible. It's been doing what I need it to do for around 6 months (except for dav - dav'd be awesome! :)) so it's hard to find motivation to work on it. But I appreciate where you're coming from. I've taken out the dav adaptor, which was a failed experiment from the first sprint. I think the only controversial dependency left is web2, which I've discussed in another mail. Andy.
On Sep 28, 2005, at 8:04 PM, Jp Calderone wrote:
rename() in VFS should work across filesystems,
Hm, that is going to be an interesting one to implement. I'm thinking in particular about what happens when you have a structure like:

    / -> adhoc.AdhocDirectory:
        tmp  -> osfs.OSDirectory("/home/jknight/tmp", ...)
        home -> inmem.FakeDirectory(...)

and I ask to move a file from /tmp/foo to /home/bar. IMO it is reasonable to say that the VFS 'rename' operation is allowed to cleanly fail, and not do the rename, forcing a higher level to do a copy/delete if it wants. This pushes the complication out of each VFS implementation into one implementation that will work across all, and furthermore can share its code with the copy implementation. This maps nicely to rename(2) as well, as a bonus.
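A higher-level move built on a rename that is allowed to fail cleanly might look like this sketch. The function name is hypothetical; the pattern itself is standard: POSIX signals the cross-filesystem case with EXDEV, and shutil.move performs essentially this fallback for ordinary files:

```python
import errno
import os
import shutil

def vfsMove(src, dst):
    """Try the cheap atomic rename first; fall back to copy+delete when
    source and destination live on different filesystems (EXDEV)."""
    try:
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # copy data and metadata
        os.unlink(src)
```

Keeping the fallback in one high-level helper, as suggested above, means each backend's rename only ever has to implement the simple same-filesystem case.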
guarantee atomicity (if this is feasible - I think it is. If not, it should explicitly deny atomicity),
It isn't feasible, when renaming across filesystems. There will certainly have to be a time at which both 'from' and 'to' exist. Additionally, it may be impossible to create a file 'to+".tmp"' (or similar) in the target directory to atomically rename to 'to' when you've finished copying, because of permissions. Another reason to restrict "rename" to be the simple rename, rather than the copy&delete-rename. James
On Wed, 28 Sep 2005 21:01:26 -0400, James Y Knight <foom@fuhm.net> wrote:
I can see ways to make this work. They involve temporary files, as you mention below. If we rule those out, it does become harder.
When are you allowed to create a file named "foo" in a directory, but not allowed to create a file named "foo.tmp"? Anyway, I think it's worth a try at least. If it turns out to not be possible, then certainly it should not be done :) I am mainly concerned that there be consistent behavior across backends. Writing code that deals with the quirks of os.rename() on Win32 is annoying, to say the least (and realistically, just plain error prone). Jp
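For the single-filesystem case, the temporary-file trick under discussion is the standard one: write to a temporary name in the destination directory, then rename over the target. On modern Python, os.replace gives the overwrite-atomically behaviour on both POSIX and Windows. A sketch (the helper name is mine):

```python
import os
import tempfile

def atomicWrite(path, data):
    """Replace `path` with `data` so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    # The temp file must live in the destination directory (hence the
    # same filesystem) for the final rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic overwrite on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

This is exactly the pattern whose permissions caveat James raises above: it only works if you may create the temporary name in the target directory.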
I'll have another shot at this :) Jp Calderone wrote:
* web2 is unreleased
vfs is unreleased. I think it's pretty safe to say it should stay that way at least until web2 is released.
* web2's APIs are unstable
vfs' APIs are obscenely unstable.
* vfs is more generally applicable than web2
as is web2's stream. As James pointed out, hopefully stream will eventually move into twisted core.
* web2's stream abstraction is not generally agreed upon
Fair point. But this just means that as stream's abstraction is reworked to meet general consensus, vfs will need to be rewritten to match the changes.
It would be fairly straightforward to add an adapter from ivfs to producers/consumers if someone has a need for it. I personally was keen to use streams with vfs as, being a new project that has the luxury of being unstable, I wanted to pitch at where Twisted was heading. I was also hoping to add to web2's momentum - nevow on web2 still feels a long way off and it's something I'd really like to see. Andy.
On Thu, 29 Sep 2005 21:25:07 +1000, Andy Gayton <andy@thecablelounge.com> wrote:
I also object to the dependency, although I would propose the opposite solution: I don't think that "twisted.vfs" makes *any* sense as a separate project. web2 should not be a trivial HTTP server with a small resource wrapper in it. It should be a full-featured web server. Web does not mean "http" - all web-related protocols, such as WebDAV, ftp, even SOAP or Gopher, should be part of the 'web server' product part of Twisted.

Going forward, I will stipulate this requirement: each independent Twisted subpackage MUST be at least 2 things: a functioning infrastructure layer that can be used by 3rd party applications, and an application which can be used standalone or with application plugins. In twisted.web's case this application is a "web server", which primarily does HTTP but can provide other request/response based protocols as well. "Application plugins" for the web server are things that respond to requests for particular URLs.

This requirement does imply the removal of a few existing Twisted packages. twisted.xish and twisted.flow come to mind. While I don't think that anything is actually using twisted.flow at this point, xish's useful code should move into the package that actually makes use of it (twisted.words.jabber). twisted.pair should probably just be part of eunuchs; while Twisted applications can use it, it certainly doesn't seem to stand on its own. I don't think it's important to act particularly quickly since I don't think any of these packages are seeing widespread external usage, but I definitely don't want to create more packages in this situation.

While Twisted packages may not depend on external projects (NO MORE NEVOW IMPORTS ANYWHERE, PLEASE), it is reasonable to have soft internal dependencies; such as web2 including a Conch plugin (importing Conch interfaces) that provides an HTTP/Gopher/etc backend for SFTP.
These dependencies can also be circular, so long as the circularity is between packages and not modules, AND the imports happen after startup; for example, it would also be reasonable for the previous scenario to co-exist with a plugin distributed with conch to enable SFTP servers to be used as web resources.
On Sep 29, 2005, at 12:26 PM, glyph@divmod.com wrote:
I don't think that "twisted.vfs" makes *any* sense as a separate project.
Agreed. I had sort of assumed twisted.vfs was a proposed addition to twisted core. James
On Thu, Sep 29, 2005, glyph@divmod.com wrote:
My understanding of the original purpose of the split -- which is probably wrong -- was to allow parts of twisted to release without massive dependencies on nearly unrelated release-critical bugs, and further to allow users of multiple bits of Twisted to not have to upgrade them all as a lump. If we have a model where an enormous group of request-response protocols are lumped with web, we return to the following release management problems:

1. rapidly evolving code, like FTP, which seems to have some parts working now and, were it an independent project, should probably be having some highly alpha releases, will not be released until it is sufficiently stable to be bundled with a more used and mature protocol like HTTP, and may in turn block said releases by being highly alpha.

2. relatively unrelated code, like ftp, or the hypotheticals above, will need to go unreleased every time there is a major years-long rewrite of a large component (web2).

My understanding of the release-early-release-often philosophy was to encourage early-adopting users (who are also more likely to become contributors). As it is, no sub-project is releasing and core looks set for a pretty slow ongoing release cycle. I may be naive about this, but my concern is that while core has critical mass in terms of users, the protocols and applications largely don't, and won't until they are seen by potential users to be active and supported, part of which is actually releasing. Having users in turn has been known to spur development.

What you're proposing sounds like a mature release policy: as in, sub-projects will be rather monolithic and will release when they're mature. Is this correct, and if so, will they release? -Mary
Jp Calderone wrote:
ok, http://twistedmatrix.com/bugs/issue1223
all that stuff on rename sounds great (moving between filesystems etc.) .. might as well aim for it until it's not possible. http://twistedmatrix.com/bugs/issue1224
http://twistedmatrix.com/bugs/issue1225
yep, http://twistedmatrix.com/bugs/issue1226
ok - will check out the plugins stuff. I just haven't come across it yet.
http://twistedmatrix.com/bugs/issue1227
Not at all. The feedback is appreciated. Andy.
On Sep 28, 2005, at 3:49 PM, Andy Gayton wrote:
There's getMetadata. That lets you return arbitrary attributes.
Would that cover what you're thinking?
No -- one problem is that extended metadata can be potentially very large (just as large as the file's normal contents). Another is that there may be a potentially large number of such attributes. So you really don't want to return all of it with one call -- you want to explicitly ask for certain attributes. See the getxattr, listxattr, etc. functions for what the low-level functionality looks like on Linux. At the moment this interface varies slightly between OSX, FreeBSD, and Linux, but they're almost the same. The OSX one adds an offset parameter to get/set, and the FreeBSD one adds a "namespace" parameter, to distinguish between root-only attributes and user attributes.

I think it may be best to model it as a special kind of child, as it seems as if that's the way people are moving their thinking anyhow:

    IFileSystemNode.xattrContainer() -> IFileSystemContainer

That returned object would then allow the use of the same createFile/child/remove/rename/open/read operations as on normal children. It would have to throw errors if you try doing excessively "interesting" things, like trying to create directories in it, but I think that's probably okay. On the other hand, some people think xattrs are only for small metadata, and that "subfiles" or "named forks" are an altogether different thing. I'm not sure if that means that it's necessarily a bad idea to present both as fileish objects though. Some interface to this is necessary for properly functioning WebDAV (and smb, if anyone ever wanted to implement that mess).
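A toy version of that "xattrs as a special child container" idea might look like the sketch below. Everything here (class and method names included) is a hypothetical illustration, not the twisted.vfs interface:

```python
class XattrContainer:
    """Toy container exposing extended attributes as child 'files'.

    Attributes are read and written individually by name, matching the
    getxattr/listxattr shape rather than a single return-everything call.
    """

    def __init__(self):
        self._attrs = {}

    def listNames(self):
        # Analogue of listxattr: names only, never the (large) values.
        return sorted(self._attrs)

    def readAttr(self, name):
        return self._attrs[name]

    def writeAttr(self, name, value):
        self._attrs[name] = value

    def createDirectory(self, name):
        # xattrs are flat: directory-like operations are rejected, as
        # suggested above for "excessively interesting" requests.
        raise NotImplementedError("xattr containers hold only leaves")
```

Since listing returns only names, a caller can discover gigabyte-sized NTFS-style streams without ever fetching them by accident.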
Support for xattrs can also be optional in a backend. Eventually, support for POSIX ACLs should also be considered. I don't really have the first clue how that should be modeled though. James
James Y Knight wrote:
How about getMetadata for simple data (size, content-type) and a container as you describe for potentially huge bits of data? hrm, .getattrs, .setattrs, .xattrsContainer? There definitely should be a way to query what metadata a backend can provide, and to pass which particular data you are requesting, for performance. Namespaces might be useful too? It'd be good to see a use case to show that they are. I'm keen to just let WebDAV's requirements drive the design here. None of our uses up till now have had a great demand on metadata. Andy.
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
James Y Knight wrote:
So, starting to look through twisted.vfs, I'm finding a few things that need work.
Hey James, Thanks for the feedback. We need it. Heaps of decisions for the vfs stuff have been put off to see what other use cases would need from the vfs. Inparticular permissions and metadata.
1) I see no way of reading from or writing to a file in ivfs.IFileSystemLeaf.
The vfs stuff is still heavily influenced by the interface that conch expects as sftp has been the main motivation for the current contributors. Reading and writing is done through writeChunk and readChunk - we've always felt this wasn't quite right though for a general backend. But after two sprints we still haven't come up with something that is better. Adding the web2.Stream adaptor seems to have glazed over the issue for protocols that read/writeChunk doesn't work for. Spiv even used streams for the vfs ftp adaptor! I've added read/writeChunk to ivfs.IFileSystemLeaf's interface.
2) createFile is racy -- it requires opening a file by the given name, with default permissions, then immediately closing it.
:), racy is good right?
In addition, it doesn't specify whether it's an error if the file already exists.
It should, I've added this to the interface.
3) Looks like all operations are blocking? What about a remote vfs? I think every operation in the vfs interface ought to be non-blocking.
The other option is the vfs interface could be maybe deferred. Most protocols are good at handling this (sftp, streams). But given how easy it is to return deferred.succeed - it's probably simpler to say always non-blocking.
4) IFileSystemNode.remove doesn't say whether it's a recursive delete (on a directory)
Hrm yeah - should it? Or should this be handled by higher level utilities (eg shutil). The current os backend uses os.rmdir, so doesn't do a recursive delete. I've updated the interface to say that it doesn't.
The method is against Node, so it works on directories. This is os.rename's spec: --- Rename the file or directory src to dst. If dst is a directory, OSError will be raised. On Unix, if dst exists and is a file, it will be removed silently if the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). On Windows, if dst already exists, OSError will be raised even if it is a file; there may be no way to implement an atomic rename when dst names an existing file. Availability: Macintosh, Unix, Windows. --- Should vfs be aiming to provide consistent behaviour for all operations across all backends? Or should some behaviour be left down to the particular backend to decide? For the moment I've updated the interface to read: Renames this node to newName. newName can be in a different directory. If the destination is an existing directory, an error will be raised.
yeah :( that needs to be fixed.
6) Need some support in the interface for extended attributes.
There's getMetadata. That let's you return arbitrary attributes. Would that cover what you're thinking? Protocol's should try to get by with as little metadata as they can. If a backend doesn't supply a bit of metadata a protocol must have, then it won't be able to be used with the protocol. Andy.
![](https://secure.gravatar.com/avatar/152986af8e990c9c8b61115f298b9cb2.jpg?s=120&d=mm&r=g)
On Thu, Sep 29, 2005 at 05:49:04AM +1000, Andy Gayton wrote:
As a know-nothing bystander with just enough knowledge about metadata, I'm curious. Is there a way to get a list of the kinds of metadata that are available? Is there a name-spacing system so different kinds of metadata can be available under different names? For example, a WebDAV share might (should) expose a 'Content-Type' attribute on every file, so code might be written that exposes the MIME type of the file in the 'Content-Type' attribute. On the other hand, a file on an NTFS file system can have arbitarily-named bytestreams associated with it. If the NTFS VFS module exposes byte-streams under their arbitary names, a program using the VFS could try to get the content type of a file and wind up with (several gigabytes of) almost anything. Another arbitary-metadata system that would be nice to support would be POSIX extended attributes, but I don't know what the name restrictions on those would be.
![](https://secure.gravatar.com/avatar/7ed9784cbb1ba1ef75454034b3a8e6a1.jpg?s=120&d=mm&r=g)
On Thu, 29 Sep 2005 05:49:04 +1000, Andy Gayton <andy@thecablelounge.com> wrote:
With that in mind.... ;)
twisted.vfs should not import things from or depend upon twisted.web2: * web2 is unreleased * web2's APIs are unstable * vfs is more generally applicable than web2 * web2's stream abstraction is not generally agreed upon If you like, we can talk more about how this interface should work. However, my first inclination is to say that it should use the existing producer/consumer APIs. While these are not the best APIs, they are used widely throughout Twisted, and therefore this will give the greatest usability to the resulting VFS code. While there are adapters between these APIs and web2 streams, I still recommend against web2 streams for the reasons mentioned above.
I've added read/writeChunk to ivfs.IFileSystemLeaf's interface.
I mentioned these in a separate email, so I won't repeat those points.
I assume you mean that they should always return a Deferred. In this case, I agree. maybeDeferred is intended as a convenience for application-level code. Framework-level code can avoid introducing the need for it at the application-level by simply always using Deferreds.
The semantics provided by vfs should be the same across all platforms and all backends. Since os.rename's semantics vary between platforms, this probably eliminates it from (unaided) use in an implementation. .rename() in VFS should work across filesystems, guarantee atomicity (if this is feasible - I think it is. If not, it should explicitly deny atomicity), and have well-defined edge cases (for example, whether an exception is raised because the destination exists already should be defined one way or the other, and that's how it should always work).
There needs to be a convention for the format of this metadata. Protocol implementations should not need to be familiar with the backend they are using, and different backends should provide the same metadata in the same way. It may make sense to expand the example dictionary in getMetadata's docstring, and continue expanding it as new requirements are made (perhaps getMetadata's docstring isn't the best place for this information, either). This still doesn't strike me as ideal, but it's better than nothing. Going further, I'd like to see pathutils implemented in terms of twisted.python.filepath: there's a lot of code duplication between these two modules. The code in twisted/vfs/adapters/dav.py is misplaced. Itamar posted to this list about this issue a couple weeks ago, but I'll re-iterate. Third-party package dependencies need to be considered carefully. Most importantly, dependencies *must* not be cyclic. Twisted cannot import from akadav, because akadav imports from Twisted. If akadav can be used to provide VFS functionality, then the adapters to do so belong in akadav, or in some other package: not beneath the Python package "twisted". As I mentioned above, twisted/vfs/adapters/ftp.py and stream.py shouldn't be importing from twisted.web2. Likewise, twisted/vfs/adapters/sftp.py's dependence on twisted.conch is backwards: twisted.conch should provide code which augments twisted.vfs. These are both great candidates for use of the plugin system. This also lets you take care of the nasty registration-requires-import issues, since gathering plugins will necessarily import the required modules, or if not, will provide a hook so that they can be imported at precisely the right time. 
Some easy things: new code in Twisted should use new-style classes; modules should have `test-case-name' declarations; zope.interface's convention is to not include "self" in method declarations; "type(x) is y" is generally wrong - osfs.py uses it in getMode() - both because isinstance() should really be used, and because type checking generally indicates some weakness in an API (why might the mode be either a string or an integer? pick one and require only that).

I hope this doesn't come off as too critical :) I'm very much looking forward to the day when setting up a dav server against a purely virtual, dynamic filesystem is as easy as implementing a couple interfaces out of ivfs.py. Jp
![](https://secure.gravatar.com/avatar/15fa47f2847592672210af8a25cd1f34.jpg?s=120&d=mm&r=g)
On Sep 28, 2005, at 8:04 PM, Jp Calderone wrote:
Twisted.vfs should not depend upon a module in twisted.web2 when twisted.vfs gets released. However, it is okay for it to depend upon that stream _code_ if it gets moved into twisted core before vfs is released. The idea all along has been to move t.w2.stream into twisted core when it is stable and useful. So I wouldn't worry about tearing it out of t.vfs quite yet.

Now, my first inclination is that the current block API *is* the right primitive for a file. Also, in particular, making it use the old producer abstraction as a primitive is just asking for trouble. As the producer abstraction lets the producer send data asynchronously at any point, it becomes almost impossible to do a relatively simple operation like reading part of a file. That is why, for web2, I had to drop it and make a new API that has the consumer request the data. I think the same reasoning applies here.

Again, I think that all requests for tearing various adapters and other bits out of twisted.vfs are currently completely premature. At this point in its development, it is critical that adapters for many different systems are created, to make sure that vfs has the appropriate abstractions and APIs to handle all use cases. And given that vfs is itself heavily under development, it makes no sense to request that said adapters be adopted upstream in each other project, yet. James
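The consumer-driven block API described above can be sketched roughly as follows. The leaf class is hypothetical (a toy in-memory backend, not actual twisted.vfs code); only the readChunk/writeChunk method names follow ivfs. The point is that the *caller* asks for each chunk at its own pace, rather than a producer pushing data at arbitrary times:

```python
class InMemoryLeaf:
    """Hypothetical VFS leaf backed by a bytes buffer."""

    def __init__(self, data=b""):
        self._data = data

    def readChunk(self, offset, length):
        # Return up to `length` bytes starting at `offset`; an empty
        # result signals end-of-file.
        return self._data[offset:offset + length]

    def writeChunk(self, offset, data):
        # Overwrite `data` into the buffer at `offset`.
        buf = bytearray(self._data)
        buf[offset:offset + len(data)] = data
        self._data = bytes(buf)
        return len(data)


def read_all(leaf, chunk_size=4096):
    """Read a whole leaf by repeatedly *requesting* chunks -- the
    consumer pulls; nothing is pushed at it asynchronously."""
    pieces, offset = [], 0
    while True:
        chunk = leaf.readChunk(offset, chunk_size)
        if not chunk:
            break
        pieces.append(chunk)
        offset += len(chunk)
    return b"".join(pieces)
```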
![](https://secure.gravatar.com/avatar/7ed9784cbb1ba1ef75454034b3a8e6a1.jpg?s=120&d=mm&r=g)
On Wed, 28 Sep 2005 20:35:40 -0400, James Y Knight <foom@fuhm.net> wrote:
That mildly addresses one of four points. At the very least, the remaining three seem to remain valid.
Now, my first inclination is that the current block API *is* the right primitive for a file.
It precludes writing large amounts of data to a file simply.
The old API is not fantastic. On the other hand, it's entirely serviceable. I don't understand why you think it is almost impossible to read part of a file using it. In fact, I've done just this on several occasions.
They can be removed from twisted.vfs without being removed from the Twisted repository. Or they could be left in twisted.vfs but developed in a branch. That is policy for major feature development, after all. Jp
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
Jp Calderone wrote:
On Wed, 28 Sep 2005 20:35:40 -0400, James Y Knight <foom@fuhm.net> wrote:
On Sep 28, 2005, at 8:04 PM, Jp Calderone wrote:
(in regards to read/writeChunk)
I think the main reason they've won out till now is that it's incredibly easy to implement for backend implementors, and, being so primitive, extremely easy to compose into higher abstractions (streams, producers/consumers, or a convenience that lets you write large amounts of data to a file simply) through the use of adaptors. The ftp adaptor making use of the stream adaptor is a pretty good example of this.
I think the vfs stuff would really benefit from being prodded and exposed to as many use cases as possible. It's been doing what I need it to do for around 6 months (except for dav - dav'd be awesome! :)) so it's hard to find motivation to work on it. But I appreciate where you're coming from. I've taken out the dav adaptor, which was a failed experiment from the first sprint. I think the only controversial dependency left is web2, which I've discussed in another mail. Andy.
![](https://secure.gravatar.com/avatar/15fa47f2847592672210af8a25cd1f34.jpg?s=120&d=mm&r=g)
On Sep 28, 2005, at 8:04 PM, Jp Calderone wrote:
rename() in VFS should work across filesystems,
Hm, that is going to be an interesting one to implement. I'm thinking in particular about what happens when you have a structure like:

    / -> adhoc.AdhocDirectory:
        tmp  -> osfs.OSDirectory("/home/jknight/tmp", ...)
        home -> inmem.FakeDirectory(...)

and I ask to move a file from /tmp/foo to /home/bar. IMO it is reasonable to say that the VFS 'rename' operation is allowed to cleanly fail, and not do the rename, forcing a higher level to do a copy/delete if it wants. This pushes the complication out of each VFS implementation into one implementation that will work across all of them, and furthermore it can share its code with the copy implementation. This maps nicely to rename(2) as well, as a bonus.
guarantee atomicity (if this is feasible - I think it is. If not, it should explicitly deny atomicity),
It isn't feasible, when renaming across filesystems. There will certainly have to be a time at which both 'from' and 'to' exist. Additionally, it may be impossible to create a file 'to+".tmp"' (or similar) in the target directory to atomically rename to 'to' when you've finished copying, because of permissions. Another reason to restrict "rename" to be the simple rename, rather than the copy&delete-rename. James
![](https://secure.gravatar.com/avatar/7ed9784cbb1ba1ef75454034b3a8e6a1.jpg?s=120&d=mm&r=g)
On Wed, 28 Sep 2005 21:01:26 -0400, James Y Knight <foom@fuhm.net> wrote:
I can see ways to make this work. They involve temporary files, as you mention below. If we rule those out, it does become harder.
When are you allowed to create a file named "foo" in a directory, but not allowed to create a file named "foo.tmp"? Anyway, I think it's worth a try at least. If it turns out not to be possible, then certainly it should not be done :) I am mainly concerned that there be consistent behavior across backends. Writing code that deals with the quirks of os.rename() on Win32 is annoying, to say the least (and realistically, just plain error prone). Jp
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
I'll have another shot at this :) Jp Calderone wrote:
* web2 is unreleased
vfs is unreleased. I think it's pretty safe to say it should stay that way at least until web2 is released.
* web2's APIs are unstable
vfs' APIs are obscenely unstable.
* vfs is more generally applicable than web2
as is web2's stream. As James pointed out, hopefully stream will eventually move into twisted core.
* web2's stream abstraction is not generally agreed upon
Fair point. But this just means that as stream's abstraction is reworked to meet general consensus, vfs will need to be rewritten to match the changes.
It would be fairly straightforward to add an adapter from ivfs to producer/consumers if someone has a need for it. I personally was keen to use streams with vfs as, being a new project that has the luxury of being unstable, I wanted to pitch it at where Twisted was heading. I was also hoping to add to web2's momentum - nevow on web2 still feels a long way off and it's something I'd really like to see. Andy.
![](https://secure.gravatar.com/avatar/d6328babd9f9a98ecc905e1ccac2495e.jpg?s=120&d=mm&r=g)
On Thu, 29 Sep 2005 21:25:07 +1000, Andy Gayton <andy@thecablelounge.com> wrote:
I also object to the dependency, although I would propose the opposite solution: I don't think that "twisted.vfs" makes *any* sense as a separate project. web2 should not be a trivial HTTP server with a small resource wrapper in it. It should be a full-featured web server. Web does not mean "http" - all web-related protocols, such as WebDAV, ftp, even SOAP or Gopher, should be part of the 'web server' product part of Twisted.

Going forward, I will stipulate this requirement: each independent Twisted subpackage MUST be at least 2 things: a functioning infrastructure layer that can be used by 3rd party applications, and an application which can be used standalone or with application plugins. In twisted.web's case this application is a "web server", which primarily does HTTP but can provide other request/response based protocols as well. "Application plugins" for the web server are things that respond to requests for particular URLs.

This requirement does imply the removal of a few existing Twisted packages. twisted.xish and twisted.flow come to mind. While I don't think that anything is actually using twisted.flow at this point, xish's useful code should move into the package that actually makes use of it (twisted.words.jabber). twisted.pair should probably just be part of eunuchs; while Twisted applications can use it, it certainly doesn't seem to stand on its own. I don't think it's important to act particularly quickly since I don't think any of these packages are seeing widespread external usage, but I definitely don't want to create more packages in this situation.

While Twisted packages may not depend on external projects (NO MORE NEVOW IMPORTS ANYWHERE, PLEASE), it is reasonable to have soft internal dependencies; such as web2 including a Conch plugin (importing Conch interfaces) that provides an HTTP/Gopher/etc backend for SFTP.
These dependencies can also be circular, so long as the circularity is between packages and not modules, AND the imports happen after startup; for example, it would also be reasonable for the previous scenario to co-exist with a plugin distributed with conch to enable SFTP servers to be used as web resources.
![](https://secure.gravatar.com/avatar/15fa47f2847592672210af8a25cd1f34.jpg?s=120&d=mm&r=g)
On Sep 29, 2005, at 12:26 PM, glyph@divmod.com wrote:
I don't think that "twisted.vfs" makes *any* sense as a separate project.
Agreed. I had sort of assumed twisted.vfs was a proposed addition to twisted core. James
![](https://secure.gravatar.com/avatar/3bef09da3292c944649ffc28673df870.jpg?s=120&d=mm&r=g)
On Thu, Sep 29, 2005, glyph@divmod.com wrote:
My understanding of the original purpose of the split -- which is probably wrong -- was to allow parts of Twisted to release without massive dependencies on nearly unrelated release-critical bugs, and further to allow users of multiple bits of Twisted to not have to upgrade them all as a lump. If we have a model where an enormous group of request-response protocols are lumped with web, we return to the following release management problems:

1. rapidly evolving code, like FTP, which seems to have some parts working now and, were it an independent project, should probably be having some highly alpha releases, will not be released until it is sufficiently stable to be bundled with a more used and mature protocol like HTTP, and may in turn block said releases by being highly alpha.

2. relatively unrelated code, like ftp, or the hypotheticals above, will need to go unreleased every time there is a major years-long rewrite of a large component (web2).

My understanding of the release-early-release-often philosophy was to encourage early adopting users (who are also more likely to become contributors). As it is, no sub-project is releasing and core looks set for a pretty slow ongoing release cycle. I may be naive about this, but my concern is that while core has critical mass in terms of users, the protocols and applications largely don't, and won't until they are seen by potential users to be active and supported, part of which is actually releasing. Having users in turn has been known to spur development.

What you're proposing sounds like a mature release policy: as in, sub-projects will be rather monolithic and will release when they're mature. Is this correct, and if so, will they release? -Mary
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
Jp Calderone wrote:
ok, http://twistedmatrix.com/bugs/issue1223
All that stuff on rename sounds great (moving between filesystems etc.) .. might as well aim for it until it's shown not to be possible. http://twistedmatrix.com/bugs/issue1224
http://twistedmatrix.com/bugs/issue1225
yep, http://twistedmatrix.com/bugs/issue1226
ok - will check out the plugins stuff. just haven't come across it yet.
http://twistedmatrix.com/bugs/issue1227
Not at all. The feedback is appreciated. Andy.
![](https://secure.gravatar.com/avatar/15fa47f2847592672210af8a25cd1f34.jpg?s=120&d=mm&r=g)
On Sep 28, 2005, at 3:49 PM, Andy Gayton wrote:
There's getMetadata. That lets you return arbitrary attributes.
Would that cover what you're thinking?
No -- one problem is that extended metadata can be potentially very large (just as large as the file's normal contents). Another is that there may be a potentially large number of such attributes. So, you really don't want to return all of it with one call -- you want to explicitly ask for certain attributes. See the getxattr, listxattr, etc. functions for what the low-level functionality looks like on Linux. At the moment this interface varies slightly between OSX, FreeBSD, and Linux, but they're almost the same. The OSX one adds an offset parameter to get/set, and the FreeBSD one adds a "namespace" parameter, to distinguish between root-only attributes and user attributes.

I think it may be best to model it as a special kind of child, as it seems as if that's the way people are moving their thinking anyhow:

    IFileSystemNode.xattrContainer() -> IFileSystemContainer

That returned object would then allow the use of the same createFile/child/remove/rename/open/read operations as on normal children. It would have to throw errors if you try doing excessively "interesting" things, like trying to create directories in it, but I think that's probably okay.

On the other hand, some people think xattrs are only for small metadata, and that "subfiles" or "named forks" are an altogether different thing. I'm not sure that means it's necessarily a bad idea to present both as fileish objects, though. Some interface to this is necessary for properly functioning WebDAV (and smb if anyone ever wanted to implement that mess).
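The special-child idea could look something like this sketch. Everything here is hypothetical (a dict-backed container, not a proposed final API); the method names echo the child/createFile/remove operations mentioned above, with comments mapping each one to its xattr(2)-family counterpart:

```python
class XattrContainer:
    """Hypothetical IFileSystemContainer-like view over a node's
    extended attributes: each attribute looks like a child 'file', so
    the ordinary container operations apply."""

    def __init__(self):
        self._attrs = {}              # attribute name -> bytes value

    def children(self):
        return list(self._attrs)      # cf. listxattr(2)

    def child(self, name):
        return self._attrs[name]      # cf. getxattr(2)

    def createFile(self, name, data=b""):
        self._attrs[name] = data      # cf. setxattr(2)

    def remove(self, name):
        del self._attrs[name]         # cf. removexattr(2)

    def createDirectory(self, name):
        # "Excessively interesting" operations are refused, as the
        # post suggests.
        raise NotImplementedError("no directories inside xattrs")
```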
Support for xattrs can also be optional in a backend. Eventually, support for POSIX ACLs should also be considered. I don't really have the first clue how that should be modeled though. James
![](https://secure.gravatar.com/avatar/2e9b5cb8fcf834ddf8be44a450efe97f.jpg?s=120&d=mm&r=g)
James Y Knight wrote:
How about getMetadata for simple data (size, content-type) and a container as you describe for potentially huge bits of data? hrm, .getattrs, .setattrs, .xattrsContainer ? There definitely should be a way to query what metadata a backend can provide, and to pass which particular data you are requesting, for performance. Namespaces might be useful too? It'd be good to see a use case showing that they are. I'm keen to just let WebDAV's requirements drive the design here. None of our uses up till now have had a great demand on metadata. Andy.
participants (6)
-
Andy Gayton
-
glyph@divmod.com
-
James Y Knight
-
Jp Calderone
-
Mary Gardiner
-
Screwtape