[Twisted-Python] Twisted, medusa, ZServer, and VFS's

G'day, I've just been looking at Twisted Python for the past hour or two, read the mailing list archives, and have some comments and questions. I've been working on using Medusa for serving a virtual mirror via http and ftp. I'm at the point where I'm close, but I'm starting to re-think some stuff, in particular my choice of Medusa. First, my impressions of the three contenders: Twisted, Medusa, and ZServer. Please correct me if I'm wrong in the following summaries.
Medusa seems to be the daddy of them all. It's the oldest, which has benefits and problems. It is a little messy from its evolution, but seems pretty mature. It uses an async select loop to drive everything. It uses "asynchat" derived class objects to communicate on sockets. Data can be sent by pushing "producers" onto asynchat objects. Producers can be complex objects that produce data, execute callbacks, whatever. Its ftp and http server classes use a primitive VFS to serve from. The asyncore and asynchat modules it is built on are now part of Python.
Twisted seems to be a from-the-ground up re-invention of Medusa. It's newer, but surprisingly it's bigger, despite its apparently less mature feature set. It is similar in structure to Medusa, but simplifies it by dispensing with producers. It can use a variety of event loops, including Tk and GTK, or its own. It doesn't have a VFS (yet), so its ftp and http servers serve from the underlying OS filesystem.
ZServer grew out of Medusa. It uses the same basic underlying architecture, but throws in threads to get around the problem of delayed producers blocking the event loop. I'm not sure how tightly tied to Zope it is, but its http and ftp servers generally serve from a ZODB database, presumably wrapped in a Medusa VFS, though I have a feeling they might have changed that. ZServer also supports webDAV serving. It is possible that some of the enhancements could be merged back into Medusa, but it has probably changed so much that it would be difficult.
My problem with Medusa is that its http and ftp servers assume that the VFS can deliver files wrapped in producers without blocking. I've fixed this by creating a patch for Medusa's asynchat that adds support for a ready() method to producers, so they can block without blocking the event loop. I'm currently in the process of writing Medusa VFS's for ftp and eventually http backends. In the process I've also found that the Medusa ftp server is not as full featured as I want.
In my search for Python VFS's I found PyVFS (http://www.pycage.de/). This is modelled on the Gnome/MC VFS, so it supports '/dir/somefile.tar.gz#tgz:/somepath' style paths to look inside tar, tgz, ftp, whatever. The various VFS backends are loaded dynamically as plugins. These plugins execute as separate processes that are communicated with over a channel. The API is too simplistic for me, with files being "projected" out of the VFS to a local file to be manipulated/used. I don't like the plugin-process-channel architecture either.
In reading the Twisted mailing list, I saw a comment to the effect that the Medusa VFS was an example of how _not_ to do it, which led to using webDAV as the API for a VFS. My gut feeling is DAV is a cool protocol for a VFS backend, but I dunno about using it as the primary API. Sure, it supports metadata etc, but the reality is the API that is most widely used and understood is the POSIX filesystem API, as exposed in Python by the os and os.path modules.
My solution for a VFS has been, up to now, based on Medusa's, but extending it to be more like os and os.path. So far it's a filesystem class with most of the os and os.path methods. One of the derived classes is a mountable_filesystem that allows you to mount other VFS filesystems off it. At this point I'm tempted to make a vfs module that emulates os and os.path so that you can mount whatever vfs's you want first, and then just replace all your os.* calls with vfs.* calls.
Note that the one catch would be that open() would need to be replaced with vfs.open().
I'm sort of fishing for general suggestions, comments, and interest. I'm at the point where I've just convinced myself my vfs is worth finishing, and my ready() patch to asynchat is worth updating, but I'm not sure what to use as the http and ftp server front-end, though I'm still leaning towards Medusa. It looks like Twisted is not ready, and ZServer would be too hard to separate from Zope.
PS... I'm not on the zope-dev list but I am on the Twisted and Medusa lists. The zope-dev list has too much non-ZServer stuff, and ZServer is all I'm interested in. So zope-dev'ers, please reply to me or one/both of the other lists directly.
--
ABO: finger abo@minkirri.apana.org.au for more information.
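A minimal sketch of the "vfs module that emulates os and os.path" idea from the message above, to make the mounting scheme concrete. The class names OsFilesystem and MountableFilesystem, the method set, and the longest-prefix mount lookup are all assumptions made for illustration; the thread does not show the actual code.

```python
import os
import posixpath


class OsFilesystem:
    """Hypothetical backend: serves a directory of the real filesystem."""

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def _real(self, rel):
        # rel arrives already normalised (no ".." parts), see _norm() below.
        return os.path.join(self.root, *rel.split("/")) if rel else self.root

    def listdir(self, rel):
        return sorted(os.listdir(self._real(rel)))

    def exists(self, rel):
        return os.path.exists(self._real(rel))

    def isdir(self, rel):
        return os.path.isdir(self._real(rel))

    def open(self, rel, mode="rb"):
        return open(self._real(rel), mode)


class MountableFilesystem:
    """Hypothetical front end: os/os.path-like calls routed to mounted backends."""

    def __init__(self):
        self._mounts = {}

    @staticmethod
    def _norm(path):
        # Collapse "." and ".." so a path cannot climb above the VFS root.
        return posixpath.normpath("/" + path.lstrip("/"))

    def mount(self, point, fs):
        self._mounts[self._norm(point)] = fs

    def _resolve(self, path):
        path = self._norm(path)
        # The longest matching mount point wins, as with real mount tables.
        for point in sorted(self._mounts, key=len, reverse=True):
            prefix = point.rstrip("/") + "/"
            if path == point or path.startswith(prefix):
                return self._mounts[point], path[len(prefix):]
        raise FileNotFoundError(path)

    def listdir(self, path):
        fs, rel = self._resolve(path)
        return fs.listdir(rel)

    def exists(self, path):
        fs, rel = self._resolve(path)
        return fs.exists(rel)

    def isdir(self, path):
        fs, rel = self._resolve(path)
        return fs.isdir(rel)

    def open(self, path, mode="rb"):
        fs, rel = self._resolve(path)
        return fs.open(rel, mode)
```

With something like this, a caller would create one MountableFilesystem, mount backends at chosen points, and call vfs.listdir() or vfs.open() where it previously called os.listdir() or open(). Every call here still blocks, which is the issue raised later in the thread.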

On Tuesday 09 October 2001 10:48 pm, Donovan Baarda wrote:
twisted has lots of interesting ideas and architecture, but since i haven't used it i'll refrain from comment.
zserver swallows medusa whole, and leaves it pretty much untouched. it builds its functionality on top of medusa. there is no vfs in zope's interaction with medusa. requests are handled by an installed handler and are passed off to a thread pool which calls zpublisher (zope's orb) that maps requests directly onto the zodb. zserver's thread architecture is pretty much separate from medusa. a single medusa thread handles most of the network i/o. the thread architecture of zope (imo) is indeed mainly for avoiding blocking the event loop and to allow zope to separate processing from i/o. i never used webdav so i can't comment, but my understanding is that changes to medusa made for zserver are basically things specific to zope, so using zserver is basically using medusa. but using it without zope means you lose all the zope based functionality, which includes the webdav implementation. <snip vfs/ftp>
my two cents, i'm not sure why you think twisted isn't ready but it's probably worth experimenting with. as for zserver i don't think there is any additional standalone functionality that is offered on top of the medusa distribution that really makes this make sense (with the possible exception that you want to use a threaded async i/o architecture like zserver's, in which case you might also want to take a look at webware's asyncthreaded server.)
cheers
kapil thangavelu
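The hand-off described above (one thread doing network i/o, a pool of worker threads doing the blocking application work) can be sketched with today's standard library. This uses asyncio and concurrent.futures as stand-ins, not Medusa or ZServer code, and publish() is a hypothetical placeholder for whatever the publisher would do.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def publish(request):
    """Hypothetical blocking application work, run in a worker thread."""
    time.sleep(0.05)  # simulate a slow database lookup
    return "response to " + request


async def handle(request, pool):
    # The event loop thread only schedules the work and awaits the result,
    # so other connections keep being serviced while publish() blocks.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, publish, request)


async def serve(requests):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handle(r, pool) for r in requests))
```

Calling asyncio.run(serve(["a", "b"])) runs both requests concurrently in the pool and returns their responses in request order.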

Actually, this brings up this idea I had - Zope should replace medusa with Twisted. Why, you ask?
1) Twisted separates transport from protocols, and the event loop it uses is extendable and generic. That means:
   - It can run on Jython (using threads, someday with java.nio), and it can be integrated with the Tk and GTK event loops.
   - Your protocol doesn't have to worry about the transport - Twisted supports SSL, TCP and unix domain sockets right now, without having to make any change to the protocols.
2) Twisted is designed to run multiple servers and protocols at the same time, and these can be changed at runtime. It already includes pure python support for HTTP, FTP, LDAP, SMTP, POP3, DNS, telnet, AIM TOC, and IRC, all integrated with the main event loop (all have server support except DNS and LDAP). Adding new protocols to Zope is not easy, at the moment.
3) Twisted is being actively developed and extended. medusa less so.
4) Good integration with threads - while event based, twisted has a very nice model for dealing with threaded apps.
5) Twisted has Perspective Broker, an async-ready remote-object protocol that supports caching, object migration, and remote messaging, with integrated authentication and authorization. And it ideologically meshes with the "object publisher" notion in Zope. No, really :)
Twisted already includes a high-level web framework, but Zope probably would not use it, and instead build its own on top of twisted's low-level http support.
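To illustrate point 1, here is a small self-contained sketch of what separating transport from protocol buys you. It does not import Twisted; the class and method names (EchoUpper, MemoryTransport, SocketTransport, make_connection, data_received) are hypothetical and only loosely follow the idea.

```python
import socket


class EchoUpper:
    """Protocol logic only: it never touches a socket directly."""

    def make_connection(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data.upper())


class MemoryTransport:
    """Transport that just records what was written (handy for tests)."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class SocketTransport:
    """Transport that writes to a connected socket."""

    def __init__(self, sock):
        self.sock = sock

    def write(self, data):
        self.sock.sendall(data)


def pump_once(protocol, sock):
    """Read one chunk from sock and deliver it to the protocol."""
    protocol.data_received(sock.recv(4096))
```

The same EchoUpper instance works unchanged over MemoryTransport or SocketTransport, which is the property the list above attributes to Twisted's SSL, TCP and unix domain socket support.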

On Wed, 10 Oct 2001, Donovan Baarda <abo@minkirri.apana.org.au> wrote:
Medusa seems to be the daddy of them all.
Calling Medusa Twisted's daddy is rewriting history.
Twisted seems to be a from-the-ground up re-invention of Medusa.
Only as much as it is a from-the-ground up re-invention of qmail. Or Apache. Twisted is a new network framework, which takes good ideas from all around.
I think Twisted's feature set is very mature. Particularly, it does have good integration with threads.
It is similar in structure to Medusa, but simplifies it by dispensing with producers.
Well, you can still have producers -- they are just tied in to connections rather than the event loop itself.
Well, the HTTP server can serve from in-memory resources, or for that matter, any resource that follows the protocol.
--
The Official Moshe Zadka FAQ: http://moshez.geek
The Official Moshe Zadka FAQ For Dummies: http://moshez.org
Read the FAQ

I'm only responding to the twisted-python list, since the cross-post seemed excessive. Feel free to rebroadcast this if ensuing discussion on other lists is interesting. On Wednesday, October 10, 2001, at 12:48 AM, Donovan Baarda wrote:
Thank you for your interest. Here's a twisted way of looking at your questions :-)
I don't see what you mean by that; about the only thing that Medusa and Twisted share is that they are both asynchronous networking frameworks. They seem to have a fairly different approach to how protocols are written and integrated, and in what their scope for future development is.
This sounds pretty accurate to me.
Twisted seems to be a from-the-ground up re-invention of Medusa.
As Moshe said, only insofar as it's a from-the-ground-up re-invention of about 6 or 7 other things.
It's newer, but surprisingly it's bigger, despite its apparently less mature feature set.
What do you mean when you say "less mature"? Twisted's features have been around for less time (hence, "newer") but compare, for example, twisted.spread.pb with rpc_server. Or twisted.words with chat_server. Would you characterize the medusa approach in any of these comparisons as "more mature"? What is the Medusa equivalent of twisted.reality, twisted.mail, twisted.web.widgets, or twisted.enterprise? These services are at varying levels of maturity, but surely the fact that they exist at all has to count for something :-).
It is similar in structure to Medusa,
At some extremely superficial level, I guess this is true. However, Twisted does a lot more than just clone medusa. Even at a basic level, you could say that it complicates the medusa structure a great deal with a unified notion of authentication, automatic persistence, and incidentally, several full-featured applications. :-)
but simplifies it by dispensing with producers.
Twisted has producers, but only when you need them. http://twistedmatrix.com/users/glyph/TwistedDocs/Twisted-0.11.0/twisted/inte... abstract_FileDescriptor.py.html#registerProducer
Well, it depends what you mean by VFS. Twisted has a perception of the filesystem as more like a special-case of "container" than containers as a special-case of the filesystem. There are containers which can respond to specifics of the HTTP protocol that are not derived from files; would you call that part of a "VFS"? The semantics of "__builtins__.open" are not sufficiently rich to support that.
Why not? It sounds clever to me. I don't have a good picture of your requirements (other than "HTTP and FTP") at this point, so I can't fathom why you like or don't like this particular solution. (Why is it relevant?)
Yes, but that's a blocking API; ergo, it does not work in an asynchronous framework like Twisted. Not all "files" are associated with a file descriptor, so they may potentially support different operations. Directory listing on FTP, HTTP, and WebDAV sites is not necessarily consistent with the files that are actually available. There is metadata associated with some requests, and not with others... in short, there are lots of subtle issues involved with supporting each of these types of hierarchies well, and a blanket virtual "filesystem" implementation does not satisfy all (or even a reasonably large subset) of them. For different protocols, there may be API differences, unless some of the protocols are stripped to the "lowest common denominator", e.g. POSIX.
Not necessarily. You could always hack up the __builtins__ module at runtime to point to your newer, better open(). Either way it seems like there are probably issues with security & the presence of *real* file descriptors that you have to think about...
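A hedged sketch of the "point open() at a newer, better open()" trick mentioned above, written for Python 3, where the module is named builtins. The helpers patched_open and make_memory_open are hypothetical names, and the in-memory file table stands in for a real VFS.

```python
import builtins
import contextlib
import io


def make_memory_open(files):
    """Build an open() look-alike backed by a dict of path -> bytes."""

    def vfs_open(path, mode="r", *args, **kwargs):
        if path not in files:
            raise FileNotFoundError(path)
        data = files[path]
        # Return a file-like object; no real file descriptor is involved.
        return io.BytesIO(data) if "b" in mode else io.StringIO(data.decode())

    return vfs_open


@contextlib.contextmanager
def patched_open(replacement):
    """Temporarily swap builtins.open, restoring it even on error."""
    original = builtins.open
    builtins.open = replacement
    try:
        yield original
    finally:
        builtins.open = original
```

Inside a `with patched_open(...)` block, any code that looks up the bare name open gets the replacement. Code that calls os.open() or io.open() directly bypasses it, and the returned objects have no usable fileno(), which is one form of the "real file descriptors" problem noted above.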
Ready for what, is the question? :-)
Short form: HTTP yes, FTP no (but it could be with a little work), everything else yes.
Long form: We do not currently have (or have any high-priority plans to produce) a "VFS", but I would contend that such a system is not necessary when you look at the way that Twisted does web resources. FTP doesn't currently use that model, but I don't think it would be a difficult modification; it uses the same Producer model that HTTP does. I estimate that it would be easier to modify Twisted in this way than to undertake a project to do your own VFS, but since I'm not exactly sure what's going to make you happy, I don't have a high degree of confidence in that estimation.
As far as robustness (which is an implied issue with Twisted's "maturity"), Twisted is being used to run the main twistedmatrix.com site. We always run the most recent development version. So far, it's only served 17.422 downloads of Twisted itself and approximately three quarters of a million hits, and survived five or six freshmeat "attacks" :-). It did crash, once, about 8 months ago after a major refactoring (before we had acceptance tests...), but that bug was quickly fixed. We don't run a public FTP server, but the core parts of Twisted that enable one to run a website where you can download files are very personally important to me, so draw your own conclusions :-).
--
 ______      you are in a maze of twisted little applications, all
 |    |_\    remarkably consistent.
 |    |          -- glyph lefkowitz, glyph @ twisted matrix . com
 |_____|             http://www.twistedmatrix.com/

On Wed, Oct 10, 2001 at 03:56:38AM -0500, Glyph Lefkowitz wrote:
sticking to this convention... :-)
This I guess is part of what is putting me off Twisted. I'm familiar with Medusa, and Twisted is just different enough that I've got a learning curve ahead of me. Unless I get convinced Twisted is worth the extra effort...
Python async socket framework... it's filling the same void :-)
That probably accounts for Twisted having more code. However, in my case I'm just after http and ftp server capabilities. I have a feeling Medusa's ftp code at least is more complete.
I'm also a bit of a less-is-more person... I don't really need all of that. However, if the framework is neat, and I get that without extra hassles and bloat, I guess I'd use it.
Hmmm... looking into this.
I was just looking at the ftp part, and it seems that it can only serve files. If it can do more than this, I guess I'm more interested.
It is overkill for what it is: spawning whole extra processes and using inter-process communication over a channel when just classes and/or threads would do the job. The biggest limitation was the API... I basically want to serve up a virtual mirror, which means I need to be able to identify and mirror things like symlinks. However, because it is quite simple, I was thinking of making a VFS backend for my VFS that talks to PyVFS... more as a proof of concept, but also to get tar.gz, cpio, mailbox, etc for free.
The Medusa VFS is a stripped down POSIX, and it actually causes blocking problems for my application. That's why I added ready() support to asynchat, so that file producers can tell the select loop they would block and be excluded from that time round the loop. This is basically a hack... a probably neater way is to make the file producers asynchat's themselves, but that would require major restructuring of Medusa's ftp and http servers.
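The ready() idea above can be sketched without Medusa itself. This is a toy loop, not the actual patch (which the thread does not show): the names DelayedProducer, Channel and poll_once are hypothetical, while more() and push_with_producer() follow the producer convention used by asynchat.

```python
from collections import deque


class DelayedProducer:
    """Producer whose data shows up later, e.g. from a slow VFS backend."""

    def __init__(self):
        self._chunks = deque()
        self._done = False

    def feed(self, data):
        self._chunks.append(data)

    def finish(self):
        self._done = True

    def ready(self):
        # True only when more() can answer without waiting.
        return bool(self._chunks) or self._done

    def more(self):
        # Empty bytes means "exhausted", as with asynchat producers.
        return self._chunks.popleft() if self._chunks else b""


class Channel:
    """Toy stand-in for an asynchat channel with a producer fifo."""

    def __init__(self):
        self.fifo = deque()
        self.sent = []

    def push_with_producer(self, producer):
        self.fifo.append(producer)

    def writable(self):
        # Excluded from this pass if the head producer would block.
        return bool(self.fifo) and self.fifo[0].ready()

    def handle_write(self):
        data = self.fifo[0].more()
        if data:
            self.sent.append(data)
        else:
            self.fifo.popleft()  # producer exhausted, move to the next


def poll_once(channels):
    """One trip round the loop: service only channels that are writable."""
    for channel in channels:
        if channel.writable():
            channel.handle_write()
```

A channel whose head producer is not ready is simply skipped for that pass, so other channels keep moving; nothing wakes the loop when data arrives, which is part of why the message calls the approach a hack.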
That's why I'm going for POSIX :-)
That's why I'd keep it separate inside a vfs module... remember at least one vfs backend would be using the builtin open and os modules to access the real fs. Though now you mention it... it could be a way of transparently running any application on top of a VFS without changing it at all. Hmmm... bound to be clashes... I wonder...
FTP is the main one...though I'd really love rsync... roll on librsync :-)
The application is basically a mirror daemon that serves up a virtual mirror of an ftp, rsync or http site. To the ftp and http clients, it appears to be a full mirror. Files are fetched on demand and stored in a partial mirror on the server. My plan was to use a VFS to 'mount' the remote ftp, rsync, or http site. The partial mirror on the server would also be accessed through the same VFS interface. Then I was going to overlay a "mirrorfs" VFS over them both, that would mirror the remote VFS to the local one on demand. The http/ftp server part would then just serve files from the mirrorfs VFS. The beauty of this is the different VFS backends would allow you to do weird things like on-demand mirror a remote ftp server into a local tar.gz file, not that you'd want to :-)
--
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
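The "mirrorfs" overlay described above could look roughly like this. It is only a sketch under assumptions: DictFilesystem is a hypothetical in-memory stand-in for both the remote and the local backend, and the read/write/exists/listdir interface is invented for the example.

```python
class DictFilesystem:
    """Hypothetical in-memory backend; counts reads so fetches are visible."""

    def __init__(self, files=None):
        self.files = dict(files or {})
        self.reads = 0

    def exists(self, path):
        return path in self.files

    def listdir(self):
        return sorted(self.files)

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class MirrorFilesystem:
    """Overlay: listings come from the remote, contents are cached locally."""

    def __init__(self, remote, local):
        self.remote = remote
        self.local = local

    def listdir(self):
        # Clients see the full remote tree even if nothing is cached yet.
        return self.remote.listdir()

    def read(self, path):
        if not self.local.exists(path):
            # First request: fetch from the remote and keep a local copy.
            self.local.write(path, self.remote.read(path))
        return self.local.read(path)
```

The fetch in read() blocks here; in an event-driven server that is the point where a file producer would have to report that it is not yet ready.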

participants (5)
- Donovan Baarda
- Glyph Lefkowitz
- Itamar Shtull-Trauring
- kapil thangavelu
- Moshe Zadka