[Twisted-Python] API docs
I was talking to MetaCosm, and he said the template for API docs he uses in his professional projects is like so:

"""
Usage: foo() -> baz
Examples: if blah(): foo()
Big Picture: (!!) This class is meant to be used in a Quuxer, and you should usually override the getBaz method to return a Spam instance, although it's not required.
NOTES: This class is currently in a state of flux; it will soon be refactored, so watch out for API changes
"""

etc.

The main thing here is "Big Picture", which should give the method/class some context. NOTES is mainly for temporary stuff; It's probably not crucial to be in the docstrings (probably it should just be in near-by #XXX comments). So yeah, I urge people who are writing docstrings to put stuff into context; I'll try to do the same thing. Whether or not you use a similar format isn't really important, but it seems sane enough to me. We'll probably be doing a lot of this during the 0.99.0 cycle, but it's never too early to start improving documentation :)

Anyway, enough rambling: off to bed with me.

-- Chris Armstrong << radix@twistedmatrix.com >> http://twistedmatrix.com/users/carmstro.twistd/
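As an illustration of the template in the message above, here is a minimal sketch of a Python class whose docstring follows the Usage / Examples / Big Picture / NOTES layout. The names Foo, SpamFoo, Quuxer, getBaz and Spam are the placeholder names from the message (or invented here to match them); nothing in it is real Twisted API.

```python
class Foo:
    """Produce a baz for a Quuxer.

    Usage:
        Foo().getBaz() -> baz

    Examples:
        baz = Foo().getBaz()

    Big Picture:
        This class is meant to be used in a Quuxer. You should usually
        override getBaz to return a Spam instance, although it's not
        required.

    NOTES:
        This class is in a state of flux; watch out for API changes.
    """

    def getBaz(self):
        """Return the default baz; subclasses usually override this."""
        return "baz"


class SpamFoo(Foo):
    """Foo variant that follows the 'Big Picture' advice in Foo's docstring."""

    def getBaz(self):
        # Placeholder for "return a Spam instance".
        return "Spam"
```

The point of the "Big Picture" section is that a reader of help(Foo) learns where the class fits and which method to override without having to read the rest of the module.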
On Friday 19 July 2002 13:17, Christopher Armstrong wrote:
I was talking to MetaCosm, and he said the template for API docs he uses in his professional projects is like so:
""" Usage: foo() -> baz
Examples: if blah(): foo()
Big Picture: (!!) This class is meant to be used in a Quuxer, and you should usually override the getBaz method to return a Spam instance, although it's not required.
NOTES: This class is currently in a state of flux; it will soon be refactored, so watch out for API changes
"""
etc.
The main thing here is "Big Picture", which should give the method/class some context. NOTES is mainly for temporary stuff; It's probably not crucial to be in the docstrings (probably it should just be in near-by #XXX comments). So yeah, I urge people who are writing docstrings to put stuff into context; I'll try to do the same thing. Whether or not you use a similar format isn't really important, but it seems sane enough to me. We'll probably be doing a lot of this during the 0.99.0 cycle, but it's never too early to start improving documentation :)
Yes, I fully agree here! Without the big picture it is difficult to determine the contexts of all modules and classes - especially for newbie "twistas". Also, we need some contextual examples in the api docs themselves. A full blown example should be (and generally is) found in doc/examples.

Speaking of examples, could the author of the examples please add comments and some explanations of the examples? I'm sure it's all quite obvious to the author but for newbies, it's a bit too terse. Especially when your head is still spinning with banana spreads and the like. Jelly? What _is_ twistd? Should there be only one twistd running with multiple "protocol handlers" or one twistd running per protocol?

With some of the examples, I find myself saying: "What? That's it? It can't be that simple! Am I missing something?" Some explanations in the code (comments) would go a long way to solving this.

Doc problems:
---------------
Some of the newer documentation is quite good in the sense that it hints at a lot but it's a bit confusing. The stuff about web-widgets and dom templates... It starts out really great but ends before it should! :-)

The widgets.html doc for example. It says that the code in the Example.tar.gz explains why you get "No Resource" error when you first start it up. Erm... I looked at the code and nowhere does it say why. And what to do afterwards. I still don't understand how web-widgets work! Where do you then point the browser?

Also, what's a reactor? Is there some other doc to explain the concept of reactors?

Would you use deferreds in a web application? When and why?

As you can see, there are problems with the docs. I understand the main twistas are too busy (insane is more appropriate!) with twisted itself, to write more docs. Honestly, I would _love_ to write some docs, tutorials and howtos, if only I could understand the thing myself!

-- Regards, Mukhsein Johari
Mukhsein Johari wrote:
Also, what's a reactor? Is there some other doc to explain the concept of reactors?
There's a start of a doc for this in CVS as doc/howto/reactor-basics.html. I have a version of the docs from CVS online that I use for proofreading as I edit and so on, so you can view that doc from there: http://day.cubik.org/~bruce/tmc/documents/howto/reactor-basics

It needs to improve and cover a lot more, but it is a start.

- Bruce
On Fri, 19 Jul 2002 21:04:05 +0800, Mukhsein Johari <arashi1@pd.jaring.my> wrote:
On Friday 19 July 2002 13:17, Christopher Armstrong wrote:
I was talking to MetaCosm, and he said the template for API docs he uses in his professional projects is like so:
The main thing here is "Big Picture", which should give the method/class some context. NOTES is mainly for temporary stuff; It's probably not crucial to be in the docstrings (probably it should just be in near-by #XXX comments). So yeah, I urge people who are writing docstrings to put stuff into context; I'll try to do the same thing. Whether or not you use a similar format isn't really important, but it seems sane enough to me. We'll probably be doing a lot of this during the 0.99.0 cycle, but it's never too early to start improving documentation :)
Does epydoc support @rationale or something? That would be handy. :) "Big picture" stuff should probably also have its own directory in the doc/ tree too -- most other projects have a doc/design/ (or similar) which describes things like this. My tragic allergy to formal process has prevented me from doing enough of this so far; however, now that the cognitive dust has started to settle, I think that I could venture in and write some design documentation. I don't think this is really important in the 1.0 release process or the other things I'm working on at the moment, so it will have to wait a little while. Is it important to do soon? Remember also that queries that show me which internal design ideas are blatantly obvious and which require some more elucidation are helpful in prioritizing them.
Yes, I fully agree here! Without the big picture it is difficult to determine the contexts of all modules and classes - especially for newbie "twistas". Also, we need some contextual examples in the api docs themselves. A full blown example should be (and generally is) found in doc/examples.
doc/howto/listings is a better-documented source of that sort of thing.
Speaking of examples, could the author of the examples please add comments and some explanations of the examples?
In general a good idea, but just in case anybody who wants to check in docs is listening, I'll respond to your particular questions...
Jelly?
http://twistedmatrix.com/products/spread Anything more specific you need to know?
What _is_ twistd?
Wow. I was pretty surprised to find that there is no quick answer to this question so I'll write one here :-). 'twistd' is the Twisted Daemon. It is a simple tool designed in a UNIX and command-line friendly way; however, it is portable to many environments (including Win32, and even Jython!). twistd can be used to load multiple formats (pickle, marmalade XML, "AOT" python source) of serialized twisted application objects and run them. Most persistent processes in Twisted are run using the 'twistd' script. The notable exception to this rule is currently graphical client programs. The "big picture" here is that in the future, twistd may have graphical or other platform-specific cousins which are designed to work with a particular toolkit or host operating system. For example, a gtk-specific mainpoint, or a win32 service GUI which generalizes some tasks. My goal is that eventually _all_ Twisted-ly correct programs will be able to plug in to multiple main-points that are appropriate to different situations. (My short term incentive for this is I want a GladeReactor which displays in an open window all the open connections and some brief statistics on them in real-time; this would be useful for debugging certain kinds of application.)
Should there be only one twistd running with multiple "protocol handlers" or one twistd running per protocol?
This *really* depends on your application and your site. In general, if you're using twistd, you want either

1. only one port (in this case, normally PB or HTTP) open, accessing different services
2. multiple ports (SMTP, POP3, DNS...) open, integrating with the same service
3. some combination of 1. and 2.

I imagine that the most common case is 3., with 1. running a close second. Even in the case of 1., you generally want to communicate with *other* web or PB servers, over either web.distrib or twisted.sister. In some cases, I just use twisted for the one protocol/one process/one service model; but that's just because I know it well, it's installed on all my hardware, and it takes about an eighth of a second to configure a new webserver that does what I need for a particular box ;-).
With some of the examples, I find myself saying: "What? That's it? It can't be that simple! Am I missing something?" Some explanations in the code (comments) would go a long way to solving this.
No really, it is that simple.
Some of the newer documentation is quite good in the sense that it hints at a lot but it's a bit confusing. The stuff about web-widgets and dom templates... It starts out really great but ends before it should! :-)
We're working on it, but thanks for the feedback! There has been a big push for more documentation recently, as Twisted is starting to see more general applicability and some of the important, core APIs are nearing finalization. We're not quite there yet.
The widgets.html doc for example. [...]
Widgets is getting slowly deprecated in favor of the new domtemplate stuff. (In many ways, they do the same thing.) It still remains to be seen whether widgets still has some usefulness beyond what the domtemplate/domwidgets approach has brought; I am going to be working with the web stuff very soon and I hope to verify this, and make a device-independent version of web.widgets.Form (and renderers for the various DOM stuff) before completely deprecating web.widgets.
Also, what's a reactor? Is there some other doc to explain the concept of reactors?
Among others, http://www.cs.vu.nl/~eliens/online/oo/I/2/reactor.html I agree that Twisted needs its own doc, but the Twisted "Reactor" interfaces are based around the (relatively) well-known design pattern for event handling. You shouldn't have to worry too much about the Reactor if you're writing an application using Twisted's high-level facilities.
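For readers who want something concrete for the reactor design pattern mentioned above, here is a minimal sketch using only Python's standard selectors and socket modules. It is not Twisted's reactor; TinyReactor, add_reader, remove_reader and on_readable are names invented for this illustration. The idea it shows is the core of the pattern: one loop waits for events and dispatches each one to the handler registered for it.

```python
import selectors
import socket


class TinyReactor:
    """Toy reactor: waits for readable sockets and dispatches to handlers."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._running = False

    def add_reader(self, sock, handler):
        # handler(sock) is called whenever sock has data to read.
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def remove_reader(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def run(self):
        # The event loop: block until something is readable, then dispatch.
        self._running = True
        while self._running and self._selector.get_map():
            for key, _events in self._selector.select(timeout=1.0):
                key.data(key.fileobj)
        self._selector.close()


received = []
reactor = TinyReactor()
left, right = socket.socketpair()


def on_readable(sock):
    # Application code only sees "data arrived"; it never calls select itself.
    received.append(sock.recv(1024))
    reactor.remove_reader(sock)
    reactor.stop()


reactor.add_reader(right, on_readable)
left.sendall(b"hello")
reactor.run()
left.close()
right.close()
```

Application code written against a high-level framework normally supplies only handlers like on_readable, which is why the reactor itself can usually be ignored.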
Would you use deferreds in a web application? When and why?
Yes. When: If you have a web application that needs to do anything that will take a little while and/or require some network communication, Deferreds are the preferred way to do this in Twisted. Why: twisted.web.widgets and twisted.web.domtemplate (and therefore, I believe, domwidgets, though I'm not absolutely sure...) will handle a Deferred as a return value and Do The Right Thing. So we make it nice and convenient for you; the added benefit of this is that your web application doesn't have to worry about managing lock contention or thread pools. Things like twisted.enterprise.adbapi, meant to do potentially long-running, blocking operations, will typically return a deferred, making it even easier to take advantage of the convenience in the twisted.web deferred handling.
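The "return a Deferred and let the framework wait for it" idea can be sketched without Twisted. The class below is a toy stand-in, not Twisted's Deferred: ToyDeferred, add_callback, render_page and the fake query are all hypothetical names chosen for this example, and error handling is left out entirely.

```python
class ToyDeferred:
    """Toy stand-in for a deferred result; not Twisted's Deferred."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, func):
        # Each callback receives the previous callback's return value.
        if self._fired:
            self._result = func(self._result)
        else:
            self._callbacks.append(func)
        return self

    def callback(self, result):
        # Called once, when the slow operation finally has a result.
        self._fired = True
        self._result = result
        for func in self._callbacks:
            self._result = func(self._result)
        self._callbacks = []


def render_page(pending_rows):
    """Hypothetical renderer: returns a deferred page, not a finished one."""
    def to_html(rows):
        return "<ul>%s</ul>" % "".join("<li>%s</li>" % row for row in rows)
    return pending_rows.add_callback(to_html)


pages = []
query = ToyDeferred()

# The "framework" attaches its own callback to whatever the renderer returns.
render_page(query).add_callback(pages.append)

# Later, the (hypothetical) database layer delivers the rows.
query.callback(["spam", "eggs"])
```

Nothing blocks between starting the query and receiving the rows, which is the property that lets a single-threaded server keep handling other requests in the meantime.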
As you can see, there are problems with the docs. I understand the main twistas are too busy (insane is more appropriate!) with twisted itself, to write more docs.
Insane is right. I can't even keep up with the commits list myself any more.
Honestly, I would _love_ to write some docs, tutorials and howtos, if only I could understand the thing myself!
Keep these great questions coming, they help. At least we could point other newbies at the mailing list archive after they've been answered. It would be an even bigger help if you could organize some of the answers and submit them as docs. Hope this helps, -- | <`'> | Glyph Lefkowitz: Traveling Sorcerer | | < _/ > | Lead Developer, the Twisted project | | < ___/ > | http://www.twistedmatrix.com |
Twisted gurus,

I just noticed item 008 on the twisted TO DO list: File Transfer layer for PB. This would be especially nice for twisted.words; having a standard way to transfer "large" (100MB+) packets across or in tandem with a PB connection without breaking anything would be very good.

<sophomoric question> Would an ftp connection (authenticated using cred, of course) in tandem or parallel to the PB connection work? ... but maybe you are referring to implementing file transfer as *part* of the PB protocol, in which case this question might not make any sense at all. </sophomoric question>

And how close is this to being implemented? My interest is not merely academic -- the application I am working on will be "routinely" transferring 100MB+ files, and I'd like to use the PB as one of our interfaces.

(BTW, one of the components of our app is written in Lisp, so I am thinking about having our Lisp programmer look at Twisted Emacs for hints as to how a Common Lisp implementation of PB could be done ... but I notice that several twisted minions have Lisp in their resumes ... is anyone working on / contemplating a CL implementation of PB? :^)

Cheers, -- Steve. Stephen C. Waterbury http://misspiggy.gsfc.nasa.gov/people/waterbug.html
On Thu, 25 Jul 2002 14:45:09 -0400, Steve Waterbury <steve.waterbury@gsfc.nasa.gov> wrote:
I just noticed item 008 on the twisted TO DO list: File Transfer layer for PB. <sophomoric question> Would an ftp connection (authenticated using cred, of course) in tandem or parallel to the PB connection work? ... but maybe you are referring to implementing file transfer as *part* of the PB protocol, in which case this question might not make any sense at all. </sophomoric question>
Well, both. It's more like a convention, really; all the pieces already exist (serialized remote method calls, waiting until the outgoing buffer is empty) in order to create a file transfer convention; of course, then you've got all the usual fun problems; trying to connect with a listening socket on port 0, backing off and using the existing connection with slightly poorer performance if one or both of the users are behind a firewall...
And how close is this to being implemented?
There's a lot of stuff on my plate that comes before it, most of all the 1.0 release. On the other hand, it's a relatively simple thing to add in.
My interest is not merely academic -- the application I am working on will be "routinely" transferring 100MB+ files, and I'd like to use the PB as one of our interfaces.
There will be a little bit of overhead in the initial implementation, considering that it will be using PB calls. In the short term, the large file transfers would probably be better handled over HTTP (Twisted's twisted.web.static.File can easily be used to this end). FTP is a much nastier protocol.
(BTW, one of the components of our app is written in Lisp, so I am thinking about having our Lisp programmer look at Twisted Emacs for hints as to how a Common Lisp implementation of PB could be done ... but I notice that several twisted minions have Lisp in their resumes ... is anyone working on / contemplating a CL implementation of PB? :^)
We have contemplated it a lot, but have yet to implement it. Really the problem is that "common lisp" isn't a language so much as a bizarre constellation of languages, none of which have anything to do with each other when it comes to things like "sockets" and "files". It hasn't been worth anybody's effort to go to all that work just for, say, CLisp or SBCL. AFAIK it's not even *possible* to do it properly in CLisp yet, due to the absence of non-blocking sockets, but this could be FUD; haven't read the docs in a while. However, many of us are fans of Lisp-like languages and I'm sure that more than one person would be thrilled to see a multi-vendor-supporting CL implementation of PB. -- | <`'> | Glyph Lefkowitz: Traveling Sorcerer | | < _/ > | Lead Developer, the Twisted project | | < ___/ > | http://www.twistedmatrix.com |
Glyph Lefkowitz wrote:
There will be a little bit of overhead in the initial implementation, considering that it will be using PB calls. In the short term, the large file transfers would probably be better handled over HTTP (Twisted's twisted.web.static.File can easily be used to this end). FTP is a much nastier protocol.
Ah, my ignorance again -- I hadn't noticed twisted.web.static.File; I'll look at that ... we'll be doing at least 3 interfaces that use HTTP anyway (web, xmlrpc, and SOAP) ... thanks!
is anyone working on / contemplating a CL implementation of PB? :^)
We have contemplated it a lot, but have yet to implement it. Really the problem is that "common lisp" isn't a language so much as a bizarre constellation of languages, none of which have anything to do with each other when it comes to things like "sockets" and "files". It hasn't been worth anybody's effort to go to all that work just for, say, CLisp or SBCL. AFAIK it's not even *possible* to do it properly in CLisp yet, due to the absence of non-blocking sockets, but this could be FUD; haven't read the docs in a while.
However, many of us are fans of Lisp-like languages and I'm sure that more than one person would be thrilled to see a multi-vendor-supporting CL implementation of PB.
Yeah, the non-blocking sockets might be a problem ... I'll ask about that. If we do an implementation, we'll at least make an effort to make it multi-vendor-supporting. Our Lisp specialists have used both Allegro and Harlequin at various times. Let us know if other vendor or open-source Lisps are of interest, etc. Cheers, -- Steve. Stephen C. Waterbury http://misspiggy.gsfc.nasa.gov/people/waterbug.html
On Thu, 25 Jul 2002 16:43:33 -0400, Steve Waterbury <steve.waterbury@gsfc.nasa.gov> wrote:
Glyph Lefkowitz wrote:
There will be a little bit of overhead in the initial implementation, considering that it will be using PB calls.
As several people have pointed out to me, sometimes this additional overhead is worth paying in order to get rid of the TCP connection startup cost. Of course, this will vary with your application, but it's probably worthwhile to have the ability to reuse the connection. In addition, sometimes you want to communicate large *objects*, and not just large *files*; in those cases, it's really handy to have objects which can be "paged" between systems, interleaved with other messages. This is also a decent basis for "file transfer" between two objects, since "very large string, stored on disk" is a degenerate case of "very large object, stored somewhere". I've checked in an implementation of this to twisted.spread.util.Pager. Not much in the way of docs yet, but the test cases (twisted.test.test_pb, look for "pager") should at least explain some of it.
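The "paging" idea described above (a large object sent in pieces, interleaved with ordinary messages on the same connection) can be sketched in plain Python. This is not the twisted.spread.util.Pager API, which is not shown in this thread; the functions pages and interleave, the tuple wire format, and the sample data are assumptions made up for the illustration.

```python
def pages(data, page_size):
    """Yield successive page_size slices of a large bytes object."""
    for start in range(0, len(data), page_size):
        yield data[start:start + page_size]


def interleave(page_iter, other_messages):
    """Alternate pages with ordinary messages on one logical connection."""
    other = iter(other_messages)
    for page in page_iter:
        yield ("page", page)
        for message in other:
            yield ("message", message)
            break  # at most one ordinary message between two pages
    # Flush any ordinary messages left after the last page.
    for message in other:
        yield ("message", message)


big_object = b"x" * 10 + b"y" * 10 + b"z" * 5
wire = list(interleave(pages(big_object, 10), ["ping", "pong"]))

# Receiving side: reassemble the pages, handle the rest as they arrive.
rebuilt = b"".join(payload for kind, payload in wire if kind == "page")
messages = [payload for kind, payload in wire if kind == "message"]
```

Because no single message is larger than one page, a small message never has to wait behind the whole transfer, which is the benefit over sending the object as one huge call.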
However, many of us are fans of Lisp-like languages and I'm sure that more than one person would be thrilled to see a multi-vendor-supporting CL implementation of PB.
Yeah, the non-blocking sockets might be a problem ... I'll ask about that. If we do an implementation, we'll at least make an effort to make it multi-vendor-supporting. Our Lisp specialists have used both Allegro and Harlequin at various times. Let us know if other vendor or open-source Lisps are of interest, etc.
I can speak only to my own interests. Other members of the Twisted community are working in radically different environments and on radically different problems. As long as you asked, though ...

While I appreciate that they're technically superior, the proprietary lisps don't interest me; the bang for your buck of using a proprietary language solution is seldom worth the hefty license fees (especially in situations where "interoperability" and "ubiquity" are the most interesting parts, like most of the applications I've come up with for Twisted).

Conversely, support for *any* open-source CL would be really cool. Even GCL would be fun to play around with. I'd like to develop Lisp components for some of the applications I'm working on, mostly to tinker with it for fun, but I don't have enough time or inclination to dive into the implementation myself.

-- | <`'> | Glyph Lefkowitz: Traveling Sorcerer | | < _/ > | Lead Developer, the Twisted project | | < ___/ > | http://www.twistedmatrix.com |
Glyph Lefkowitz wrote:
As several people have pointed out to me, sometimes this additional overhead is worth paying in order to get rid of the TCP connection startup cost. Of course, this will vary with your application, but it's probably worthwhile to have the ability to reuse the connection.
In addition, sometimes you want to communicate large *objects*, and not just large *files*; in those cases, it's really handy to have objects which can be "paged" between systems, interleaved with other messages.
Yes, both of these are important considerations for our app.
This is also a decent basis for "file transfer" between two objects, since "very large string, stored on disk" is a degenerate case of "very large object, stored somewhere". I've checked in an implementation of this to twisted.spread.util.Pager. Not much in the way of docs yet, but the test cases (twisted.test.test_pb, look for "pager") should at least explain some of it.
Great -- I'll take a look at that. Thanks!
[SW wrote:]... Let us know if other vendor or open-source Lisps are of interest, etc.
I can speak only to my own interests. Other members of the Twisted community are working in radically different environments and on radically different problems. As long as you asked, though ...
While I appreciate that they're technically superior, the proprietary lisps don't interest me; the bang for your buck of using a proprietary language solution is seldom worth the hefty license fees (especially in situations where "interoperability" and "ubiquity" are the most interesting parts, like most of the applications I've come up with for Twisted).
Understood. Since we have limited manpower for our Lisp work, we tend to value the power of commercial development tools, but we do care about portability, so we try to adhere to ANSI CL -- see: http://exp-engine.sourceforge.net/lisp.html (And it is *very* important that there are no *run-time* license fees! Anyone can download our stuff and run it.) Express Engine, our Lisp project, is somewhat esoteric, so even though it's on SF no other developers have joined. ;^) If anyone is curious, see: http://exp-engine.sourceforge.net Express Engine encapsulates some very important functions for our app, enabling us to munge data produced by various CAD/CAE/CAM ("CAX") tools, for which the most widely implemented exchange standard is ISO 10303 (STEP). Express Engine encapsulates our STEP data I/O and mapping functionality. (Like I said, somewhat esoteric! :^) As you can imagine, STEP files produced by CAX tools can be quite huge. This also hints at why Twisted is such a natural fit for our app: a multi-player networked game environment is not all that different from a massive collaborative engineering environment ... at the infrastructure level, essentially identical. Cheers, -- Steve. Stephen C. Waterbury http://misspiggy.gsfc.nasa.gov/people/waterbug.html
On Fri, 26 Jul 2002 17:49:04 -0400, Steve Waterbury <steve.waterbury@gsfc.nasa.gov> wrote:
This also hints at why Twisted is such a natural fit for our app: a multi-player networked game environment is not all that different from a massive collaborative engineering environment ... at the infrastructure level, essentially identical.
I've heard this comment several times and I don't think I've said much about it before. Allow me to wax philosophical for a moment. One of my favorite things about Twisted is that, due to its origins, it challenges people to think about what they're doing and whether "work" is really any harder or more serious than "play". It turns out that building massively multiplayer games - "serious fun" - may be a great deal *more* difficult than the average "software engineering" project. If someone is reading this and hasn't been thusly challenged by Twisted or by something else, maybe they should be ;-). "The Hacker Ethic" is a great book on this. The Twisted framework shows its Python heritage this way, too: Python is a language designed to teach, and markedly *unlike* Pascal, it is not a stuffy, academic pedant. Instead, Python applies its own lessons by making the practical work of programming easier. It illuminates the fact that programming *is* learning: as Perlis said, "If we knew what we were doing, we would not call it programming." -- | <`'> | Glyph Lefkowitz: Traveling Sorcerer | | < _/ > | Lead Developer, the Twisted project | | < ___/ > | http://www.twistedmatrix.com |
On Thu, Jul 25, 2002 at 02:45:09PM -0400, Steve Waterbury wrote:
File Transfer layer for PB. This would be especially nice for twisted.words; having a standard way to transfer "large" (100MB+) packets across or in tandem with a PB connection without breaking anything would be very good.
i also require filetransfers over pb for my project, i think twisted currently is lacking a protocol for reading files in a reliable manner (nfs, fifos, ...). i doubt i'm skilled enough, but i'm trying to implement protocols.file right now, hoping this will help me with getting files across pb later. suggestions? paul
On Fri, Jul 26, 2002 at 02:31:29PM +0200, Paul Boehm wrote:
On Thu, Jul 25, 2002 at 02:45:09PM -0400, Steve Waterbury wrote:
File Transfer layer for PB. This would be especially nice for twisted.words; having a standard way to transfer "large" (100MB+) packets across or in tandem with a PB connection without breaking anything would be very good.
i also require filetransfers over pb for my project, i think twisted currently is lacking a protocol for reading files in a reliable manner (nfs, fifos, ...). i doubt i'm skilled enough, but i'm trying to implement protocols.file right now, hoping this will help me with getting files across pb later. suggestions?
AFAIK, twisted.web.static _does_ do non-blocking reading, so check that out. I don't think anyone has implemented non-blocking writing yet. (There's some sort of FileWrapper protocol or somesuch IIRC, but it just assumes that write() won't take very long -- it's only for testing)
On Fri, Jul 26, 2002 at 07:41:27AM -0500, Chris Armstrong wrote:
AFAIK, twisted.web.static _does_ do non-blocking reading, so check that out. I don't think anyone has implemented non-blocking writing yet. (There's some sort of FileWrapper protocol or somesuch IIRC, but it just assumes that write() won't take very long -- it's only for testing)
no it doesn't,

    f = open(self.path,'rb')
    ...
    # return data
    FileTransfer(f, size, request)
    # and make sure the connection doesn't get closed
    return server.NOT_DONE_YET

this is a normal file instance.. i have yet to find a way to do nonblocking file reads.. till then i'm starting to implement a protocol which i can tell upon initialization which parts of the file i want to have streamed.

ah, and i think twisted.web.static.FileTransfer has a bug: you can specify where it starts reading with seek, and set an arbitrary size.. but it read()->write()'s by abstract.FileDescriptor.bufferSize and then checks "if self.file.tell() == self.size:" which won't be correct if the file is larger than the requested chunk.

paul
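The problem described in the bug report above is that reading in fixed-size buffers can step past the end of a requested range, so a position check against the requested size never matches. A minimal sketch of one way to avoid that, tracking the bytes still owed and never reading more than that, is shown here. It is not the FileTransfer code and does not confirm the bug exists there; read_range, BUFFER_SIZE and the sample data are hypothetical.

```python
import io

# Tiny on purpose; stands in for a real buffer size such as
# abstract.FileDescriptor.bufferSize mentioned in the message.
BUFFER_SIZE = 4


def read_range(fileobj, start, size, buffer_size=BUFFER_SIZE):
    """Yield chunks covering exactly `size` bytes beginning at `start`."""
    fileobj.seek(start)
    remaining = size
    while remaining > 0:
        # Never ask for more than is still owed to the caller.
        chunk = fileobj.read(min(buffer_size, remaining))
        if not chunk:
            break  # the file ended before the requested range did
        remaining -= len(chunk)
        yield chunk


source = io.BytesIO(b"0123456789abcdef")
chunks = list(read_range(source, start=2, size=6))
```

Counting down the remaining bytes works whether or not the range length is a multiple of the buffer size, and regardless of how much file follows the range.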
On Fri, Jul 26, 2002 at 07:41:27AM -0500, Chris Armstrong wrote:
AFAIK, twisted.web.static _does_ do non-blocking reading, so check that out. I don't think anyone has implemented non-blocking writing yet. (There's some sort of FileWrapper protocol or somesuch IIRC, but it just assumes that write() won't take very long -- it's only for testing)
Bzzt, wrong. All file IO is sync. To do async file IO, you need to use a different kernel API -- called AIO, and not very widely available. (Or emulate it by forking worker processes that pass data between stdin/stdout and file).

select(2)/poll(2) overhead has been historically deemed too high for async file IO, files tend to be so much faster that separating them from sockets was historically thought of as a good idea. Later generations have learned to curse select(2), see how slow network filesystems and cheap IDE disks can be, and hope for a low-overhead select(2)/poll(2) replacement for both file IO and non-file IO.

-- tv@{{hq.yok.utu,havoc,gaeshido}.fi,{debian,wanderer}.org,stonesoft.com} double a,b=4,c;main(){for(;++a<2e6;c-=(b=-b)/a++);printf("%f\n",c);}
On Mon, Jul 29, 2002 at 12:35:31PM +0300, Tommi Virtanen wrote:
Later generations have learned to curse select(2), see how slow network filesystems and cheap IDE disks can be, and hope for a low-overhead select(2)/poll(2) replacement for both file IO and non-file IO.
Has anyone looked at the kqueue(2) and kevent(2) system calls in FreeBSD? These seem to have less overhead than select(2)/poll(2), and FreeBSD also has the AIO system calls, so these two features can be used together to provide async file I/O and better async socket I/O (at least on one platform). I've also read that there's a Linux kernel patch that adds a device called /dev/poll, which has less overhead than poll(2). You can read more about these things at: http://www.kegel.com/c10k.html There's a Python binding for kqueue(2)/kevent(2) at: http://people.freebsd.org/~dwhite/PyKQueue/ -- Matt Campbell Email and MSN Messenger: mattcampbell@pobox.com Phone: (316) 652-8727 Web site: http://www.pobox.com/~mattcampbell/
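As a small aside for readers on a current Python: the standard library's select module now exposes several of the mechanisms discussed here where the platform provides them, and the selectors module picks one automatically. That library support postdates this thread, so the sketch below is only a way to see what a given machine offers; CANDIDATES and available_mechanisms are names invented for the example.

```python
import select
import selectors

# Polling mechanisms the select module may expose, depending on platform.
CANDIDATES = ("select", "poll", "epoll", "kqueue", "devpoll")


def available_mechanisms():
    """Return the names from CANDIDATES that this platform's select has."""
    return [name for name in CANDIDATES if hasattr(select, name)]


mechanisms = available_mechanisms()

# DefaultSelector is an alias for the most efficient selector class here,
# for example a kqueue-based one on FreeBSD or an epoll-based one on Linux.
default_selector = selectors.DefaultSelector.__name__
```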
Matt Campbell wrote:
Has anyone looked at the kqueue(2) and kevent(2) system calls in FreeBSD? These seem to have less overhead than select(2)/poll(2), and FreeBSD also has the AIO system calls, so these two features can be used together to provide async file I/O and better async socket I/O (at least on one platform).
IronPort (http://www.ironport.com) use these, together with Stackless. I'm hoping we can: (a) Convince them at some point to use Twisted for the event loop (I talked to Christian Tismer about this at EuroPython). (b) Get a kqueue reactor written. Doing (a) might lead almost immediately to (b), but I doubt it will happen anytime soon unless someone here knows someone at Ironport or Christian Tismer and can push this some more. If anyone wants to do (b), please do - writing a reactor isn't hard, but if you don't have a FreeBSD box it's kinda hard to develop a kqueue one.
On Mon, Jul 29, 2002 at 02:30:09PM -0400, Itamar Shtull-Trauring wrote:
(b) Get a kqueue reactor written.
I have a FreeBSD box, so I'll do this. -- Matt Campbell Email and MSN Messenger: mattcampbell@pobox.com Web site: http://www.pobox.com/~mattcampbell/
Matt Campbell wrote:
(b) Get a kqueue reactor written.
I have a FreeBSD box, so I'll do this.
Great! I suggest using twisted.internet.poll as a model for how to do it (though it'll probably be cleaner). Please feel free to ask questions, on the mailing list, IRC (#twisted on irc.openprojects.net) or by emailing me directly.
On Mon, 29 Jul 2002 12:35:31 +0300, Tommi Virtanen <tv@twistedmatrix.com> wrote:
On Fri, Jul 26, 2002 at 07:41:27AM -0500, Chris Armstrong wrote:
AFAIK, twisted.web.static _does_ do non-blocking reading, so check that out. I don't think anyone has implemented non-blocking writing yet. (There's some sort of FileWrapper protocol or somesuch IIRC, but it just assumes that write() won't take very long -- it's only for testing)
Bzzt, wrong. All file IO is sync. To do async file IO, you need to use a different kernel API -- called AIO, and not very widely available. (Or emulate it by forking worker processes that pass data between stdin/stdout and file).
Mr. Virtanen is right. The reason you might think it's async is that most File I/O operations in Twisted are performed in a response to some request (web, ftp), and are treated as "large file" operations. The file is read one 'chunk' at a time, and more is only read when the network connection it's being written to becomes available for writing again. This is what the "producer/consumer" API in twisted.internet is for.

In general, on linux, this will mean that your data is available in most cases anyway, due to the speed difference between network & file I/O. In the cases where it's not, it's likely that your server is under high enough load that it's OK to block for an ms or two to wait for the data. Asyncore has a great little comment about this:

# After a little research (reading man pages on various unixen, and
# digging through the linux kernel), I've determined that select()
# isn't meant for doing doing asynchronous file i/o.
# Heartening, though - reading linux/mm/filemap.c shows that linux
# supports asynchronous read-ahead. So _MOST_ of the time, the data
# will be sitting in memory for us already when we go to read it.

Hope this is enlightening.

-- | <`'> | Glyph Lefkowitz: Traveling Sorcerer | | < _/ > | Lead Developer, the Twisted project | | < ___/ > | http://www.twistedmatrix.com |
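The chunk-at-a-time behaviour described in the message above can be sketched in plain Python. This is not the twisted.internet producer/consumer interface itself; ChunkProducer, SlowConsumer and resume_producing are names made up for the illustration, and the loop at the end merely stands in for "the connection became writable again".

```python
import io


class ChunkProducer:
    """Reads one chunk per request instead of the whole file at once."""

    def __init__(self, fileobj, consumer, chunk_size):
        self.fileobj = fileobj
        self.consumer = consumer
        self.chunk_size = chunk_size
        self.done = False

    def resume_producing(self):
        # Called only when the consumer can accept more data. The read
        # itself is an ordinary blocking read, as the thread points out.
        chunk = self.fileobj.read(self.chunk_size)
        if chunk:
            self.consumer.write(chunk)
        else:
            self.done = True


class SlowConsumer:
    """Stand-in for a network connection that drains one chunk at a time."""

    def __init__(self):
        self.sent = []

    def write(self, chunk):
        self.sent.append(chunk)


consumer = SlowConsumer()
producer = ChunkProducer(io.BytesIO(b"abcdefghij"), consumer, chunk_size=4)

# Each iteration stands in for one "socket is writable again" event.
while not producer.done:
    producer.resume_producing()
```

Memory use stays bounded by the chunk size however large the file is, because the next read is not issued until the previous chunk has been handed off.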
G'day, On Mon, Jul 29, 2002 at 10:35:59AM -0500, Glyph Lefkowitz wrote:
On Mon, 29 Jul 2002 12:35:31 +0300, Tommi Virtanen <tv@twistedmatrix.com> wrote:
On Fri, Jul 26, 2002 at 07:41:27AM -0500, Chris Armstrong wrote:
[...] Does anyone else find it funny to have a discussion about the performance problems of select/poll/whatever on a list devoted to a server framework written in a (pseudo-)interpreted scripting language :-)
Asyncore has a great little comment about this:
# After a little research (reading man pages on various unixen, and
# digging through the linux kernel), I've determined that select()
# isn't meant for doing asynchronous file i/o.
# Heartening, though - reading linux/mm/filemap.c shows that linux
# supports asynchronous read-ahead. So _MOST_ of the time, the data
# will be sitting in memory for us already when we go to read it.
Hope this is enlightening.
I'm curious as to what this actually means. If I give an open file a fcntl() to set it to non-blocking and use it in a select(), what happens? Do I get indeterminate behaviour? Does the select always return the file immediately as ready, only to have the read (sometimes) block?

--
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
On Tue, 30 Jul 2002 11:57:30 +1000, abo@minkirri.apana.org.au (Donovan Baarda) wrote:
G'day,
On Mon, Jul 29, 2002 at 10:35:59AM -0500, Glyph Lefkowitz wrote:
On Mon, 29 Jul 2002 12:35:31 +0300, Tommi Virtanen <tv@twistedmatrix.com> wrote:
On Fri, Jul 26, 2002 at 07:41:27AM -0500, Chris Armstrong wrote:
[...]
Does anyone else find it funny to have a discussion about the performance problems of select/poll/whatever on a list devoted to a server framework written in a (pseudo-)interpreted scripting language :-)
Not at all. Performance and scaling issues with the asynchronous networking side of things are distinct from CPU load problems with data processing. Especially when you're spending CPU time on being flexible and robust, it's important to use the most efficient available mechanisms for doing the low-level, well-understood tasks that you're basing your high-level logic on.
Hope this is enlightening.
I'm curious as to what this actually means. If I give an open file a fcntl() to set it to non-blocking and use it in a select(), what happens? Do I get indeterminate behaviour? Does the select always return the file immediately as ready, only to have the read (sometimes) block?
You can't set a file to non-blocking, basically. It won't do anything. select() will always say that files are readable, and those files will "block" when you read from them. Then again, this is *extremely* fast, at least on Linux systems. (I don't know about BSD, but at least on Windows you can't even *pass* a file fd to select; it will just give you an error.)

--
| <`'>     | Glyph Lefkowitz: Traveling Sorcerer |
| < _/ >   | Lead Developer, the Twisted project |
| < ___/ > | http://www.twistedmatrix.com        |
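A quick way to see the behaviour described above is a small probe like the following. It is a sketch that assumes a POSIX system (as noted, Windows rejects file descriptors for regular files in select()); probe_regular_file is a made-up helper name, and the temporary file is just a convenient regular file to test against.

```python
import os
import select
import tempfile


def probe_regular_file(payload=b"hello"):
    """Return (reported_readable, data_read) for a regular file."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        fd = f.fileno()
        # The flag is accepted on a regular file, but it does not
        # turn the reads below into asynchronous ones.
        os.set_blocking(fd, False)
        # Zero timeout: ask select() for the current state only.
        readable, _, _ = select.select([fd], [], [], 0)
        data = os.read(fd, len(payload))
        return (fd in readable, data)


result = probe_regular_file()
```

On Linux, select() reports the descriptor as readable straight away and the read returns the data, whether or not it was already in the page cache.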
On Mon, Jul 29, 2002 at 12:35:31PM +0300, Tommi Virtanen wrote:
Later generations have learned to curse select(2), see how slow network filesystems and cheap IDE disks can be, and hope for a low-overhead select(2)/poll(2) replacement for both file IO and non-file IO.
I guess this is the lesson: Don't serve web data from an NFS mount!
Steve Waterbury wrote:
Twisted gurus,
I just noticed item 008 on the twisted TO DO list:
File Transfer layer for PB. This would be especially nice for twisted.words; having a standard way to transfer "large" (100MB+) packets across or in tandem with a PB connection without breaking anything would be very good.
<sophomoric question> Would an ftp connection (authenticated using cred, of course) in tandem or parallel to the PB connection work? ... but maybe you are referring to implementing file transfer as *part* of the PB protocol, in which case this question might not make any sense at all. </sophomoric question>
And how close is this to being implemented?
My interest is not merely academic -- the application I am working on will be "routinely" transferring 100MB+ files, and I'd like to use the PB as one of our interfaces.
Steve,

This sort of thing is why some of the features of the BEEP protocol are nice, specifically the presence of multiple channels and that messages on those channels needn't block others, because it chunks them and interleaves them.

BEEP is documented in RFC 3080:
http://www.ietf.org/rfc/rfc3080.txt
and the design rationale is documented in RFC 3117, On the Design of Application Protocols:
ftp://ftp.rfc-editor.org/in-notes/rfc3117.txt
and more information, including links to various implementations, can be found at:
http://www.beepcore.org/

I don't know that I'd directly use BEEP because the existing implementations are lacking (thread-heavy), and it doesn't support features that would be needed for PB-over-UDP support, but having the option to run PB over BEEP would let you do large file transfers over the same connection (without worrying about NAT or firewall issues) without blocking the usual PB messages.

I'd go so far as to say that this problem isn't just with file transfers. It is a potential problem anytime you have messages of different priorities being sent over PB. Larger, lower-priority messages block higher-priority messages because they're all over the same connection and there aren't logical channels in PB (as in BEEP). You can work around this yourself by manually chunking messages and sending them from the sending side in small pieces to give other messages a chance to make it through.

Another way of handling this, and nicer than layering on top of BEEP, would be to start down the path towards some of the features that would be needed or useful in UDP support. With UDP support, it'd be useful to be able to flag messages with different bits of data:

* Reliable
* Unreliable
* Time-sensitive data which is useless after that time.

In that sort of scenario, one could add an additional set of behaviors where the message that contained large, low-priority data would be flagged to let PB know that it was something that could be spread out over time and that timing for it wasn't a concern.

Maybe there are already capabilities like this in PB .. but given the lack of docs, I haven't found them yet. :)

Cheers,
- Bruce
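The manual chunking workaround mentioned above might look roughly like this on the sending side. It is a sketch only and has nothing to do with PB's real interfaces: InterleavingSender, send_urgent, send_bulk and next_frame are hypothetical names, and reassembly on the receiving side is left out.

```python
from collections import deque


def chunk(payload, size):
    """Split a bytes payload into pieces of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


class InterleavingSender:
    """Hypothetical sender: urgent messages go out whole and first,
    bulk payloads are queued as small chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.urgent = deque()
        self.bulk = deque()

    def send_urgent(self, message):
        self.urgent.append(message)

    def send_bulk(self, payload):
        self.bulk.extend(chunk(payload, self.chunk_size))

    def next_frame(self):
        """Return the next (kind, data) frame to write, or None if idle."""
        if self.urgent:
            return ("urgent", self.urgent.popleft())
        if self.bulk:
            return ("bulk", self.bulk.popleft())
        return None


sender = InterleavingSender(chunk_size=3)
sender.send_bulk(b"abcdefgh")
frames = [sender.next_frame()]    # first bulk chunk goes out
sender.send_urgent(b"ping")       # arrives in the middle of the transfer
while True:
    frame = sender.next_frame()
    if frame is None:
        break
    frames.append(frame)
```

Because only one small chunk is written per call, an urgent message waits for at most one chunk rather than for the whole 100MB+ transfer.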