[Twisted-Python] Question regarding async stuff
I'm working on a network file server right now and am using Twisted as my networking framework. Overall, it's working really well. The only thing I haven't been able to figure out so far is what is async and what is not. It looks like data transfer is async (self.transport.write(msg)), but the functions called in the protocol can block the entire Twisted main loop.

Delayed has a note that it is being deprecated, but it looked to be the only way to make the Protocol dataReceived() function run in an async manner. What is replacing Delayed, and is there any other way that I can cause the Protocol dataReceived call to handle requests asynchronously?

The reason I ask is that it doesn't appear that a Twisted server can handle processing multiple requests at the same time. It appears that a request must come in and be processed before another request can be serviced. It probably isn't a big deal for me, as the number of requests and the amount of processing per request will be low, but it could be that I'm totally misunderstanding how the framework works. Any insight is appreciated :)

Greg Fortune
On Wed, 10 Jul 2002 14:30:38 -0400, Greg Fortune wrote:
I'm working on a network file server right now and am using Twisted as my networking framework. Overall, it's working really well. The only thing I haven't been able to figure out so far is what is async and what is not. It looks like data transfer is async (self.transport.write(msg)), but the functions called in the protocol can block the entire twisted main loop.
Anything that returns a Deferred is asynchronous; everything else is synchronous. I think that your use of these terms belies a misunderstanding of what's going on.
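In case it helps to see that concretely, here is a rough sketch of the callback pattern a Deferred implements. This is not the real twisted.internet.defer.Deferred (which also has errbacks, chaining, and result passing between callbacks); the class name and internals below are purely illustrative:

```python
# A minimal sketch of the callback pattern behind Twisted's Deferred.
# NOT the real API -- just the idea of "register a callback now, fire
# it later when the result actually arrives".

class SketchDeferred:
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        if self._fired:
            fn(self._result)            # result already here: run now
        else:
            self._callbacks.append(fn)  # otherwise remember it for later
        return self

    def callback(self, result):
        # Called by the framework when the asynchronous result is ready.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            fn(result)

# Usage: an I/O operation hands back a deferred; the caller attaches a
# callback and returns to the event loop instead of blocking.
results = []
d = SketchDeferred()
d.addCallback(results.append)
d.callback("file contents")   # fired later, e.g. when data arrives
```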
[...] What is replacing Delayed [...]
IReactorTime.callLater(...) http://twistedmatrix.com/documents/TwistedDocs/Twisted-0.19.0rc3/twisted/int...
The reason I ask is that it doesn't appear that a twisted server can handle processing multiple requests at the same time. It appears that a request must come in and be processed before another request can be serviced. It probably isn't a big deal for me as the number of requests and amount of processing per request will be low, but it could be that I'm totally misunderstanding how the framework works.
Yeah, I think you're misunderstanding something ;).

Protocol.dataReceived is called only when data is available from a network connection; therefore, partial requests coming in are partially parsed and buffered by state machines (Protocol instances).

When a full request has been received, the request can be processed. If processing that request requires accessing other asynchronous data that's not yet available, that's fine too -- just do your transport.write(...) to respond later on, when a different event arrives. Some parts of the framework (twisted.spread, twisted.enterprise) make this extremely explicit, by allowing the user to return a Deferred when their response is not yet ready.

Twisted can be "processing multiple requests at the same time" in the sense that while it's waiting on data from the network, it won't be blocked, since all I/O is asynchronous. It will be "stopped" while doing literal CPU-bound "processing" of a request; but while this may seem bad if you look at it naively, 90% of all request-processing you'll do is incredibly brief, and managing the resources needed to parallelize that processing is an order of magnitude (or more, thanks to Python's global interpreter lock, mutex contention, context switching, and other thread nastinesses) more intensive than just running the requests one after another.

This is before we even start talking about the inherent, dangerous complexity of thread-based approaches to state management; they're inefficient, and they're often buggy too.

Even given all that, Twisted does have good support for threads when you really need them.

http://twistedmatrix.com/documents/TwistedDocs/Twisted-0.19.0rc3/twisted/internet/interfaces_IReactorThreads.py.html

I hope this answers your questions. What sort of file server are you writing?

--
| <`'>   | Glyph Lefkowitz: Traveling Sorcerer  |
| < _/ > | Lead Developer, the Twisted project  |
| < ___/ > | http://www.twistedmatrix.com       |
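The "state machine" buffering Glyph describes looks roughly like this. In real Twisted you would subclass twisted.internet.protocol.Protocol (or twisted.protocols.basic.LineReceiver, which implements this very pattern); the standalone class below is just a sketch so the buffering logic is visible without the framework:

```python
# Standalone sketch of the buffering a Protocol instance does.
# Illustrative only -- a real protocol would subclass Twisted's
# Protocol and write responses via self.transport.write(...).

class LineBufferingProtocol:
    delimiter = b"\r\n"

    def __init__(self):
        self._buffer = b""
        self.requests = []

    def dataReceived(self, data):
        # Data arrives in arbitrary chunks; accumulate until at least
        # one complete, delimiter-terminated request is present.
        self._buffer += data
        while self.delimiter in self._buffer:
            line, self._buffer = self._buffer.split(self.delimiter, 1)
            self.requestReceived(line)

    def requestReceived(self, line):
        self.requests.append(line)   # a full request: process it here

p = LineBufferingProtocol()
p.dataReceived(b"GET /a")          # partial request: just buffered
p.dataReceived(b"\r\nGET /b\r\n")  # completes one request, delivers another
# p.requests is now [b"GET /a", b"GET /b"]
```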
Good deal. I did a poor job of communicating my question, but I did understand everything. Some of the processing I was considering doing was fairly CPU intensive, but some simple things reduced the processing overhead to almost nothing.

So, can I assume that during processing of a request (I'm not talking about data transport here, just the processing) operations on data members in the protocol's factory can be considered atomic? If the server can't "process" more than one request at a time, two protocols cannot be accessing the factory members concurrently, correct? I've got a mutex wrapper around some stuff in my factory right now, but it sounds like I can rip that stuff out.

The server I'm writing is pretty simple. In principle, it's an FTP server with special restrictions. It's a file server with the requirement that it provide a pool of unbound files and then a unique path/name to any file that has been bound. I'm going to use it to store and retrieve graphics associated with entities in a database for a point of sale/inventory system I'm developing. That way I can be sure that my pathnames will be at most a certain length. All directories will be 1 char long and filenames will be 6 chars long. At a depth of 4 with 10 directories spanning from each node, I can store somewhere over 10E9 files.

Thanks for the quick response,

Greg

<snip>
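For what it's worth, the reason the mutex can go is that the reactor runs all protocol callbacks in one thread, so no two callbacks ever touch the factory at the same moment. A hedged sketch of the idea, in plain Python with hypothetical class and attribute names standing in for a real Protocol/Factory pair:

```python
# Sketch: in a single-threaded reactor, each event handler runs to
# completion before the next one starts, so unsynchronized updates to
# shared factory state are safe. All names here are illustrative.

class FileServerFactory:
    def __init__(self):
        self.bound_files = {}      # shared state, no lock needed
        self.requests_served = 0

class FileServerProtocol:
    def __init__(self, factory):
        self.factory = factory

    def requestReceived(self, name, path):
        # The reactor never runs two of these concurrently, so this
        # read-modify-write sequence cannot interleave with another's.
        self.factory.requests_served += 1
        self.factory.bound_files[name] = path

factory = FileServerFactory()
protocols = [FileServerProtocol(factory) for _ in range(3)]
for i, p in enumerate(protocols):
    p.requestReceived("file%d" % i, "/a/b/c/%d" % i)
# factory.requests_served is 3; no mutex was required.
```

The caveat is that this only holds as long as the handlers themselves never hand work to other threads (e.g. via Twisted's thread support); the moment a second thread touches the factory, the locking question returns.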
Greg Fortune wrote:
The reason I ask is that it doesn't appear that a twisted server can handle processing multiple requests at the same time. It appears that a request must come in and be processed before another request can be serviced. It probably isn't a big deal for me as the number of requests and amount of processing per request will be low, but it could be that I'm totally misunderstanding how the framework works.
Well, consider this: unless you have an SMP machine, your computer can only do one thing at a time anyway. Serving more than one request is just a matter of how you divide time among different tasks; there's no way you can really do more than one thing at once.

The idea, then, is to do a little bit each time we get a dataReceived callback and finish up as quickly as possible, so we can move on to handling the next event. We can also use things like producers and reactor.callLater to create events even when we aren't getting data from the network. In each event handler, though, we need to make sure we don't block.

And yes, this can work. For example, this is how Squid works, and Squid is rather fast and can do more than one HTTP request "at once".

Recommended reading: http://www.cs.wustl.edu/~schmidt/PDF/reactor-siemens.pdf
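The "do a little bit, then reschedule" pattern looks roughly like this. In real code you would call reactor.callLater(0, ...) to yield back to the event loop between chunks; the toy queue below merely stands in for the reactor so the shape of the pattern is testable on its own:

```python
# Toy stand-in for the reactor's scheduled-call queue, illustrating the
# reactor.callLater(0, ...) chunking pattern without running Twisted.
import collections

pending = collections.deque()     # the "reactor's" queue of pending calls

def call_later(fn, *args):        # stands in for reactor.callLater(0, fn, *args)
    pending.append((fn, args))

def run():                        # stands in for reactor.run()
    while pending:
        fn, args = pending.popleft()
        fn(*args)

processed = []

def process_chunk(items, chunk_size=2):
    # Handle a small slice of the work, then reschedule ourselves so
    # other events get a turn instead of blocking the loop.
    batch, rest = items[:chunk_size], items[chunk_size:]
    processed.extend(batch)
    if rest:
        call_later(process_chunk, rest, chunk_size)

call_later(process_chunk, list(range(7)))
run()
# processed is now [0, 1, 2, 3, 4, 5, 6], built two items per "event"
```

Between any two chunks, a real reactor would service whatever other network events had arrived, which is exactly how one process appears to handle many requests "at once".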
<snip>
Recommended reading: http://www.cs.wustl.edu/~schmidt/PDF/reactor-siemens.pdf
Thanks, I'll take a look Greg
participants (3)

- Glyph Lefkowitz
- Greg Fortune
- Itamar Shtull-Trauring