[Twisted-Python] twisted application server
Hi, I am in charge of writing an application server for a three-tier architecture system. It will receive requests and data through an XML protocol agreed with the client devs. Basically it will work on a database and return the results to the client. The way I envision it: a Twisted server accepts connections from the clients, a coordinator thread puts the requests in a queue and passes them to free workers, which in turn, upon completion, place the result in the coordinator's response queue. The weird part is that another system places data in the database when a specific request comes in, so I have to permanently poll the database for incoming data. This will be done with a polling thread. So, my question is: is this kind of architecture good to implement? (asynchronous server and threaded workers) I might have missed some important details, so please feel free to ask questions. Thanks.
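For what it's worth, the coordinator/worker portion of the design described above can be sketched with the stdlib alone; the names and the uppercase "work" are illustrative stand-ins, not part of Twisted or the real protocol:

```python
import queue
import threading

def worker(requests, responses):
    """Pull requests off the shared queue, process them, queue the result."""
    while True:
        req = requests.get()
        if req is None:          # sentinel: coordinator is shutting us down
            break
        # Stand-in for the real database work driven by the XML request.
        responses.put(("done", req.upper()))
        requests.task_done()

requests = queue.Queue()
responses = queue.Queue()

# The coordinator spawns a small pool of workers...
workers = [threading.Thread(target=worker, args=(requests, responses))
           for _ in range(4)]
for t in workers:
    t.start()

# ...feeds them requests (in the real server these come from the protocol)...
for r in ("select", "insert", "update"):
    requests.put(r)
requests.join()

# ...and shuts them down with one sentinel per worker.
for _ in workers:
    requests.put(None)
for t in workers:
    t.join()

results = sorted(responses.get() for _ in range(3))
print(results)
```

In a real Twisted server the coordinator's role is played by the reactor thread, and results would come back to it via callFromThread rather than a response queue being polled.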
On 11:11 am, coder_gus@lavabit.com wrote:
So, my question is: is this kind of architecture good to implement? (asynchronous server and threaded workers)
You might want to consider using process workers instead of thread workers, using spawnProcess and a simple control protocol. This is easier to debug, since threads are painful to figure out, and it also scales better: you escape Python's GIL and can take advantage of multiple cores. Even aside from that, you can switch spawnProcess to some kind of remote connection API and run your processes remotely.
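The process-worker idea with a simple control protocol can be sketched with the stdlib subprocess module (Twisted's spawnProcess wraps the same mechanism behind a ProcessProtocol); the worker source and the "OK:" reply format here are purely illustrative:

```python
import subprocess
import sys

# A trivial worker: reads one request per line on stdin, answers on stdout.
# In a real system this would be its own script, started by spawnProcess.
WORKER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('OK:' + line.strip().upper() + '\\n')\n"
    "    sys.stdout.flush()\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", WORKER_SOURCE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# The "control protocol": newline-delimited requests and responses.
proc.stdin.write("fetch users\n")
proc.stdin.flush()
reply = proc.stdout.readline().strip()

proc.stdin.close()   # EOF ends the worker's loop
proc.wait()
print(reply)
```

Because the protocol is just framed bytes over a pipe, switching the transport to a socket later (to run workers remotely, as suggested above) changes the plumbing but not the protocol.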
On Wed, Feb 27, 2008 at 3:46 AM, <glyph@divmod.com> wrote:
On 11:11 am, coder_gus@lavabit.com wrote:
So, my question is: is this kind of architecture good to implement? (asynchronous server and threaded workers)
You might want to consider using process workers instead of thread workers, using spawnProcess and a simple control protocol. This is easier to debug, since threads are painful to figure out, and it also scales better - you escape python's GIL and can take advantage of multiple cores, but even if it weren't for that, you can switch spawnProcess to some kind of remote connection API and run your processes remotely.
Would you say that this method of using a separate process instead of threads to do work is also possibly good for database operations? I'm always reading 'beware of threads' ;) with respect to Twisted, so using a very simple control protocol to manage a separate dedicated database process might be better than relying on the threading that 'adbapi' uses? Any comments on this would be appreciated. Thanks, -Alex
On 11:37 pm, clemesha@gmail.com wrote:
Would you say that this method of using a separate process instead of threads to do work also is possibly good for database operations?
Sure.
I'm always reading 'beware of threads' ;) with respect to Twisted, so using a very simple control protocol to manage a separate dedicate database process might be better than relying on the threading that 'adbapi' uses?
It would definitely work.

The main reason adbapi itself isn't done this way is that setting up a process pool is unfortunately more work than spawning a thread: you have to decide on a proper set of environment variables, locate the Python interpreter, locate the script that you're going to run, make sure that script is installed by setup.py, decide on a control protocol, wait for the subprocess to start up before sending it messages, shut down the process appropriately, catch process termination and restart (or not), etc etc etc. Twisted should really have a nice API that does all that stuff for you, though, and it's a shame that it doesn't.

The other reason is that adbapi is old. If we were going to implement something like adbapi today, we'd probably write a process pool first, but adbapi was written as a quick hack to get some database integration a long time ago.

The only caveat is that additional Python interpreters take up more RAM than additional threads. If the database processes are doing any heavy CPU lifting, though, this cost could well be worth it.
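For readers unfamiliar with adbapi's threaded approach: it essentially runs blocking DB-API calls on a thread pool and fires a Deferred with each result. Stripped of the Deferred machinery, the core idea looks roughly like this stdlib sketch (sqlite3 and the table are just for illustration):

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# adbapi keeps one DB-API connection per pool thread; a single-thread pool
# with one connection is enough to show the shape of runQuery.
pool = ThreadPoolExecutor(max_workers=1)
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")
conn.commit()

def run_query(sql, args=()):
    """Run a blocking DB-API call off the event loop's thread."""
    def interaction():
        cur = conn.cursor()
        cur.execute(sql, args)
        return cur.fetchall()
    # In Twisted's adbapi this Future would instead be a Deferred.
    return pool.submit(interaction)

future = run_query("SELECT name FROM users")
rows = future.result()
print(rows)
pool.shutdown()
```

A process-pool version would keep the same run_query surface but ship the SQL over a pipe to a worker process instead of submitting a closure to a thread, which is exactly the extra plumbing described above.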
glyph@divmod.com wrote:
The other reason is that adbapi is old. If we were going to implement something like ADBAPI today, we'd probably write a process pool first, but adbapi was written as a quick hack to get some database integration a long time ago.
The only caveat is that additional Python interpreters take up more RAM than additional threads. If the database processes are doing any heavy CPU lifting though, this cost could well be worth it.
I'm just getting started with an XMLRPC server that uses adbapi. This satisfies my curiosity as to why it was threaded. Now assuming that, for my application, the "heavy lifting" is done by the DB engine itself, is there any good reason to dig into implementing a process pool? (I'm thinking of the future here, as the server begins to grow more functionality. Right now, my main concern is not to block multiple simultaneous requests from clients.) -- Don Dwiggins Advanced Publishing Technology
Actually, on a modern system like Linux, if you fork, the processes share memory as long as it's not written to (copy-on-write). This means that if you fork off your process pool from your application, forking should have minimal impact. Even writing it as a standalone process means the memory usage of one Python interpreter, plus minimal usage in the forked processes. Andreas

On Thursday, 2008-03-06 at 09:35 -0800, Don Dwiggins wrote:
glyph@divmod.com wrote:
The other reason is that adbapi is old. If we were going to implement something like ADBAPI today, we'd probably write a process pool first, but adbapi was written as a quick hack to get some database integration a long time ago.
The only caveat is that additional Python interpreters take up more RAM than additional threads. If the database processes are doing any heavy CPU lifting though, this cost could well be worth it.
I'm just getting started with an XMLRPC server that uses adbapi. This satisfies my curiosity as to why it was threaded. Now assuming that, for my application, the "heavy lifting" is done by the DB engine itself, is there any good reason to dig into implementing a process pool? (I'm thinking of the future here, as the server begins to grow more functionality. Right now, my main concern is not to block multiple simultaneous requests from clients.)
* Andreas Kostyrka <andreas@kostyrka.org> [2008-03-07 13:44:02 +0100]:
Actually, on a modern system like Linux, if you fork, the processes share memory as long as it's not written to (copy-on-write).
Except that it's basically impossible to read a Python object without writing to it, as the reference count changes.
This means, that if you'd fork off your process pool from your application, forking should have minimal impact. Even writing it as a standalone process means memory usage of one Python interpreter, plus minimal usage in the forked processes.
This would only be true for a Python implementation that does not use reference counting. -- mithrandi, i Ainil en-Balandor, a faer Ambar
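The point is easy to demonstrate: merely handling a CPython object bumps its reference count, which writes to the object's header and so dirties the page, defeating copy-on-write. A quick check (counts here assume CPython; exact baselines can vary by version):

```python
import sys

obj = object()
baseline = sys.getrefcount(obj)   # includes getrefcount's own argument ref

refs = [obj, obj, obj]            # "reading" the object into a list...
after = sys.getrefcount(obj)      # ...wrote three new refcount increments

grew = after - baseline
print(grew)
```

Every one of those increments is a memory write to the shared page holding the object, so a forked child that merely iterates over inherited data will steadily un-share it.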
Andreas Kostyrka wrote:
Actually, on a modern system like Linux, if you fork, the processes share memory as long as it's not written to (copy-on-write).
This means, that if you'd fork off your process pool from your application, forking should have minimal impact. Even writing it as a standalone process means memory usage of one Python interpreter, plus minimal usage in the forked processes.
If you fork a Python interpreter, you should very quickly replace the process with exec. The reason is that if the child de-references something, Python might deallocate it in a way that has side effects visible to the parent, e.g. a destructor sending a shutdown message on an SQL connection whose socket the parent still shares.
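On a POSIX system, the fork-then-exec-immediately pattern looks roughly like the following (Unix-only sketch; the greeting string is illustrative). Because the child execs a fresh interpreter at once, no destructors from the parent's heap ever run in the child:

```python
import os
import sys

# Parent-to-child result channel; in a real worker pool this would be
# the control-protocol pipe.
r, w = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: wire the pipe to stdout and exec a *fresh* interpreter at
    # once, before any inherited Python objects can be touched.
    os.dup2(w, 1)
    try:
        os.execv(sys.executable,
                 [sys.executable, "-c", "print('child ready')"])
    finally:
        os._exit(127)  # only reached if exec itself failed
else:
    # Parent: close its copy of the write end, then read to EOF.
    os.close(w)
    with os.fdopen(r) as pipe:
        greeting = pipe.read().strip()
    os.waitpid(pid, 0)
    print(greeting)
```

Note that os.pipe's fds are non-inheritable in Python 3, so the child's dup2 onto fd 1 is what deliberately survives the exec.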
Mike Pelletier wrote:
On Fri, Mar 7, 2008 at 8:36 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
If you fork a python interpreter, you should very quickly replace the process with exec.
Does fork+exec have any advantages over spawn?
If you mean "os.spawnXX": I think that, under Unix, those *are* fork & exec, so no - they're identical. I don't think there's a native Unix syscall "spawn". I seem to recall there is something similarly named in the MS VC runtime. Anyway - if we're talking about Twisted, you want to use the Twisted support: reactor.spawnProcess and a subclass of t.i.p.ProcessProtocol to talk to the child worker. reactor.spawnProcess does the right thing(tm).
Phil Mayers wrote:
Mike Pelletier wrote:
On Fri, Mar 7, 2008 at 8:36 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
If you fork a python interpreter, you should very quickly replace the process with exec.
Does fork+exec have any advantages over spawn?
If you mean "os.spawnXX" I think that, under Unix, those *are* fork & exec, so no - they're identical.
Sorry - to reply to myself and try to be more clear: In almost all circumstances, regardless of whether you're using Twisted or not, forking a Python interpreter and leaving both parent and child running off the same memory image for any length of time is unwise. In that respect, os.spawnXX == fork/exec and is thus fine. However...
I don't think there's a native unix syscall "spawn".
Seems I recall there is something in the MS VC runtime named similar.
Anyway - if we're talking about Twisted, you want to use the Twisted support - reactor.spawnProcess and a subclass of t.i.p.ProcessProtocol to talk to the child worker.
reactor.spawnProcess does the right thing(tm)
In almost all circumstances when using Twisted, you should use reactor.spawnProcess or one of the util functions t.i.utils.getProcess*
Phil Mayers wrote:
Mike Pelletier wrote:
On Fri, Mar 7, 2008 at 8:36 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
If you fork a python interpreter, you should very quickly replace the process with exec.
Does fork+exec have any advantages over spawn?
If you mean "os.spawnXX" I think that, under Unix, those *are* fork & exec, so no - they're identical.
I don't think there's a native unix syscall "spawn".
http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html Note that the page may require registration to be viewed.
[...]
Manlio Perillo
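As an aside for later readers: Python (3.8+) exposes this call directly as os.posix_spawn, which combines fork and exec in a single step, so there is no window in which the child runs with a copy of the parent's Python heap. A minimal POSIX-only check:

```python
import os
import sys

# Spawn a fresh interpreter that exits with a known status code.
pid = os.posix_spawn(
    sys.executable,
    [sys.executable, "-c", "import sys; sys.exit(7)"],
    os.environ,
)

_, status = os.waitpid(pid, 0)
exit_code = os.waitstatus_to_exitcode(status)  # Python 3.9+
print(exit_code)
```

The 2008-era alternatives discussed above (os.spawnXX, manual fork/exec) achieve the same end; posix_spawn just makes the "never run Python between fork and exec" rule impossible to violate.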
Manlio Perillo wrote:
Phil Mayers wrote:
Mike Pelletier wrote:
On Fri, Mar 7, 2008 at 8:36 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
If you fork a python interpreter, you should very quickly replace the process with exec.
Does fork+exec have any advantages over spawn?
If you mean "os.spawnXX" I think that, under Unix, those *are* fork & exec, so no - they're identical.
I don't think there's a native unix syscall "spawn".
http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html
I said "unix", not "POSIX extensions for MMU-less realtime/embedded systems".
participants (9)
- alex clemesha
- Andreas Kostyrka
- coder_gus
- Don Dwiggins
- glyph@divmod.com
- Manlio Perillo
- Mike Pelletier
- Phil Mayers
- Tristan Seligmann