[Twisted-Python] When would you consider splitting a server application into several physical instances with different logical functions?
Hi all,

First, I am sorry because this question is not directly related to Twisted. But since Twisted is a network programming framework, you might be interested.

Some of my colleagues hold the view that a network server should be split into small parts with different functions, either as processes communicating over sockets or as threads talking through message queues. Their points are these:

1. Each part could have its own state machine, so we could split big state machines into small, decoupled ones.
2. It makes a pipeline that can handle several requests at the same time without introducing the locking problems of threading, just like how a CPU uses a pipeline to improve performance.
3. It resembles a micro-kernel system, which is supposed to have better availability.

As far as I know, some networking device manufacturers use this model to implement their routers or switches, but I have never heard of any examples besides that.

As the Twisted book says, most networking applications use one of these 3 modes:

1. handle each connection in a separate operating system process, in which case the operating system will take care of letting other processes run while one is waiting;
2. handle each connection in a separate thread, in which case the threading framework takes care of letting other threads run while one is waiting; or
3. use non-blocking system calls to handle all connections in one thread (like Twisted, libevent, or plain select).

After considering this for a while, I think there are some faults in the multiple-parts model:

1. Writing code to handle messages is much more tedious than just making function calls.
2. It's not very easy to make a pipeline work well.

OK, I hope I made myself clear. Since most of you reading this mailing list are experienced network programmers, I think you might offer some insightful comments.
On Monday 01 December 2008, Peter Cai wrote:
As far as I know, some networking device manufacturers use this model to implement their routers or switches, but I have never heard of any examples besides that.
The Postfix mail server uses an architecture of many small processes working together: http://www.postfix.org/OVERVIEW.html

One of the big motivations for this architecture is security: each process only needs to run with the minimum privileges it needs to do its work. Just because SMTP has to listen on port 25 does not mean the entire mail server has to run as root.
As the Twisted book says, most networking applications use one of these 3 modes:
1. handle each connection in a separate operating system process, in which case the operating system will take care of letting other processes run while one is waiting;
2. handle each connection in a separate thread, in which case the threading framework takes care of letting other threads run while one is waiting; or
3. use non-blocking system calls to handle all connections in one thread (like Twisted, libevent, or plain select).
This describes how multiple connections can be handled (for example Apache has several different approaches for this), which is a different issue from spreading functionality over different processes (like Postfix does).

Where do you expect the bottleneck to be in your application? Does it have to serve a large number of clients? Does it have to send or receive large amounts of data? Does it have to do heavy computations? Does it have to get data to or from external servers like a DB server or a web service?
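To make the third mode from the quoted list concrete, here is a minimal sketch using only Python's standard library rather than Twisted itself. The names `serve_once` and `demo` are made up for illustration, `socket.socketpair()` stands in for real client connections, and the upper-casing "protocol" is an arbitrary placeholder. It assumes replies are small enough that `sendall` on a non-blocking socket never hits a full buffer.

```python
import selectors
import socket


def serve_once(sel, timeout=1.0):
    """Run one pass of the event loop; return how many ready sockets were handled."""
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            # Placeholder "protocol": reply with the upper-cased request.
            # Assumption: the reply is small, so sendall will not block.
            conn.sendall(data.upper())
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(conn)
            conn.close()
        handled += 1
    return handled


def demo():
    """Serve three connections from a single thread and return the replies."""
    sel = selectors.DefaultSelector()
    clients, server_sides = [], []
    for _ in range(3):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        clients.append(client)
        server_sides.append(server_side)

    for i, client in enumerate(clients):
        client.sendall(b"hello %d" % i)

    # Keep looping until every pending request has been answered.
    handled = 0
    while handled < len(clients):
        handled += serve_once(sel)

    replies = [client.recv(4096) for client in clients]

    for server_side in server_sides:
        sel.unregister(server_side)
        server_side.close()
    for client in clients:
        client.close()
    sel.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Note that this only shows how one thread can multiplex several connections. It says nothing about whether the application's functionality is split over several processes, which is the separate question raised above.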
After considering this for a while, I think there are some faults in the multiple-parts model:
1. Writing code to handle messages is much more tedious than just making function calls.
2. It's not very easy to make a pipeline work well.
It depends a lot on what you are trying to build and where and how you make the splits between different parts of your application.

For example, a function call is simple in the case where there is only one thread and the called operation will not block. If it does block, you need to add a callback mechanism like Twisted's Deferred. Or if there are multiple threads, you have to be very careful about which functions you are allowed to call while your thread is holding one or more locks. So a function call is not so simple anymore when it's part of a complex application...

There are libraries that make communication over a pipe more friendly, such as Perspective Broker or Foolscap. This does not mean you can replace any arbitrary function call by a remote method call, but it does mean you can skip writing (de)serialization code again and again.

No matter which model you choose, dividing your application into communicating blocks is a good idea: it is absolutely necessary when using multiple processes, it avoids a lot of locking issues when using threads, and it makes your application easier to test in all three models.

Bye,
Maarten
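As an illustration of "communicating blocks" in the threaded case, the following is a rough sketch built on the standard library's `queue` and `threading` modules, not on Twisted, Perspective Broker, or Foolscap. The stage functions `parse` and `handle`, the "verb arg" request format, and the `STOP` sentinel are all hypothetical, chosen only to keep the example small.

```python
import queue
import threading

STOP = object()  # sentinel telling a stage that no more messages will arrive


def parse(line):
    """Stage 1: turn a raw request line into a message (made-up 'verb arg' format)."""
    verb, _, arg = line.partition(" ")
    return {"verb": verb.upper(), "arg": arg}


def handle(msg):
    """Stage 2: act on a parsed message and produce a reply string."""
    if msg["verb"] == "ECHO":
        return msg["arg"]
    return "error: unknown verb " + msg["verb"]


def run_stage(func, inbox, outbox):
    """Apply func to each message from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(lines):
    """Push lines through both stages, each running in its own thread."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=run_stage, args=(parse, q_in, q_mid)),
        threading.Thread(target=run_stage, args=(handle, q_mid, q_out)),
    ]
    for t in threads:
        t.start()
    for line in lines:
        q_in.put(line)
    q_in.put(STOP)

    results = []
    while True:
        item = q_out.get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["echo hi", "quit now"]))
```

The stages share no state and only exchange messages, so no locks are needed beyond what `queue.Queue` does internally. Because `parse` and `handle` are plain functions, each can also be tested on its own without starting any threads.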
participants (2)
- Maarten ter Huurne
- Peter Cai