
Hi, what would be the right thing to start from in order to build a multi-reactor architecture to handle thousands of concurrent connections? I appreciate the help.

Why would you want multiple reactors? The only reason would be to have one per CPU core. For that, the simplest thing would be to manually start n twistd processes, one per core, and have a reverse proxy process listening on a port and distributing connections to each twistd process. Of course, you can extend this architecture to multiple machines. Cheers, Reza
--
Reza Lotun
mobile: +44 (0)7521 310 763
email: rlotun@gmail.com
work: reza@tweetdeck.com
twitter: @rlotun
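A minimal sketch of the port layout for the "one twistd per core" setup, in plain Python. The `worker_ports` helper and the port numbering scheme are illustrative assumptions, not part of Twisted; the `twistd` invocation in the comment is the real tool, but the exact flags depend on your application:

```python
import os

def worker_ports(base_port=8000, cores=None):
    # One localhost port per CPU core; each port would be served by
    # its own twistd process, started e.g. as:
    #   twistd -n web --port <port>   (exact flags depend on your app)
    if cores is None:
        cores = os.cpu_count() or 1
    return [base_port + i for i in range(cores)]
```

The reverse proxy then only needs to know this list of ports to spread connections across all cores.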

This is a classic distributed systems architecture. A reverse proxy can be something like haproxy, nginx, Apache, Perlbal, or whatever (even another Twisted process). The twistd processes can be seen as simply other machines on a LAN - except they all live at 127.0.0.1:<port>, where the port is different for each process. I don't have a more comprehensive example, but there are many, many examples of reverse proxying servers all over the place - google "nginx reverse proxy" for plenty. Hope that helps. Reza
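The distribution step can be sketched as a simple round-robin over the local backends. This is a toy model (the `RoundRobinBackends` class is made up for illustration); real proxies like nginx or haproxy do this plus health checks, timeouts, and connection reuse:

```python
from itertools import cycle

class RoundRobinBackends:
    """Rotate through the local 127.0.0.1:<port> backends,
    one per twistd process (health checks omitted)."""
    def __init__(self, ports):
        self._backends = cycle(("127.0.0.1", port) for port in ports)

    def pick(self):
        # the next incoming connection goes to the next backend in rotation
        return next(self._backends)
```

Because each backend is addressed as host:port, swapping 127.0.0.1 for real LAN addresses extends the same scheme to multiple machines.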

You're right, it is a distributed systems architecture. Probably I need to rephrase my question more precisely: how do you build a distributed system architecture with Twisted technology only? As you mentioned, even the reverse proxy could be another Twisted process. Quoting "Reza Lotun" <rlotun@gmail.com>:

On Thu, Nov 12, 2009 at 12:53 PM, <vitaly@synapticvision.com> wrote:
You might look at txLoadBalancer: https://launchpad.net/txloadbalancer Kevin Horn

On 03:59 pm, vitaly@synapticvision.com wrote:
Doesn't the event loop have a limit of connections it could handle?
Multiple reactors aren't a realistic solution to this. The solution is to switch to an event loop that has a higher limit. "The" event loop is actually a choice among many possible event loops. So connection limits aren't a good reason to want multiple reactors. Jean-Paul
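Jean-Paul's point - that "the" event loop is really a choice - can be illustrated with the standard library's `select` module, which exposes the same OS mechanisms Twisted's reactors are built on. The `best_available_poller` helper is a hypothetical name for illustration; with Twisted itself you would call e.g. `epollreactor.install()` before importing the reactor:

```python
import select

def best_available_poller():
    # select() is capped by FD_SETSIZE (commonly 1024 fds), while
    # epoll (Linux) and kqueue (BSD/macOS) have no such fixed ceiling -
    # which is why switching event loops raises the connection limit.
    for name in ("epoll", "kqueue", "poll", "select"):
        if hasattr(select, name):
            return name
    return None
```

On a Linux box this returns "epoll", matching the advice elsewhere in this thread to use the epoll reactor.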

I've gotten confused enough already :-)) Once there is a Site serving many clients and, for example, reactor.listenSSL(), which actually serves many TCP connections, all going through TwistedGateway - my logic, please correct me if I'm wrong, says at some point there will be a limit on concurrent TCP connections. So how is that solved with Twisted? Quoting exarkun@twistedmatrix.com:

That's a good question, and I'm not sure there's a definitive answer (as far as I know). I think it depends on your application - for example, if your server is performing a big computation, then on average client connections will last longer, meaning you'll have more concurrent connections. The best way to determine this is to *measure* it - for example, you can do a load test with httperf and ramp up connections until things start to break or become unresponsive. You can mitigate the situation by tuning your platform a bit (assuming you're using Linux):
- use the epoll reactor, which is high performance
- make sure the number of open file descriptors is set to something high (and *not* 1024) - see `ulimit -a`
- make sure you tune your TCP settings - see /etc/sysctl.conf, namely fs.file-max and various net.ipv4 settings (google is your friend on the best settings, coupled with testing)
Cheers, Reza
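The file-descriptor limit mentioned above can be checked from inside the process with the standard library's `resource` module (Unix only); the `fd_limits` helper name is made up for illustration:

```python
import resource

def fd_limits():
    # The same numbers `ulimit -n` reports: a soft limit of 1024
    # caps concurrent connections long before the event loop does,
    # since every TCP connection consumes one file descriptor.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard
```

Logging these at server startup is a cheap way to catch a forgotten `ulimit` before a load test does it for you.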

Thank you for such a detailed response. I feel I've finally succeeded in expressing my original question correctly. So if I go one step forward and assume that there is indeed such a limit on concurrent connections, THEN: should it be resolved by another architecture, another way of using Twisted, or something else? Quoting "Reza Lotun" <rlotun@gmail.com>:

Again, I don't think there are any universal answers to this question. It depends on what you're building. For example, say it's a REST API, which by design is stateless (i.e. no sessions). Then you can stick a load balancer in front (if you're on EC2, Amazon has an "Elastic Load Balancer" service for this) and load balance amongst many machines. As traffic increases you simply add more machines. This is called "horizontal scalability" and, as you might imagine, it's highly desirable. Another form is "vertical scalability" - that involves getting a faster computer to run your server on. This might work for some cases, but not in general - it seems to be the method applied to scaling RDBMSs, before going down the road of master/slave setups, sharding and denormalization. Of course, you *could* use a different technology entirely when you need to scale really high. This might make sense if you're a small company and growing - say you start out as a small team, and you need something up quickly that's fairly decent. You happen to know Python, so you roll the whole thing out in Twisted. As time progresses, you may rewrite certain systems in, say, Erlang, and move forward. So, it's hard to say, really. At least, I'd like to know myself ;-) That's what makes the whole field so interesting - there's a certain creative element to scalable systems. Cheers, Reza
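The stateless-plus-load-balancer idea can be sketched in a few lines. The `StatelessPool` class and hostnames are made up for illustration; the point is that with no sessions, any host can serve any request, so capacity grows by appending hosts - whereas sessions would break this, since a request could land on a machine that doesn't hold its session:

```python
class StatelessPool:
    """Toy model of horizontal scaling for a stateless REST API."""
    def __init__(self, hosts):
        self.hosts = list(hosts)

    def add(self, host):
        # "as traffic increases you simply add more machines"
        self.hosts.append(host)

    def route(self, request_id):
        # modulo routing stands in for a real load balancer
        return self.hosts[request_id % len(self.hosts)]
```
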

So if I stick to "vertical scalability" (the Site has sessions), is it going to help performance to run the Twisted reactor on a single-core machine vs. a multi-core machine (after all, Python itself has a Global Interpreter Lock)? OR should the entire "TwistedGateway+listenSSL+Site+reactor" usage be re-designed for the project? What about the influence of a 64-bit machine on Twisted? Quoting "Reza Lotun" <rlotun@gmail.com>:

participants (4):
- exarkun@twistedmatrix.com
- Kevin Horn
- Reza Lotun
- vitaly@synapticvision.com