[Twisted-Python] Twisted server is 5 times SLOWER on Solaris than Linux?
We have a basic Twisted server that we have doing nothing but dropping received bits on the floor and sending back a static response.
We have tested it on Solaris 10 on a Sun T1000 (8 cores) and on Solaris 8 on a Sun V210. We did initial load testing on a Dell 2850 (2 Hyperthreaded P4's) using Red Hat WS 3 (Taroon Update 4) / (Linux 2.4.21-27.ELsmp).
The Solaris boxes are all showing latency 5 times what we are seeing on the Dell 2850, with 1/5 as many clients to boot.
On the Dell 2850 (Linux thinks it is a quad-processor box) our Twisted server is more network I/O bound than CPU bound until we get to around 800-900 clients. Latency is manageable all the way up to about 1400 clients, at which point it starts getting progressively worse.
All the Sparc boxes start out CPU bound at about 200 clients, and latencies are way worse than Linux with 1400 clients.
The only Solaris 10 x86 box we have to test on is so old it is almost impossible to compare to even the Dell 2850. :-(
We are using Python 2.4.4 and Twisted 2.4.0 using the poll reactor in all cases.
Anyone have any idea why we are seeing such poor performance on Solaris? Is it the Sparc hardware?
On 1/17/07, Jarrod Roberson <jarrod@vertigrated.com> wrote:
We have a basic Twisted server that we have doing nothing but dropping received bits on the floor and sending back a static response.
Post your code somewhere. Perhaps you are doing something horribly wrong.
Anyone have any idea why we are seeing such poor performance on Solaris? Is it the Sparc hardware?
Sparc processors aren't exactly well known for their blazing speed. Try a trivial CPU benchmark like hashcash. The Sparc box might be better at concurrency, but your Twisted benchmark is probably not using multiple CPUs. Also, Twisted 2.5 includes an epoll reactor. It should scale considerably better than poll, but you need a 2.6 Linux kernel for it.
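A minimal sketch of the kind of trivial single-core CPU benchmark suggested above. It is hashcash-style only: it searches for a SHA-1 hash with a given number of leading zero bits rather than producing real hashcash stamps. The names `mint` and `benchmark` are made up for illustration, and the code assumes a modern Python 3 interpreter rather than the Python 2.4.4 used in the thread. Running the same script on each box gives a rough comparison of single-threaded speed, which is what a single-process Twisted server depends on.

```python
import hashlib
import time


def mint(resource, bits=16):
    """Return the first counter whose SHA-1 of 'resource:counter'
    has at least `bits` leading zero bits (hashcash-style search)."""
    target = 1 << (160 - bits)  # SHA-1 digests are 160 bits wide
    counter = 0
    while True:
        data = ("%s:%d" % (resource, counter)).encode("ascii")
        digest = hashlib.sha1(data).digest()
        if int.from_bytes(digest, "big") < target:
            return counter
        counter += 1


def benchmark(bits=16):
    """Time one search; returns (counter, elapsed_seconds)."""
    start = time.perf_counter()
    counter = mint("benchmark", bits)
    elapsed = time.perf_counter() - start
    return counter, elapsed


if __name__ == "__main__":
    counter, elapsed = benchmark()
    hashes = counter + 1
    # Guard against a zero interval on very coarse clocks.
    rate = hashes / max(elapsed, 1e-9)
    print("%d hashes in %.3f s (%.0f hashes/s)" % (hashes, elapsed, rate))
```

The search is deterministic for a given resource string, so every machine does the same amount of work and only the elapsed time differs.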
On Wed, 17 Jan 2007 17:18:13 -0500, Jarrod Roberson <jarrod@vertigrated.com> wrote:
We have a basic Twisted server that we have doing nothing but dropping received bits on the floor and sending back a static response.
We have tested it on a Solaris 10 on a Sun T1000 ( 8 cores ) and on Solaris 8 on a Sun V210. We did initial load testing on a Dell 2850 ( 2 Hyperthreaded P4's ) using Red Hat WS 3 (Taroon Update 4) / ( Linux 2.4.21-27.ELsmp )
The Solaris boxes are all showing latency 5 times what we are seeing on the Dell 2850 with 1/5 as many clients to boot.
On the Dell 2850 ( linux thinks it is a quad processor box ) our Twisted server is more Network I/O bound than CPU bound until we get to around 800 - 900 clients, latency is manageable all the way up to about 1400 clients, at which time they start getting progressively worse.
All the Sparc boxes start out CPU bound at about 200 clients and latencies are way worse than Linux with 1400 clients.
The only Solaris 10 x86 box we have to test on is so old it is almost impossible to compare to even the Dell 2850. :-(
We are using Python 2.4.4 and Twisted 2.4.0 using the poll reactor in all cases.
Anyone have any idea why we are seeing such poor performance on Solaris? Is it the Sparc hardware?
I don't have any specific experience with Twisted on Sparc hardware, but one thing I'll point out is that Twisted is single threaded. Whether there are 4 cores or 8 probably won't make any noticeable difference to a Twisted application's performance. It looks like a single core on a T1000 is much less powerful than a single core in a Dell 2850, so this might account for some of the difference.
Jean-Paul
On 1/17/07, Jean-Paul Calderone <exarkun@divmod.com> wrote:
I don't have any specific experience with Twisted on Sparc hardware, but one thing I'll point out is that Twisted is single threaded. Whether there are 4 cores or 8 probably won't make any noticeable difference to a Twisted application's performance.
It looks like a single core on a T1000 is much less powerful than a single core in a Dell 2850, so this might account for some of the difference.
Jean-Paul
This is the conclusion we are coming to as well. We found a dual dual-core Opteron box with Solaris 10 x86 on it. The Opteron box was even faster than the dual P4 box. So we have pretty much ruled out Solaris as the problem; everything points to sucky performance of the Sparc hardware.
On 1/17/07, Jean-Paul Calderone <exarkun@divmod.com> wrote:
It looks like a single core on a T1000 is much less powerful than a single core in a Dell 2850, so this might account for some of the difference.
That is precisely the case. A quote:
"Obviously, the UltraSparc T1 will perform quite poorly on workloads that require single-threaded performance. For those types of non-multithreaded workloads, Sun will rely for the time being on its Opteron-powered Galaxy server line. In 2008, however, Sun plans to release a new design codenamed "Rock" with better single-threaded performance."
taken from http://arstechnica.com/news.ars/post/20051114-5569.html
You can find a bit more info here:
http://arstechnica.com/news.ars/post/20051116-5584.htm
This Sun hardware is built /specifically/ for running /highly/ multithreaded code, it has always sucked royally at single-threaded performance, this was very much a design decision.
Cheers,
f
On 1/17/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/17/07, Jean-Paul Calderone <exarkun@divmod.com> wrote:
It looks like a single core on a T1000 is much less powerful than a single core in a Dell 2850, so this might account for some of the difference.
That is precisely the case. A quote
Obviously, the UltraSparc T1 will perform quite poorly on workloads that require single-threaded performance. For those types of non-multithreaded workloads, Sun will rely for the time being on its Opteron-powered Galaxy server line. In 2008, however, Sun plans to release a new design codenamed "Rock" with better single-threaded performance.
taken from http://arstechnica.com/news.ars/post/20051114-5569.html
You can find a bit more info here:
http://arstechnica.com/news.ars/post/20051116-5584.htm
This Sun hardware is built /specifically/ for running /highly/ multithreaded code, it has always sucked royally at single-threaded performance, this was very much a design decision.
There is a "backend" C module that our Twisted server front ends, and it is highly multi-threaded. So the T1000 is PERFECT for our application, except that now Twisted is the bottleneck. :-(
Unfortunately we have an 11th-hour constraint of a vendor library that we are required to use, and it is only available on Sparc Solaris. So we either scrap our Twisted implementation and spend extra time on another network handling layer, or run 5 times as many instances of our server to service the same number of concurrent clients.
On 1/17/07, Jarrod Roberson <jarrod@vertigrated.com> wrote:
There is a "backend" C module that our Twisted server front ends, and it is highly multi-threaded. So the T1000 is PERFECT for our application, except that now Twisted is the bottleneck. :-(
Unfortunately we have an 11th-hour constraint of a vendor library that we are required to use, and it is only available on Sparc Solaris.
Bummer. It sounds like you have a slightly toxic combination of constraints between the software (highly MT backend + single-threaded Twisted) and your hardware (T1000, tuned for MT code). Unpleasant...
Cheers,
f
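The "run several instances" option mentioned in the thread could be sketched as below. This is a standard-library stand-in, not Twisted code and not the poster's server: it binds one listening socket, forks several single-threaded workers that share it, and lets the kernel spread incoming connections across them so that more than one core does work. The names `start_workers`, `stop_workers`, `serve_forever` and the `RESPONSE` payload are hypothetical, and it assumes a Unix-like system where `os.fork` is available.

```python
import os
import signal
import socket

RESPONSE = b"OK\n"  # hypothetical static reply


def serve_forever(listener):
    """Worker loop: accept, discard the client's input, send a static reply."""
    while True:
        conn, _ = listener.accept()
        try:
            conn.recv(4096)          # drop received bytes on the floor
            conn.sendall(RESPONSE)
        except OSError:
            pass                     # ignore a client that went away
        finally:
            conn.close()


def start_workers(count, host="127.0.0.1", port=0):
    """Bind once, then fork `count` workers that share the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:                 # child process: serve until killed
            try:
                serve_forever(listener)
            finally:
                os._exit(0)
        pids.append(pid)             # parent process: remember the worker
    return listener, pids


def stop_workers(pids):
    """Terminate and reap every worker started by start_workers."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

Calling `start_workers(8)` would start one worker per T1000 core; each worker remains single-threaded, so per-connection latency is still bounded by single-core speed, but total capacity can grow with the number of workers.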
participants (4)
- Fernando Perez
- Jarrod Roberson
- Jean-Paul Calderone
- Pavel Pergamenshchik