Re: [Twisted-Python] Degrading under load
Sorry, I guess my question wasn't clear enough. The most important things I need to know are:
When running listenTCP, how often does twisted accept pending connections on the port? Is it only when the previous connection is finished processing, or every time the event loop gets control, or something in between?
And when twisted does accept pending connections, does it accept ALL of them and queue them all for processing, or just one at a time?
Thanks, Yitz
My original post:
I need to set up a TCP service (on a linux box) that will get something like a few hundred connections per minute at peak load. For each connection, I do some XML processing, and possibly send a query to another nearby machine and get a response.
Seems to me that twisted should be able to handle that.
But what happens when I get the occasional burst of connections, let's say tens of connections within one second? What I need is:
o Every client gets a socket connection promptly, so no danger of TCP timeout.
o Under medium load, clients will have to wait a bit longer for the response.
o Under heavy load, some clients will get a "busy" response (defined in the protocol I am implementing) and immediate socket close.
What is the best way to do that in twisted? I envision one of the following architectures:
A. Just use twisted in the usual way. Watch twisted's event queue for heavy load.
B. Two processes: one to dish out connections and one to queue requests and process them.
C. Three processes: one to dish out connections, one to queue requests and watch for load, and one to process the requests.
Which of these do I need to use to get the desired effect under load? Or is there some better way?
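For illustration only, the "busy under heavy load" requirement above can be sketched as a bounded queue between whatever accepts connections and whatever processes requests. The class name `RequestGate` and the `max_pending` threshold are hypothetical, not part of Twisted or of the protocol under discussion; the sketch uses only the standard library.

```python
import queue


class RequestGate:
    """Hypothetical admission control: queue requests, refuse when full."""

    def __init__(self, max_pending):
        # Bounded FIFO; its size is the tunable "heavy load" threshold.
        self._pending = queue.Queue(maxsize=max_pending)

    def admit(self, request):
        """Return "queued" if accepted, or "busy" if the backlog is full."""
        try:
            self._pending.put_nowait(request)
        except queue.Full:
            return "busy"
        return "queued"

    def next_request(self):
        """Hand the oldest pending request to a worker, or None if idle."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None


gate = RequestGate(max_pending=2)
results = [gate.admit(f"req-{i}") for i in range(4)]
print(results)
```

Under medium load requests simply wait in the queue; only when the queue is full does the caller get "busy", at which point the server would send the protocol's busy reply and close the socket.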
On Thu, 2006-03-09 at 23:13 +0200, Yitzchak Gale wrote:
When running listenTCP, how often does twisted accept pending connections on the port? Is it only when the previous connection is finished processing, or every time the event loop gets control, or something in between?
A TCP connection can live for a long time (e.g. an ssh session for hours or days). The server will therefore accept connections on each iteration of the event loop where the server socket is readable. If processing something when data is received takes a long time, though, this means the event loop won't get control back, and so accept()ing will be delayed.
And when twisted does accept pending connections, does it accept ALL of them and queue them all for processing, or just one at a time?
IIRC most reactors will try to accept as many as possible, up to some limit in each iteration.
On 3/9/06, Itamar Shtull-Trauring <itamar@itamarst.org> wrote:
On Thu, 2006-03-09 at 23:13 +0200, Yitzchak Gale wrote:
When running listenTCP, how often does twisted accept pending connections on the port?
The server will... accept connections on each iteration of the event loop where the server socket is readable.
OK.
If processing something when data is received takes a long time, though, this means the event loop won't get control back, and so accept()ing will be delayed.
Right. So in this scenario, performance under load would depend on breaking up the higher-level processing steps into small enough pieces.
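A minimal sketch of what "small enough pieces" can look like, assuming the work can be expressed as a generator that yields between chunks. The function names and the round-robin driver are hypothetical stand-ins for an event loop; later Twisted releases ship `twisted.internet.task.cooperate` for this pattern, but the sketch stays with the standard library so it is self-contained.

```python
from collections import deque


def process_in_chunks(items, chunk_size):
    """Handle `chunk_size` items at a time, yielding control in between."""
    total = 0
    for start in range(0, len(items), chunk_size):
        for item in items[start:start + chunk_size]:
            total += len(item)  # stand-in for real per-item work
        yield  # give other tasks (e.g. accepting sockets) a turn
    return total


def run_round_robin(tasks):
    """Advance each task by one chunk in turn until all have finished."""
    ready = deque(tasks.items())
    results = {}
    while ready:
        name, task = ready.popleft()
        try:
            next(task)
        except StopIteration as done:
            results[name] = done.value
        else:
            ready.append((name, task))
    return results


results = run_round_robin({
    "a": process_in_chunks(["<x/>"] * 10, chunk_size=3),
    "b": process_in_chunks(["<yy/>"] * 2, chunk_size=3),
})
print(results)
```

The chunk size is the knob: smaller chunks keep the loop responsive at the cost of more scheduling overhead.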
And when twisted does accept pending connections, does it accept ALL of them and queue them all for processing, or just one at a time?
IIRC most reactors will try to accept as many as possible, up to some limit in each iteration.
If I was going to do A, I would certainly want to check that carefully. It's not like using select() or poll(). To get a non-blocking accept(), for example, I think you need to set some flag on the listen() call. But right now it looks like I'll do B. It is still quite simple, and looks more robust to me. Thanks, -Yitz
On Fri, 2006-03-10 at 02:12 +0200, Yitzchak Gale wrote:
IIRC most reactors will try to accept as many as possible, up to some limit in each iteration.
If I was going to do A, I would certainly want to check that carefully. It's not like using select() or poll(). To get a non-blocking accept(), for example, I think you need to set some flag on the listen() call.
In this context, "if I recall correctly" is referring to when glyph and I wrote that code :) accept()s are certainly non-blocking, and I just checked the code and it does indeed accept up to 100, with some dynamic changes based on rate of connections.
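As a rough illustration of the behaviour described above (not Twisted's actual code), a non-blocking listening socket can be drained in bounded batches. The helper names and the fixed limit are assumptions; the dynamic adjustment mentioned in the thread is left out.

```python
import select
import socket

MAX_ACCEPTS_PER_PASS = 100  # figure quoted in the thread, used as a fixed cap here


def accept_batch(listener, limit=MAX_ACCEPTS_PER_PASS):
    """Accept up to `limit` pending connections without blocking."""
    accepted = []
    for _ in range(limit):
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # backlog is empty for now
        conn.setblocking(False)
        accepted.append(conn)
    return accepted


def wait_readable(sock, timeout=1.0):
    """True once the listening socket has at least one pending connection."""
    return bool(select.select([sock], [], [], timeout)[0])


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
listener.setblocking(False)  # non-blocking is a socket option, not a listen() flag

clients = [socket.create_connection(listener.getsockname()) for _ in range(5)]

batches, servers = [], []
while sum(batches) < len(clients) and wait_readable(listener):
    batch = accept_batch(listener, limit=3)  # small limit to show batching
    batches.append(len(batch))
    servers.extend(batch)

for sock in clients + servers + [listener]:
    sock.close()
print(batches)
```

Each pass of the `while` loop plays the role of one event-loop iteration in which the server socket is readable.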
Itamar wrote:
IIRC most reactors will try to accept as many as possible, up to some limit in each iteration.
I wrote:
...I would certainly want to check that carefully...
...I just checked the code and it does indeed accept up to 100, with some dynamic changes based on rate of connections.
Great, thanks! 100 should work for me, at current peak load.
Hi Yitzchak,
I can give you some information regarding option A. We are running our chatserver using twisted. When we run twisted as a single process and the number of connections per second is more than 50 (> 3000 per min), twisted often blocks and does not accept new connections. The CPU load shown by "top" for the twisted process is 99.9%. This behavior is on linux 2.6 on a 64 bit 4 CPU machine. On a 2.4 kernel on a 32 bit 4 CPU machine, however, it always accepted new connections, even with 99.9% load, but then they would often time out, since under that load no data was written into them for a long time. This was with Twisted 1.3.
I should add here that in our case, the load was driven not by connection/disconnection events, but by the number of established connections. When that number was in the vicinity of 5000, system poll() became very slow (we run poll reactor).
Another observation: we had a memory leak, so when the RSS memory grew to, say, 3x the starting memory, the performance severely degraded. I should note that the machine was not running out of memory: we have 4GB RAM, and the total used memory was at most 400MB, with the twistd process using maybe 160MB at the most.
We are now moving to Twisted 2.2 and a multiprocess architecture, somewhat similar to your B option.
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
On Thu, 2006-03-09 at 13:49 -0800, Alec (Chatango) wrote:
I should add here that in our case, the load was driven not by connection/disconnection events, but by the number of established connections. When that number was in the vicinity of 5000, system poll() became very slow (we run poll reactor).
An epoll-based reactor would probably help significantly in this case. Also note that Twisted 2.x had some algorithmic speed improvements over 1.3 and should scale better (though that doesn't help with poll() being a bottleneck).
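A hedged sketch of how an epoll reactor is selected, assuming a Twisted release that ships `twisted.internet.epollreactor` (the thread suggests it was not yet generally available at the time). The helper name is hypothetical; the one firm requirement is that the reactor is installed before anything imports `twisted.internet.reactor`.

```python
def install_preferred_reactor():
    """Try to install the epoll reactor and report which path was taken."""
    try:
        from twisted.internet import epollreactor
    except ImportError:
        # Twisted is missing, too old, or the platform lacks epoll().
        return "default"
    try:
        epollreactor.install()
    except Exception:
        # Most likely another reactor was already installed by an earlier import.
        return "default"
    return "epoll"


choice = install_preferred_reactor()
print(choice)
```

With `twistd`, the same choice is normally made on the command line rather than in code.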
Itamar Shtull-Trauring wrote:
An epoll-based reactor would probably help significantly in this case. Also note that Twisted 2.x had some algorithmic speed improvements over 1.3 and should scale better (though that doesn't help with poll() being a bottleneck).
Wasn't there some work on a libevent-based reactor at some point? Where does that stand? - Bruce
Is there any stable/tested version of the epoll reactor? Where could we get it? I would love to get my hands on it!
We finally upgraded 1.3 to 2.2 without changing the code. The load shown by 'top' for that twistd process may have dropped from 98-99% to 96-98%, although this is not clear. In any case, the load reduction was marginal, if any.
Even though the main source of load is OS poll(), we did expect some improvement from the different scheduler algorithm in 2.2: we have setTimeout() on each new connection and resetTimeout() every 90 sec on each connection when keep-alives from the clients arrive.
[Twisted-Python] twisted performance
Itamar Shtull-Trauring, Wed Dec 7 10:15:43 MST 2005
On Wed, 2005-12-07 at 02:04 -0800, Alec Matusis wrote:
I am running a Twisted 1.3 server with a fairly large number of clients. The hardware is two 64 bit 3.0 GHz Xeons with HT, 4GB RAM, and it's on a 2.6.11 kernel. I am using the poll reactor. Currently, when the number of clients approaches 5000, "top" shows 99% CPU load for the twistd process, the event loop slows down and weird race conditions show up.
Could you try this experiment with Twisted 2.1, ideally with the latest version of Python? There were a number of algorithmic improvements since 1.3 (the one that comes to mind in this case is the scheduler). One way to discover if the OS-level poll() is the problem is to use oprofile; you should be able to use it to see how much time is spent in there. Before that, however, you'd want to use the Python profiler to figure out if there are any obvious hotspots.
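A small sketch of the "use the Python profiler first" step, using the standard library's `cProfile` and `pstats`. `handle_request` is a hypothetical stand-in for the server's per-connection work; in a real server one would profile the running process instead.

```python
import cProfile
import io
import pstats


def handle_request(payload):
    """Hypothetical stand-in for per-connection work."""
    return sum(ord(ch) for ch in payload)


def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, out.getvalue()


result, report = profile_call(handle_request, "<msg>hello</msg>" * 1000)
print(report)
```

If the Python-level report shows no hotspot while the process still sits near 100% CPU, that points toward time spent in the OS-level poll(), which is where a system profiler such as oprofile comes in.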
On Fri, 2006-03-24 at 22:36 -0800, Alec Matusis wrote:
We finally upgraded 1.3 to 2.2 without changing the code. The load shown by 'top' for that twistd process may have dropped from 98-99% to 96-98%, although this is not clear. In any case, the load reduction was marginal, if any.
Even though the main source of load is OS poll(), we did expect some improvement from the different scheduler algorithm in 2.2: we have setTimeout() on each new connection and resetTimeout() every 90 sec on each connection when keep-alives from the clients arrive.
Looks like you want epoll().
Would it be possible to install a Twisted application using GTK on a USB stick for Windows platforms?
If you can write the program, you can probably store it on a USB stick. (What size are we talking? 16MB? 32MB? 64MB? 512MB? 1GB? 2GB?)
On 3/25/06, Steve Slevinski <slevin@signpuddle.net> wrote:
Would it be possible to install a Twisted application using GTK on a USB stick for Windows platforms?
-- Christopher Armstrong International Man of Twistery http://radix.twistedmatrix.com/ http://twistedmatrix.com/ http://canonical.com/
Whatever size would be needed. If I could install python, GTK, Twisted all on a USB stick, it would be easy to give my end users a working install without requiring any configuration on their part.
Christopher Armstrong wrote:
If you can write the program, you can probably store it on a USB stick. (what size are we talking? 16MB? 32MB? 64MB? 512MB? 1GB? 2GB?)
Steve Slevinski wrote:
Christopher Armstrong wrote:
If you can write the program, you can probably store it on a USB stick. (what size are we talking? 16MB? 32MB? 64MB? 512MB? 1GB? 2GB?)
Whatever size would be needed. If I could install python, GTK, Twisted all on a USB stick, it would be easy to give my end users a working install without requiring any configuration on their part.
From the sounds of it, you probably want something like py2app (http://undefined.org/python/py2app.html, for Mac OS X deployment) and/or py2exe (http://www.py2exe.org/). They do a good job of bundling up Python programs with the Python runtime and any dependencies for distribution.
I've used py2exe with Twisted and it works great--haven't tried it with gtk, but it works with wxPython. That is, assuming you're deploying on Win32 or Mac OSX. Bundling Python for Linux, I got nothin'. -sebastian
On 3/26/06, Sebastian Hanlon <sebastian.hanlon@uleth.ca> wrote:
Steve Slevinski wrote:
From the sounds of it, you probably want something like py2app (http://undefined.org/python/py2app.html, for Mac OS X deployment) and/or py2exe (http://www.py2exe.org/). They do a good job of bundling up Python programs with the Python runtime and any dependencies for distribution.
You might also want to take a peek at movable python: <http://www.voidspace.org.uk/python/movpy/index.html> It is more wxpython-based than gtk-based, but there's nothing to stop you installing pyGTK onto it. Moof
Not only is this possible, it's fairly easy to do with py2exe, and I've done it on a couple of different apps.
I'm working on an app right now with almost exactly these requirements. (Originally, it was even going to be stored on a usb stick, but now it just has to be lightly downloadable, and it is.) It uses Axiom, which means it pulls in substantial portions of Divmod's software, Twisted, zope.interface, and god knows what else. Like your app, it uses GTK, which means it pulls in pygtk and the whole GTK runtime. All this comes to 18M, including my app. The compressed installer version is 8MB.
Most of the solution is here: <http://starship.python.net/crew/theller/moin.cgi/Py2exeAndPyGTK>. You don't have to do anything special for Twisted and zope.interface, nor for Divmod stuff if that interests you; py2exe these days seems to handle all that seamlessly.
C
Steve Slevinski wrote:
Whatever size would be needed. If I could install python, GTK, Twisted all on a USB stick, it would be easy to give my end users a working install without requiring any configuration on their part.
Christopher Armstrong wrote:
If you can write the program, you can probably store it on a USB stick. (what size are we talking? 16MB? 32MB? 64MB? 512MB? 1GB? 2GB?)
Alec,
Thanks for the great information!
Our situation is a bit different in that we will not be keeping connections open for more than a few seconds at a time (I hope). So even at peak expected load, we shouldn't ever have as many as 100 sockets open at any given time.
Also, we will be load balancing across at least two boxes, for hardware redundancy. So we could always throw in an extra box or two if things heat up.
But I think I'll go with B anyway. It is neater in that it separates performance under load from the architecture of the high-level processing. With A, we will always have to worry about dividing the XML stuff into small enough pieces to let the event loop in often enough.
Your platform comparisons will be very helpful.
-Yitz
On Fri, 2006-03-10 at 02:02 +0200, Yitzchak Gale wrote:
But I think I'll go with B anyway. It is neater in that it separates performance under load from the architecture of the high-level processing. With A, we will always have to worry about dividing the XML stuff into small enough pieces to let the event loop in often enough.
You could also use a thread pool for the XML processing. This will certainly help make the event loop stay more responsive without breaking up work manually. However, if the heavy lifting in the code is pure Python you won't be able to take advantage of multiple CPUs, because of the global interpreter lock. Processes also have the benefit that you can dispatch them to multiple machines.
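A minimal sketch of the thread-pool idea, using the standard library's `concurrent.futures` so it runs without Twisted; inside a Twisted application the usual entry point would be `twisted.internet.threads.deferToThread`. `count_elements` is a hypothetical stand-in for the real XML processing.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def count_elements(xml_text):
    """Parse one document and count its elements (stand-in for real work)."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter())


documents = [
    "<req><a/><b/></req>",
    "<req><a><c/></a></req>",
    "<req/>",
]

# The pool keeps parsing off the thread that would run the event loop.
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_elements, documents))
print(counts)
```

As noted above, this keeps the loop responsive but does not add CPU capacity when the work is pure Python.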
Itamar Shtull-Trauring wrote:
Yitzchak Gale wrote:
But I think I'll go with B [two processes] anyway. It is neater in that it separates performance under load from the architecture of the high-level processing. With A [simple twisted], we will always have to worry about dividing the XML stuff into small enough pieces to let the event loop in often enough.
You could also use a thread pool for the XML processing. This will certainly help make the event loop stay more responsive without breaking up work manually.
I am afraid that a thread pool will not play nicely with our current monitoring framework.
However, if the heavy lifting in the code is pure Python
It probably will be.
you won't be able to take advantage of multiple CPUs, because of the global interpreter lock.
OK, that is also an issue.
Processes also have the benefit that you can dispatch them to multiple machines.
Good. Thanks, Yitz
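To make the process-based option concrete, here is a rough sketch in which the XML work runs in a separate Python interpreter. The worker source, the one-document-per-line framing and the function name are all assumptions made for illustration. A real deployment would keep a long-lived worker and talk to it asynchronously (in Twisted, for example, via `reactor.spawnProcess`) rather than making one blocking call per batch.

```python
import subprocess
import sys

# Hypothetical worker: reads one XML document per line, prints its element count.
WORKER_SOURCE = r"""
import sys
import xml.etree.ElementTree as ET

for line in sys.stdin:
    line = line.strip()
    if line:
        root = ET.fromstring(line)
        print(sum(1 for _ in root.iter()), flush=True)
"""


def run_in_worker(documents):
    """Send documents to a child interpreter and collect one reply per document."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input="\n".join(documents) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return [int(line) for line in completed.stdout.split()]


counts = run_in_worker(["<req><a/><b/></req>", "<req/>"])
print(counts)
```

Because the worker is a separate process, it has its own interpreter lock and could equally run on another machine behind a socket.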
Alec wrote:
I should add here that in our case, the load was driven not by connection/disconnection events, but by the number of established connections. When that number was in the vicinity of 5000, system poll() became very slow (we run poll reactor).
Wait a minute, how do the OS networking layers scale with that many open connections? Are you sure this is a twisted problem? Maybe you would be better off with several cheaper boxes rather than one expensive one. (I realize that for a chat server, that rather complicates things, but it may be worth it.) -Yitz
participants (11)
- Alec (Chatango)
- Alec Matusis
- Bruce Mitchener
- Christopher Armstrong
- Cory Dodt
- Itamar Shtull-Trauring
- mcmillen@cs.cmu.edu
- Moof
- Sebastian Hanlon
- Steve Slevinski
- Yitzchak Gale