[Twisted-Python] Project design questions
Here's a description of my general game plan; my questions are below.

THE MISSION
Create a GUI-based, dashboard-like application that monitors the overall health of our resources. It would be a tool that helps pinpoint trouble spots when performance is less than desirable. Ideally, we would get such a good handle on it that it would warn us of impending trouble before it causes a failure.

SAMPLE FUNCTIONS FOR THE APPLICATION
- Monitor web traffic by distilling down Apache logs
- Monitor server hardware through the built-in firmware HTTP interface
- Monitor server health via command-line commands: disk space, processor load, etc.
- Monitor database health via command-line commands
- Monitor backup success through log output
- Monitor misc. server messages
- Monitor switch traffic
- Monitor processes
- ...and more.

POSSIBLE CONSTRAINTS
Our main servers are on a WAN whose security is the job of the greater org. We have access through SSL connections now, but we may not be able to get permission to open any additional ports for our monitoring purposes.

THE QUESTIONS
Very basic question: am I better off leaving most of the heavy lifting (as in the programming logic) on the client side? The downside is that more data would have to pass from server to client app (e.g. a whole web log or part thereof). The upside is that in an emergency, if my server is still up enough to accept an SSH connection, my dashboard app may still be useful to me. Thoughts?

If we are not constrained to getting our info only via SSH, what's the best kind of connection to have? My thought is to use Perspective Broker, since that seems to be the most Pythonic option. Is it possible to encrypt the PB server connection?

Are there any Twisted methods of tunneling through an existing SSH connection (just a wild thought)?

Thanks,
Paul
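To make the data-volume trade-off in the first question concrete, here is a minimal Python sketch (standard library only) of what "distilling down Apache logs" might look like, so that a handful of counters rather than the raw log has to cross the wire. It assumes the Apache Common Log Format (the Combined format starts with the same fields, so its lines also match); the function name `summarize` and the keys of the returned dictionary are illustrative choices, not part of any existing tool.

```python
import re
from collections import Counter

# Apache Common Log Format: host ident authuser [date] "request" status bytes.
# The Combined Log Format appends referer and user-agent, so its lines
# also match this pattern (re.match only anchors at the start of the line).
CLF_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Distill raw access-log lines into a small summary dictionary."""
    statuses = Counter()
    hosts = Counter()
    bytes_sent = 0
    unparsed = 0
    for line in lines:
        m = CLF_LINE.match(line)
        if m is None:
            unparsed += 1  # malformed or unexpected format: count it, don't crash
            continue
        statuses[m.group("status")] += 1
        hosts[m.group("host")] += 1
        if m.group("size") != "-":  # "-" means no body was returned
            bytes_sent += int(m.group("size"))
    return {
        "requests": sum(statuses.values()),
        "by_status": dict(statuses),
        "top_hosts": hosts.most_common(3),
        "bytes_sent": bytes_sent,
        "unparsed": unparsed,
    }


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 -',
    ]
    print(summarize(sample))
```

Whether this summarizing runs on the monitored host or on a central machine is exactly the design question being asked; the function itself does not care where it runs.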
On Mon, 2007-10-01 at 16:36 -0500, Paul_S_Johnson@mnb.uscourts.gov wrote:
Here's a description of my general game plan and my questions are below:
THE MISSION Create a GUI-based, dashboard-like application that monitors the overall health of our resources. It would be a tool that helps pinpoint trouble spots when performance is less than desirable. Ideally, we would get such a good handle on it that it would warn us of impending trouble before it causes a failure.
Wow. This is indeed a mighty mission.
SAMPLE FUNCTIONS FOR THE APPLICATION
- Monitor web traffic by distilling down Apache logs
- Monitor server hardware through the built-in firmware HTTP interface
- Monitor server health via command-line commands: disk space, processor load, etc.
- Monitor database health via command-line commands
- Monitor backup success through log output
- Monitor misc. server messages
- Monitor switch traffic
- Monitor processes
- ...and more.
May I suggest that reviewing existing prior art might be a useful place to begin understanding how this can be done? Examples include, but are far from limited to:
- Concord eHealth
- BMC Patrol
- HP IT/O (or whatever it's called these days)
- IBM Tivoli
- EMC SMART
- Nagios
- Zenoss
- Hyperic
- seafelt (disclosure: I wrote large amounts of this one)
- cacti
- MRTG
POSSIBLE CONSTRAINTS Our main servers are on a WAN whose security is the job of the greater org. We have access through SSL connections now, but we may not be able to get permissions to open any additional ports for our monitoring purposes.
THE QUESTIONS Very basic question: am I better off leaving most of the heavy lifting (as in the programming logic) on the client side? The downside is that more data would have to pass from server to client app (e.g. a whole web log or part thereof). The upside is that in an emergency, if my server is still up enough to accept an SSH connection, my dashboard app may still be useful to me. Thoughts?
In a production environment, people usually get nervous about how much of a system's resources the monitoring software (usually called an agent) will consume. A webserver's primary function is to serve web traffic, a database server's is to be a database, and so on, so the usual approach is to make the client as lightweight as possible and have the heavy lifting done on a dedicated monitoring server (or servers).

Then you need to consider whether you will need to maintain historical data in order to make decisions. For example, you would need a certain amount of historical data to decide whether the rate of storage growth is abnormal.

You will also need to decide whether you're doing polling-based monitoring, where the monitoring system asks clients for some information at a regular interval; event-based monitoring, where you simply respond to an event occurring; or some hybrid of the two, where regular polling detects a condition (e.g. CPU load too high) which then triggers an event (CPU load too high on client x). Having the client/agent do the polling may be more appropriate in some circumstances, and in others having the server do it might be best.
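A minimal sketch of the hybrid model described above, assuming polled samples arrive as `(timestamp, value)` pairs from some collector that is not shown: polling feeds `record()`, which returns an event only when the metric crosses the threshold in either direction rather than on every high sample, and a bounded history supports simple trend questions such as a storage growth rate. The `MetricWatcher` class, the `Event` record, and the single fixed threshold are illustrative assumptions, not the design of any particular product.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class Event:
    """A state change worth telling someone about."""
    host: str
    metric: str
    value: float
    message: str


class MetricWatcher:
    """Hybrid polling/event monitor for one metric on one host (illustrative).

    Regular polling feeds samples into record(); an Event is returned only
    when the metric crosses the threshold, so a load that stays high
    produces one alert rather than one per poll.
    """

    def __init__(self, host: str, metric: str, threshold: float, history: int = 60):
        self.host = host
        self.metric = metric
        self.threshold = threshold
        # Bounded history of (timestamp_seconds, value) pairs for trend checks.
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=history)
        self.alerting = False

    def record(self, timestamp: float, value: float) -> Optional[Event]:
        """Store one polled sample and return an Event on a state change."""
        self.samples.append((timestamp, value))
        over = value > self.threshold
        if over and not self.alerting:
            self.alerting = True
            return Event(self.host, self.metric, value,
                         f"{self.metric} too high on {self.host}")
        if not over and self.alerting:
            self.alerting = False
            return Event(self.host, self.metric, value,
                         f"{self.metric} back to normal on {self.host}")
        return None  # no change of state, nothing to report

    def growth_rate(self) -> Optional[float]:
        """Average change per second between the oldest and newest samples."""
        if len(self.samples) < 2:
            return None
        (t0, v0), (t1, v1) = self.samples[0], self.samples[-1]
        if t1 == t0:
            return None
        return (v1 - v0) / (t1 - t0)


# Example: one poll every 60 s; only the two crossings print anything.
watcher = MetricWatcher("web1", "cpu_load", threshold=4.0)
for t, load in [(0, 1.0), (60, 5.0), (120, 6.0), (180, 2.0)]:
    event = watcher.record(t, load)
    if event is not None:
        print(event.message)
```

A real deployment would probably smooth the threshold test (for example, requiring several consecutive high samples) to avoid alert flapping; that refinement is left out to keep the sketch short.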
If we are not constrained to getting our info only via SSH, what's the best kind of connection to have? My thought is to use Perspective Broker, since that seems to be the most Pythonic option. Is it possible to encrypt the PB server connection?
It sounds like you're talking about how to get from the GUI to a server, rather than how to talk to a device to interrogate it for information, so yes, Perspective Broker is probably a reasonable way to do that. You won't be able to run Python on a Cisco switch, though, so have you considered something like SNMP for gathering statistics from non-Python devices?

What is your actual goal? Do you want to write your own systems monitoring software, or do you want to monitor your kit? Writing this sort of software can become quite a complex undertaking. Have you considered adapting an existing implementation to your own needs by writing the necessary plugin? To stand on the shoulders of giants, as it were.

--
Justin Warren <daedalus@eigenmagic.com>
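As an illustration of the plugin route, here is a minimal sketch of a disk-space check written against the Nagios plugin convention: the plugin prints a single status line (optionally followed by performance data after a `|`) and reports its result through the exit code, 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The thresholds (80% and 90%) and the function names are hypothetical choices; the check itself uses only Python's standard `shutil.disk_usage`.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage onto a Nagios state (thresholds are hypothetical)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    percent = 100.0 * usage.used / usage.total
    state = evaluate(percent, warn, crit)
    # Status text, then performance data: label=value[UOM];warn;crit;min;max
    line = (f"DISK {STATE_NAMES[state]} - {percent:.1f}% used on {path}"
            f" | used={percent:.1f}%;{warn};{crit};0;100")
    return state, line


def main(argv):
    """Print the status line and return the exit code for the caller."""
    path = argv[0] if argv else "/"
    code, line = check_disk(path)
    print(line)
    return code

# Entry point when installed as a plugin: sys.exit(main(sys.argv[1:]))
```

The monitoring server then schedules the check, keeps the history and handles alerting, which keeps the monitored host's share of the work small, in line with the lightweight-agent advice earlier in the thread.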
The paper at http://www.cs.princeton.edu/nsg/papers/comon_osr_06/ could be interesting as it describes the architecture of the CoMon monitoring system for the PlanetLab cluster. In particular, it discusses some design decisions that address some of your questions.
participants (3)
- Justin Warren
- Paul_S_Johnson@mnb.uscourts.gov
- Timo Warns