Server Monitoring Script

Yesterday I got an email from Rimuhosting warning me about excessive load on my server. It turned out that a runaway process, left over from running a buggy CGI script, was consuming over 90% of the CPU on my VPS.

Of course, I immediately killed the process, which resolved the problem.

However, it left me thinking that I really need a way to monitor server load and be notified by email if the one-minute load average goes over, say, 0.70 (roughly 70% of a single CPU). I want the load checked every 15 minutes by a cron job, but I want at most one warning email per hour. That email should list each incident of the load exceeding the threshold, with the load level and time of each.

Here's a bash script to do this. The minimum interval between warning emails, the trigger load level, and the averaging period (the one-, five-, or fifteen-minute load average) are all configurable. It is a very lightweight solution for anyone who only needs to monitor load.

#!/bin/bash
# server load monitoring script by Lloyd Standish, lloyd at crnatural.net
# This script is freeware released under GNU GPL.

loadpath="/home/lloyd/loadmon"
maxload=.70 # trigger level, e.g. .75 = load average of 0.75
minwarninterval=3600 # minimum interval between warning emails, seconds

# uncomment only one of the following 3 lines
loadavginterval=1 # for one-minute load averages
#loadavginterval=2 # for five-minute load averages
#loadavginterval=3 # for fifteen-minute load averages

#cat /proc/loadavg #debugging

# create the state directory if it does not exist yet
mkdir -p "$loadpath"

now=$(date +%s)
prev=0 # time of the last warning, in seconds since the epoch
if [ -f "$loadpath/loadsecs" ]; then
	prev=$(cat "$loadpath/loadsecs")
fi
# check if the average load is too high (bc does the decimal comparison)
loadavg=$(cut -d ' ' -f "$loadavginterval" /proc/loadavg)
if [ "$(echo "$loadavg > $maxload" | bc)" -eq 1 ]; then
	echo "$loadavg $(date +%T)" >> "$loadpath/loadrpt"
fi

# print the warning if enough time has passed and incidents were logged;
# cron mails this output
if (( now > prev + minwarninterval )) && [ -f "$loadpath/loadrpt" ]; then
	case $loadavginterval in
		1) loadminutes="one";;
		2) loadminutes="five";;
		3) loadminutes="fifteen";;
	esac
	echo "Server $(hostname): Warning, $loadminutes minute load average above $maxload! Incidents since last warning:" | cat - "$loadpath/loadrpt"
	rm -f "$loadpath/loadrpt"
	echo "$now" > "$loadpath/loadsecs"
fi
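The script uses bc to compare the decimal load average against maxload, because bash arithmetic only handles integers. On a server where installing bc is inconvenient, awk can do the same comparison. The following is a sketch of that swap, not part of the original script; load_exceeds is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: a hypothetical awk-based replacement for the bc comparison.
#
# load_exceeds LOAD MAX: succeeds (exit status 0) if LOAD > MAX.
load_exceeds() {
	# "+ 0" forces numeric comparison; awk exits 0 (shell "true") when
	# the condition holds and 1 otherwise
	awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

if load_exceeds 0.85 .70; then
	echo "0.85 is above .70"
fi
if ! load_exceeds 0.40 .70; then
	echo "0.40 is not above .70"
fi
```

To use it in the script, the bc test would become `if load_exceeds "$loadavg" "$maxload"; then`.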

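The script does not send mail itself: it prints the warning, and cron emails any output from a job to the job's owner (or to the address in the crontab's MAILTO variable). Make the script executable (chmod +x /home/lloyd/load.sh) and install it with a line like this in /etc/crontab, which runs it every 15 minutes as user lloyd: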

*/15 * * * * lloyd /home/lloyd/load.sh
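Cron starts the script every 15 minutes, but a warning is printed only when more than minwarninterval seconds have passed since the last one and at least one incident has been logged in loadrpt. That test can be pulled out into a function and tried by hand without waiting on cron. This is a sketch for experimentation; warning_due is a hypothetical name, not part of the original script.

```shell
#!/bin/bash
# Sketch only: the script's "is a warning due?" test as a hypothetical function.
#
# warning_due NOW PREV INTERVAL REPORTFILE
#   succeeds when more than INTERVAL seconds have passed since PREV
#   (the time of the last warning) and the incident report file exists.
warning_due() {
	local now=$1 prev=$2 interval=$3 report=$4
	(( now > prev + interval )) && [ -f "$report" ]
}

# Example: an incident report exists and the last warning was 4000 s ago.
report=$(mktemp)
now=$(date +%s)
if warning_due "$now" $(( now - 4000 )) 3600 "$report"; then
	echo "warning due"
fi
# Only 1000 s since the last warning: still inside the quiet period.
if ! warning_due "$now" $(( now - 1000 )) 3600 "$report"; then
	echo "not yet"
fi
rm -f "$report"
```

With minwarninterval=3600 and a 15-minute cron schedule, incidents keep accumulating in loadrpt between warnings, so a single email can list several samples.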

Notes:

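1. bc is a command-line calculator that reads expressions from standard input. The script uses it to compare decimal load values, which bash's built-in integer arithmetic cannot do. You may need to install it on your server (on Debian: apt-get install bc).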

2. /proc/loadavg contains a single line like this (example output):

0.20 0.18 0.12 1/80 11206
 
The first three columns are the load averages over the last one, five, and fifteen minutes: the average number of processes that are runnable or waiting on disk I/O. The fourth column shows the number of currently runnable processes (and threads) and, after the slash, the total number that exist. The last column is the ID of the most recently created process.

3. See the comments at the top of the script for the configuration options.
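The script picks one field out of /proc/loadavg with cut. As a sketch of how the whole line from note 2 breaks down, bash's read builtin can split it into named fields; parse_loadavg and its output format are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch only: split a /proc/loadavg line into named fields.
#
# parse_loadavg LINE: prints the fields as name=value pairs on one line.
parse_loadavg() {
	local one five fifteen procs lastpid
	read -r one five fifteen procs lastpid <<< "$1"
	# procs looks like "1/80": runnable before the slash, total after it
	printf 'one=%s five=%s fifteen=%s running=%s total=%s lastpid=%s\n' \
		"$one" "$five" "$fifteen" "${procs%/*}" "${procs#*/}" "$lastpid"
}

# Parse the example line from note 2; on a live system use
#   parse_loadavg "$(cat /proc/loadavg)"
parse_loadavg "0.20 0.18 0.12 1/80 11206"
# prints: one=0.20 five=0.18 fifteen=0.12 running=1 total=80 lastpid=11206
```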
