Script to Monitor Nagios Logs - Detect Nagios Daemon Failure and Restart

This script monitors for two kinds of failure: the Nagios daemon failing to start, and Nagios stopping its alerts without warning. When you check the log at "/usr/local/nagios/var/log/nagios.log" you might come across messages like "Caught SIGSEGV, shutting down". These messages need to be detected and acted on so that Nagios keeps working properly. The check runs from a cron job on the host, because Nagios itself cannot detect its own failure.
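As a rough illustration of the cron setup, the snippet below builds an hourly crontab entry, chosen so that each run covers the 60-minute window the script looks back over. The script path and the name `check_nagios_daemon.sh` are assumptions made for this example, not something the script itself requires.

```shell
#!/bin/bash
# Hypothetical install location for the check script; adjust to your setup.
CHECK_SCRIPT=/usr/local/nagios/libexec/check_nagios_daemon.sh

# Minute 0 of every hour, so consecutive runs cover the 60-minute look-back window.
# The script's stdout/stderr are discarded here because it writes its own log and mail.
CRON_ENTRY="0 * * * * $CHECK_SCRIPT >/dev/null 2>&1"

# Print the entry; paste it into the crontab opened by `crontab -e`.
echo "$CRON_ENTRY"
```

The entry should go into the crontab of a user that is allowed to restart Nagios, which on most hosts means root.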

I have written a script in Bash which has been working successfully. Feel free to explore other options around the script; suggestions are always welcome.

Nagios Daemon Check Script

#!/bin/bash

###################################### VARIABLES ######################################
CHECK_LOG=/usr/local/nagios/var/nagios_service_check_log
MAIL_TO=admin@howtovmlinux.com

# Convert the leading UNIX timestamp of each log line to a readable date, keep the
# "Caught" lines, and reformat them as "Mon dd hh:mm:ss message". The day is
# space-padded (%2d) so that it lines up with the "%_d" format passed to date below.
# tee keeps a copy in tmp_log; with a plain redirect NAGIOS_LOG would stay empty.
NAGIOS_LOG=$(perl -pe 's/(\d+)/localtime($1)/e' /usr/local/nagios/var/log/nagios.log | grep Caught | awk '{printf "%s %2d %s %s %s %s %s%s\n", $2, $3, $4, $6, $7, $8, $9, $10}' | tee /usr/local/nagios/var/log/tmp_log)

# Count the errors logged in the last 60 minutes. This is a string comparison, so it
# can miss entries when the window spans a month boundary.
NAGIOS_LOG_COUNT=$(awk -v d1="$(date --date="-60 min" "+%b %_d %H:%M")" -v d2="$(date "+%b %_d %H:%M")" '$0 > d1 && $0 < d2 || $0 ~ d2' /usr/local/nagios/var/log/tmp_log | wc -l)
####################################### DEC END #######################################

if [ "$NAGIOS_LOG_COUNT" -eq 0 ]; then
    echo "Nagios is running OK"
else
    {
        echo "Nagios Service Outage"
        echo "====================="
        echo "$NAGIOS_LOG"
        echo "## Restarting Nagios Service ##"
        /etc/init.d/nagios restart
    } >> "$CHECK_LOG"
    sleep 2

    # Query the status only after the restart, so the result reflects the new state.
    # "not running" is filtered out in case the init script prints that message.
    SERVICE_NAG=$(/etc/init.d/nagios status | grep running | grep -v "not running")

    if [ -n "$SERVICE_NAG" ]; then
        echo "OK - $SERVICE_NAG" >> "$CHECK_LOG"
        mail -s "NOTIFICATION - Nagios Service Outage" "$MAIL_TO" < "$CHECK_LOG" && rm -f "$CHECK_LOG"
    else
        echo "CRITICAL - Nagios Service restart failed" >> "$CHECK_LOG"
        mail -s "CRITICAL - Nagios Service Outage - Escalation Needed" "$MAIL_TO" < "$CHECK_LOG" && rm -f "$CHECK_LOG"
    fi
fi

How does the above script work

  • It starts by converting the UNIX timestamp on each line of the Nagios log to a human-readable date, keeping only the lines that contain "Caught", and reformatting them to start with "Month dd hh:mm:ss" so that they can be filtered for a specific time period (the NAGIOS_LOG line).
  • It then checks the reformatted log for errors from the last hour. If there are none, it echoes "Nagios is running OK". If the error occurred one or more times, it restarts the Nagios daemon and sends a notification saying that the error occurred and whether the restart succeeded. You can send the email to your support department or to whoever is responsible for Nagios monitoring.
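The time-window check described above can also be done on the raw UNIX timestamps, which avoids comparing formatted date strings. The following is a minimal sketch of that alternative, not the method the script uses. It assumes each log line starts with the timestamp in square brackets, as in "[1468882080] Caught SIGSEGV, shutting down...", and the function name `count_recent_caught` is made up for this example.

```shell
#!/bin/bash
# Count "Caught" lines whose epoch timestamp is at or after a cutoff.
# Usage: count_recent_caught LOGFILE CUTOFF_EPOCH
count_recent_caught() {
    # Split each line on "[" and "]" so that awk's $2 is the epoch timestamp.
    # The shell's "$2" (the cutoff) is passed in as the awk variable "since".
    awk -v since="$2" -F'[][]' \
        '/Caught/ && $2 >= since { n++ } END { print n + 0 }' "$1"
}
```

With GNU date, as the script already uses, the count for the last 60 minutes would be `count_recent_caught /usr/local/nagios/var/log/nagios.log "$(date --date='-60 min' +%s)"`. Because the comparison is numeric, it keeps working across day and month boundaries.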

I hope this script has helped you. Please share it; feedback on how to improve it is always welcome.

Written By Farooq Mohammed Ahmed on Monday, 18 July 2016 22:48