Dead simple CentOS server monitoring with Monit and Pushover

150532

My company Nucleics has an array of servers distributed around the world to support our PeakTrace Basecaller. For historical reasons these servers are a mix of CentOS 6/7 VPS and physical servers supplied by three different companies. While the Auto PeakTrace RP application is designed to be robust in the face of server downtime, I wanted a dead simple monitoring service that would fix 99% of the server problem automatically and only contact me if there was something really wrong. After looking around all the paid services I settled on using a combination of Monit and Pushover.

Monit is an open source watchdog utility that can monitor other Linux services and automatically restart them if they crash or stop working. The great thing about monit is that you can set it up to fix things on its own. For example, if the server can be fixed by simply restarting apache then I want the monitoring service to just do this and only send me a message if something major has happened. I also wanted a service that would ping my phone, but where I could easily control it (i.e turn on/off, set away times, etc).

Pushover looked ideal for doing this. For a one off cost of $5 you can use the Pushover API to send up to 7500 message a month to any phone. It has lots of other nice features like quiet times and group notification. It comes with a 7 day free trial so you have time to make sure everything is going to work with your system before paying.

The only issue with integrating monit and pushover is that by default monit is set to email alert notices. Most of our servers don’t have the ability to email (they are slimmed down and are only running the services needs to support PeakTrace). Luckly, monit can also execute scripts so I settled on the alternative approach of calling the Pushover API via an alert script that would pass through exactly what server and service was having problems. This alert script is set to only be called if monit cannot fix the problem by restarting the service. After a bit of experimentation I got the whole system running rather nicely.

Here is the step-by-step guide. I did all this logged in as root, but if you don’t like to live on the edge just put sudo in front of every command.

Setting up Pushover

After registering an account at Pushover, and downloading the appropriate app for your phone (iOS or android), you need to set up a new pushover application on the Pushover website.

Click on Register an Application/Create an API Token. This will open the Create New Application/Plugin page.

  • Give the application a name (I called it Monit), but you call it anything you like.
  • Choose “script” as the type.
  • Add a description (I called it Monit Server Monitoring).
  • Leave the url field blank.
  • If you want you can add an icon, but you don’t need to do this. It is nice though having an icon when you get a message.
  • Press the Create Application button.

You need to record the new application API Token/Key as well as your Pushover User Key (you can find this on the main pushover page if you are logged in). You will need both these keys to have monit be able to ping Pushover via the alert script.

Install Monit

Install the EPEL package repository.

# yum install -y epel-release

Install monit and curl.

# yum install -y monit curl

Set monit to start on boot and start monit.

# chkconfig monit on && service monit start

You can edit the monif.conf file in /etc but the default values are fine. Take a look at the monit man page for more details about what you might want to change.

Create the Pushover Alert Script

You need to create the script that monit will call when it raises an alert.

# nano /usr/local/bin/pushover.sh

Paste the following text substituting your own API Token and User Keys before saving.

#!/bin/bash
 /usr/bin/curl -s --form-string "token=API Token" \
 --form-string "user=User Key" \
 --form-string "message=[$MONIT_HOST] $MONIT_SERVICE - $MONIT_DESCRIPTION" \
 https://api.pushover.net/1/messages.jsonop

Make the script executable.

# chmod 700 /usr/local/bin/pushover.sh

Test that the script works. If there are no issues the script will return without error and you will get an short message in the Pushover phone app almost immediately.

# /usr/local/bin/pushover.sh

Configure Monit

Once you have the pushover.sh alert script set up you need to create all the service-specific monit  .conf files. You can mix and match these to suit the services you are running on your server. The aim is to have monit restart the service if there are any issues and only if this does not solve the problem, call the pullover.sh alert script. This way most servers will fix themselves and you only get contacted if something catastrophic has happened.

system

# nano /etc/monit.d/system.conf

check system $HOST
if loadavg (5min) > 4 then exec "/usr/local/bin/pushover.sh"
if loadavg (15min) > 2 then exec "/usr/local/bin/pushover.sh"
if memory usage > 80% for 4 cycles then exec "/usr/local/bin/pushover.sh"
if swap usage > 20% for 4 cycles then exec "/usr/local/bin/pushover.sh"
if cpu usage (user) > 90% for 4 cycles then exec "/usr/local/bin/pushover.sh"
if cpu usage (system) > 80% for 4 cycles then exec "/usr/local/bin/pushover.sh"
if cpu usage (wait) > 80% for 4 cycles then exec "/usr/local/bin/pushover.sh"
if cpu usage > 200% for 4 cycles then exec "/usr/local/bin/pushover.sh"

apache

# nano /etc/monit.d/apache.conf

check process httpd with pidfile /var/run/httpd/httpd.pid
start program = "/etc/init.d/httpd start" with timeout 60 seconds
stop program = "/etc/init.d/httpd stop"
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then exec "/usr/local/bin/pushover.sh"
if failed port 80 for 2 cycles then restart
if 3 restarts within 5 cycles then exec "/usr/local/bin/pushover.sh"

sshd

# nano /etc/monit.d/sshd.conf

check process sshd with pidfile /var/run/sshd.pid
start program "/etc/init.d/sshd start"
stop program "/etc/init.d/sshd stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then exec "/usr/local/bin/pushover.sh"

fail2ban

# nano /etc/monit.d/fail2ban.conf

check process fail2ban with pidfile /var/run/fail2ban/fail2ban.pid
start program "/etc/init.d/fail2ban start"
stop program "/etc/init.d/fail2ban stop"
if 5 restarts within 5 cycles then exec "/usr/local/bin/pushover.sh"

syslog

# nano /etc/monit.d/syslog.conf

check process rsyslog with pidfile /var/run/syslogd.pid
start program "/etc/init.d/rsyslog start"
stop program "/etc/init.d/rsyslog stop"
if 5 restarts within 5 cycles then exec "/usr/local/bin/pushover.sh"

crond

# nano /etc/monit.d/crond.conf

check process crond with pidfile /var/run/crond.pid
start program "/etc/init.d/crond start"
stop program "/etc/init.d/crond stop"
if 5 restarts within 5 cycles then exec "/usr/local/bin/pushover.sh"

mysql

# nano /etc/monit.d/mysql.conf

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
start program = "/etc/init.d/mysqld start"
stop program = "/etc/init.d/mysqld stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then exec "/usr/local/bin/pushover.sh"

Check that all the .conf file are correct

# monit -t

If everything is fine then start monitoring by loading the new .conf files.

# monit reload

Check the status of monit by using

# monit status

This should give you something like this depending on which services you are monitoring.

The Monit daemon 5.14 uptime: 3d 20h 17m

System 'rps.peaktraces.com'
 status Running
 monitoring status Monitored
 load average [0.00] [0.12] [0.11]
 cpu 0.2%us 0.1%sy 0.0%wa
 memory usage 106.6 MB [10.7%]
 swap usage 0 B [0.0%]
 data collected Tue, 19 Jul 2016 04:16:06

Process 'rsyslog'
 status Running
 monitoring status Monitored
 pid 1016
 parent pid 1
 uid 0
 effective uid 0
 gid 0
 uptime 4d 23h 33m
 children 0
 memory 3.4 MB
 memory total 3.4 MB
 memory percent 0.3%
 memory percent total 0.3%
 cpu percent 0.0%
 cpu percent total 0.0%
 data collected Tue, 19 Jul 2016 04:16:06

Process 'sshd'
 status Running
 monitoring status Monitored
 pid 1176
 parent pid 1
 uid 0
 effective uid 0
 gid 0
 uptime 4d 23h 33m
 children 4
 memory 1.2 MB
 memory total 20.7 MB
 memory percent 0.1%
 memory percent total 2.0%
 cpu percent 0.0%
 cpu percent total 0.0%
 port response time 0.006s to [localhost]:22 type TCP/IP protocol SSH
 data collected Tue, 19 Jul 2016 04:16:06

Process 'fail2ban'
 status Running
 monitoring status Monitored
 pid 1304
 parent pid 1
 uid 0
 effective uid 0
 gid 0
 uptime 4d 23h 33m
 children 0
 memory 30.2 MB
 memory total 30.2 MB
 memory percent 3.0%
 memory percent total 3.0%
 cpu percent 0.1%
 cpu percent total 0.1%
 data collected Tue, 19 Jul 2016 04:16:06

Process 'crond'
 status Running
 monitoring status Monitored
 pid 1291
 parent pid 1
 uid 0
 effective uid 0
 gid 0
 uptime 4d 23h 33m
 children 0
 memory 1.2 MB
 memory total 1.2 MB
 memory percent 0.1%
 memory percent total 0.1%
 cpu percent 0.0%
 cpu percent total 0.0%
 data collected Tue, 19 Jul 2016 04:16:06

Process 'httpd'
 status Running
 monitoring status Monitored
 pid 20963
 parent pid 1
 uid 0
 effective uid 0
 gid 0
 uptime 4h 5m
 children 2
 memory 7.7 MB
 memory total 19.0 MB
 memory percent 0.7%
 memory percent total 1.9%
 cpu percent 0.0%
 cpu percent total 0.0%
 data collected Tue, 19 Jul 2016 04:16:06

Suggestions

You may want to adjust the system.conf values if your server is under sustained high loads so as to scale back on the pushover triggers. Since you will know exactly what is the trigger this is quite easy to do.

To create a monit .conf file for a new services you just need to make sure that you use the correct .pid file path for the service and that the start and stop paths are correct. These can be a little non-obvious (look at syslog.conf for example). If you do make a mistake monit -t and monit status will show you what is wrong.

Once you have all this in place then sit back, relax and let the servers take care of themselves (well we can all dream).

Edit July 2017. I have been using this system for over a year now and it has been working great. I have had no problem that monit has not fixed by itself by just restarting the service. About the only issue I have had is load spikes on the server caused by a runaway service not monitored.

I have recently used the same approach to monitor for unauthorised logins which I wrote up Dead simple ssh login monitoring with Monit and Pushover.

3 comments on “Dead simple CentOS server monitoring with Monit and Pushover
  1. Really great article! I will implement this asap!
    I like that it doesn’t use server mail service.

    I lost 2 days trying to push notifications to my phone via mailx and pushover.
    I am wondering, pushing notification from fail2ban when it gives a ban in firewalld is possible with your solution abobe? I would like to use pushover everywhere instead of using mailx/mta smtp. Thank you!

  2. Sorin I am sure you could set up fail2ban to push a notification, but if your firewall is anything like mine you will be contacted every 5 seconds as some script kiddy tries to break in. I am not sure this would be such a good idea.

Leave a Reply

Your email address will not be published. Required fields are marked *