creating a Nagios check on a PostgreSQL table

When I started this, I thought it was be straight forward and take only a few hours at most. It turned into multiple hours over two days.

This is part two of a two-part post. The first part dealt with changing a database column from time to timestamp.

In this post:

PostgreSQL 12.2
Nagios 3.5
FreeBSD 12.1

The queries

The two queries I want to run are:

SELECT count(*) FROM cache_clearing_dates WHERE date_added > now() - interval '1 hour';
SELECT count(*) FROM cache_clearing_ports WHERE date_added > now() - interval '1 hour';

The script

Combining those two into a script, I came up with this, after a bit of trial and error:

#!/bin/sh

# This script checks the cache_clearing_ports and cache_clearing_dates in
# a freshports database

# arguments
# * server   - IP address or hostname of the PostgreSQL database server
# * database - database name
# * user     - database user
#
# port is always the default value: 5432
#
# The password must be in ~.pgpass for the Unix user which runs this script.
#
# For Nagios, that defaults to ~nagios/.pgpass
#
# See https://www.postgresql.org/docs/12/libpq-pgpass.html
#

#
# arguments used to create simple output of a single value
# and to never issue a password prompt
#
psql="/usr/local/bin/psql --no-align --tuples-only --no-password "

server=$1
database=$2
user=$3

# we return zero, all good, by default
result=0

# the two database queries we will run
query_dates="SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';"
query_ports="SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';"

# how is the dates table?

dates=`$psql -c "$query_ports" --host=$server $database $user`
if [ $? == 0 ] ; then
# uncomment for debugging
#  echo all ok with dates query
else
  # when we hit an error, we do not care about the rest of the checks
  echo error on dates query:$? $dates
  exit 2
fi

# how is the ports table?

ports=`$psql -c "$query_dates" --host=$server $database $user`
if [ $? == 0 ] ; then
# uncomment for debugging
#  echo all ok with ports query
else
  # when we hit an error, we don't care about the rest of the checks
  echo error on ports query: $? $ports
  exit 2
fi

# check the results

if [ "$ports" != "0" ] ; then
  echo There are $ports ports entries in the cache
  result=2
fi

if [ "$dates" != "0" ] ; then
  echo There are $dates dates entries in the cache
  result=2
fi

if [ $result == 0 ] ; then
  echo All OK
fi

exit $result

How is this script invoked?

I run this script via net-mgmt/nrpe3. In general, I run that tool on every host and jail. The line within the configuration file looks like this:

$ grep check_freshports_cache_clearing /usr/local/etc/nrpe.cfg
command[check_freshports_cache_clearing]=/usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios

I store my personal Nagios scripts in a non-default location: /usr/local/libexec/nagios-custom

Usually, Nagios scripts are in /usr/local/libexec/nagios

Why? Mostly so I can keep them all in one place, and there is never any chance of file name collision.

Script parameters

Please note the following about these parameters:

very simple
rather inflexible
they all must be provided
they must appear in this order

The parameters are:

pg03.int.example.org : server – IP address or hostname of the PostgreSQL database server
freshports.dev : database – database name
nagios : database user

Do not confuse the nagios database user with the nagios unix user. They are two separate things.

The nagios user above is part of the PostgreSQL database system. I will show you how I created that later.

The nagios unix user is created by the Nagios port. Installing Nagios or nrpe is outside the scope of this post.

Password

I am a big fan of never putting sensitive information on the command line, such as passwords. Even if nobody else is ever on your system, don’t do it.

The solution here is the .pgpass as documented by the PostgreSQL project.

The script is run by nrpe which runs as the nagios user.

$ ps auwwx | grep nrpe
nagios     85203  0.0  0.0 15812  6368  -  IsJ  01:54   0:01.48 /usr/local/sbin/nrpe3 -c /usr/local/etc/nrpe.cfg -d
dan        15654  0.0  0.0 11324  2648  3  S+J  18:10   0:00.00 grep nrpe

It is because the script runs as the nagios user that the .pgpass files resides at ~nagios/.pgpass. It must be chown nagios and chmod 0600. My file looks like this, with contents slightly modified for public display:

$ sudo cat ~nagios/.pgpass
# hostname:port:database:username:password
pg03.int.example.org:5432:freshports.dev:nagios:notmypassword

The file looks like this:

$ ls -l  ~nagios/.pgpass
-rw-------  1 nagios  nagios  111 Mar 28 19:30 /var/spool/nagios/.pgpass

Why /var/spool/nagios?

Because of this:

$ grep nagios /etc/passwd 
nagios:*:181:181:Nagios pseudo-user:/var/spool/nagios:/usr/sbin/nologin

That is why. That is the home directory for the nagios user as defined in the UIDs file of the FreeBSD ports tree.

Creating the database user & permissions

These are the commands I used to create the nagios database user and the permissions required to query the tables.

First, I create a new group, or role. The group has SELECT rights on the table.

CREATE ROLE reporting;
GRANT SELECT ON cache_clearing_dates TO GROUP reporting;
GRANT SELECT ON cache_clearing_ports TO GROUP reporting;

Here, I create the nagios database user and add them to the role I just created.

CREATE ROLE nagios WITH LOGIN PASSWORD 'notmypassword';
GRANT reporting TO nagios;

Of note:

WITH LOGIN – allows this user to login. A role having the LOGIN attribute can be thought of as a user.
PASSWORD ‘notmypassword’ – specifies the password for this user. It also appears in the .pgpassfile for the user which will be running this script.

Testing the script

When testing the script from the command line, I did it this way:

[dan@dev-ingress01:~] $ /usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios
All OK

I ran this script as my regular user and I also created a ~/.pgpass file for myself, which looks just like the one for the nagios unix user.

Things to check

If you can’t get this to work, the things to check are:

pg_hba.conf file for the PostgreSQL database server
permissions on the database tables
permissions on the .pgpass files in question
make sure it runs fine from the command line before trying nrpe

Monitoring is key

Monitoring is key to early detection of problems. When something should always be in a given state, it should be monitored to ensure it is in that state. Having a monitoring system which will provide early detection of anomalies helps me fix problems early, rather than later.

I try to monitor everything. Connections to expected hosts. Services running. Plenty of disk space.

If you’re not monitoring yet, start slowly and build it up. That time will be well spent.

Hope this helps.