When I started this, I thought it was be straight forward and take only a few hours at most. It turned into multiple hours over two days.
This is part two of a two-part post. The first part dealt with changing a database column from time to timestamp.
In this post:
- PostgreSQL 12.2
- Nagios 3.5
- FreeBSD 12.1
The queries
The two queries I want to run are:
SELECT count(*) FROM cache_clearing_dates WHERE date_added > now() - interval '1 hour'; SELECT count(*) FROM cache_clearing_ports WHERE date_added > now() - interval '1 hour';
The script
Combining those two into a script, I came up with this, after a bit of trial and error:
#!/bin/sh # This script checks the cache_clearing_ports and cache_clearing_dates in # a freshports database # arguments # * server - IP address or hostname of the PostgreSQL database server # * database - database name # * user - database user # # port is always the default value: 5432 # # The password must be in ~.pgpass for the Unix user which runs this script. # # For Nagios, that defaults to ~nagios/.pgpass # # See https://www.postgresql.org/docs/12/libpq-pgpass.html # # # arguments used to create simple output of a single value # and to never issue a password prompt # psql="/usr/local/bin/psql --no-align --tuples-only --no-password " server=$1 database=$2 user=$3 # we return zero, all good, by default result=0 # the two database queries we will run query_dates="SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';" query_ports="SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';" # how is the dates table? dates=`$psql -c "$query_ports" --host=$server $database $user` if [ $? == 0 ] ; then # uncomment for debugging # echo all ok with dates query else # when we hit an error, we do not care about the rest of the checks echo error on dates query:$? $dates exit 2 fi # how is the ports table? ports=`$psql -c "$query_dates" --host=$server $database $user` if [ $? == 0 ] ; then # uncomment for debugging # echo all ok with ports query else # when we hit an error, we don't care about the rest of the checks echo error on ports query: $? $ports exit 2 fi # check the results if [ "$ports" != "0" ] ; then echo There are $ports ports entries in the cache result=2 fi if [ "$dates" != "0" ] ; then echo There are $dates dates entries in the cache result=2 fi if [ $result == 0 ] ; then echo All OK fi exit $result
How is this script invoked?
I run this script via net-mgmt/nrpe3. In general, I run that tool on every host and jail. The line within the configuration file looks like this:
$ grep check_freshports_cache_clearing /usr/local/etc/nrpe.cfg command[check_freshports_cache_clearing]=/usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios
I store my personal Nagios scripts in a non-default location: /usr/local/libexec/nagios-custom
Usually, Nagios scripts are in /usr/local/libexec/nagios
Why? Mostly so I can keep them all in one place, and there is never any chance of file name collision.
Script parameters
Please note the following about these parameters:
- very simple
- rather inflexible
- they all must be provided
- they must appear in this order
The parameters are:
- pg03.int.example.org : server – IP address or hostname of the PostgreSQL database server
- freshports.dev : database – database name
- nagios : database user
Do not confuse the nagios database user with the nagios unix user. They are two separate things.
The nagios user above is part of the PostgreSQL database system. I will show you how I created that later.
The nagios unix user is created by the Nagios port. Installing Nagios or nrpe is outside the scope of this post.
Password
I am a big fan of never putting sensitive information on the command line, such as passwords. Even if nobody else is ever on your system, don’t do it.
The solution here is the .pgpass as documented by the PostgreSQL project.
The script is run by nrpe which runs as the nagios user.
$ ps auwwx | grep nrpe nagios 85203 0.0 0.0 15812 6368 - IsJ 01:54 0:01.48 /usr/local/sbin/nrpe3 -c /usr/local/etc/nrpe.cfg -d dan 15654 0.0 0.0 11324 2648 3 S+J 18:10 0:00.00 grep nrpe
It is because the script runs as the nagios user that the .pgpass files resides at ~nagios/.pgpass. It must be chown nagios and chmod 0600. My file looks like this, with contents slightly modified for public display:
$ sudo cat ~nagios/.pgpass # hostname:port:database:username:password pg03.int.example.org:5432:freshports.dev:nagios:notmypassword
The file looks like this:
$ ls -l ~nagios/.pgpass -rw------- 1 nagios nagios 111 Mar 28 19:30 /var/spool/nagios/.pgpass
Why /var/spool/nagios?
Because of this:
$ grep nagios /etc/passwd nagios:*:181:181:Nagios pseudo-user:/var/spool/nagios:/usr/sbin/nologin
That is why. That is the home directory for the nagios user as defined in the UIDs file of the FreeBSD ports tree.
Creating the database user & permissions
These are the commands I used to create the nagios database user and the permissions required to query the tables.
First, I create a new group, or role. The group has SELECT rights on the table.
CREATE ROLE reporting; GRANT SELECT ON cache_clearing_dates TO GROUP reporting; GRANT SELECT ON cache_clearing_ports TO GROUP reporting;
Here, I create the nagios database user and add them to the role I just created.
CREATE ROLE nagios WITH LOGIN PASSWORD 'notmypassword'; GRANT reporting TO nagios;
Of note:
- WITH LOGIN – allows this user to login. A role having the LOGIN attribute can be thought of as a user.
- PASSWORD ‘notmypassword’ – specifies the password for this user. It also appears in the .pgpassfile for the user which will be running this script.
Testing the script
When testing the script from the command line, I did it this way:
[dan@dev-ingress01:~] $ /usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios All OK
I ran this script as my regular user and I also created a ~/.pgpass file for myself, which looks just like the one for the nagios unix user.
Things to check
If you can’t get this to work, the things to check are:
- pg_hba.conf file for the PostgreSQL database server
- permissions on the database tables
- permissions on the .pgpass files in question
- make sure it runs fine from the command line before trying nrpe
Monitoring is key
Monitoring is key to early detection of problems. When something should always be in a given state, it should be monitored to ensure it is in that state. Having a monitoring system which will provide early detection of anomalies helps me fix problems early, rather than later.
I try to monitor everything. Connections to expected hosts. Services running. Plenty of disk space.
If you’re not monitoring yet, start slowly and build it up. That time will be well spent.
Hope this helps.