Mar 29 2020
 

When I started this, I thought it would be straightforward and take only a few hours at most. It turned into multiple hours over two days.

This is part two of a two-part post. The first part dealt with changing a database column from time to timestamp.

In this post:

  1. PostgreSQL 12.2
  2. Nagios 3.5
  3. FreeBSD 12.1

The queries

The two queries I want to run are:

SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';
SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';

The script

Combining those two into a script, I came up with this, after a bit of trial and error:

#!/bin/sh

# This script checks the cache_clearing_ports and cache_clearing_dates in
# a freshports database

# arguments
# * server   - IP address or hostname of the PostgreSQL database server
# * database - database name
# * user     - database user
#
# port is always the default value: 5432
#
# The password must be in ~/.pgpass for the Unix user which runs this script.
#
# For Nagios, that defaults to ~nagios/.pgpass
#
# See https://www.postgresql.org/docs/12/libpq-pgpass.html
#

#
# arguments used to create simple output of a single value
# and to never issue a password prompt
#
psql="/usr/local/bin/psql --no-align --tuples-only --no-password "

server=$1
database=$2
user=$3

# we return zero, all good, by default
result=0

# the two database queries we will run
query_dates="SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';"
query_ports="SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';"

# how is the dates table?

dates=`$psql -c "$query_dates" --host=$server $database $user`
status=$?
if [ $status -eq 0 ] ; then
  : # no-op; sh requires a command in this branch
  # uncomment for debugging
  #  echo all ok with dates query
else
  # when we hit an error, we do not care about the rest of the checks
  echo error on dates query: $status $dates
  exit 2
fi

# how is the ports table?

ports=`$psql -c "$query_ports" --host=$server $database $user`
status=$?
if [ $status -eq 0 ] ; then
  : # no-op; sh requires a command in this branch
  # uncomment for debugging
  #  echo all ok with ports query
else
  # when we hit an error, we don't care about the rest of the checks
  echo error on ports query: $status $ports
  exit 2
fi

# check the results

if [ "$ports" != "0" ] ; then
  echo There are $ports ports entries in the cache
  result=2
fi

if [ "$dates" != "0" ] ; then
  echo There are $dates dates entries in the cache
  result=2
fi

if [ $result -eq 0 ] ; then
  echo All OK
fi

exit $result

How is this script invoked?

I run this script via net-mgmt/nrpe3. In general, I run that tool on every host and jail. The line within the configuration file looks like this:

$ grep check_freshports_cache_clearing /usr/local/etc/nrpe.cfg
command[check_freshports_cache_clearing]=/usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios

I store my personal Nagios scripts in a non-default location: /usr/local/libexec/nagios-custom

Usually, Nagios scripts are in /usr/local/libexec/nagios

Why? Mostly so I can keep them all in one place, and there is never any chance of file name collision.

Script parameters

Please note the following about these parameters:

  • very simple
  • rather inflexible
  • they all must be provided
  • they must appear in this order

The parameters are:

  1. pg03.int.example.org : server – IP address or hostname of the PostgreSQL database server
  2. freshports.dev : database – database name
  3. nagios : database user

Do not confuse the nagios database user with the nagios unix user. They are two separate things.

The nagios user above is part of the PostgreSQL database system. I will show you how I created that later.

The nagios unix user is created by the Nagios port. Installing Nagios or nrpe is outside the scope of this post.

Password

I am a big fan of never putting sensitive information on the command line, such as passwords. Even if nobody else is ever on your system, don’t do it.

The solution here is the .pgpass file, as documented by the PostgreSQL project.

The script is run by nrpe which runs as the nagios user.

$ ps auwwx | grep nrpe
nagios     85203  0.0  0.0 15812  6368  -  IsJ  01:54   0:01.48 /usr/local/sbin/nrpe3 -c /usr/local/etc/nrpe.cfg -d
dan        15654  0.0  0.0 11324  2648  3  S+J  18:10   0:00.00 grep nrpe

It is because the script runs as the nagios user that the .pgpass file resides at ~nagios/.pgpass. It must be owned by nagios and have mode 0600. My file looks like this, with contents slightly modified for public display:

$ sudo cat ~nagios/.pgpass
# hostname:port:database:username:password
pg03.int.example.org:5432:freshports.dev:nagios:notmypassword

The permissions look like this:

$ ls -l  ~nagios/.pgpass
-rw-------  1 nagios  nagios  111 Mar 28 19:30 /var/spool/nagios/.pgpass
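
If you are creating that file from scratch, something like the following produces the ownership and permissions shown above (a sketch; adjust the path if your nagios home directory differs):

$ sudo touch /var/spool/nagios/.pgpass
$ sudo chown nagios:nagios /var/spool/nagios/.pgpass
$ sudo chmod 0600 /var/spool/nagios/.pgpass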

Why /var/spool/nagios?

Because of this:

$ grep nagios /etc/passwd 
nagios:*:181:181:Nagios pseudo-user:/var/spool/nagios:/usr/sbin/nologin

That is why. That is the home directory for the nagios user as defined in the UIDs file of the FreeBSD ports tree.

Creating the database user & permissions

These are the commands I used to create the nagios database user and the permissions required to query the tables.

First, I create a new group, or role. The group has SELECT rights on the two tables.

CREATE ROLE reporting;
GRANT SELECT ON cache_clearing_dates TO GROUP reporting;
GRANT SELECT ON cache_clearing_ports TO GROUP reporting;

Here, I create the nagios database user and add them to the role I just created.

CREATE ROLE nagios WITH LOGIN PASSWORD 'notmypassword';
GRANT reporting TO nagios;

Of note:

  • WITH LOGIN – allows this user to log in. A role having the LOGIN attribute can be thought of as a user.
  • PASSWORD ‘notmypassword’ – specifies the password for this user. It also appears in the .pgpass file for the user which will be running this script.
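
One way to confirm the grants before involving Nagios is to switch to the role within a psql session (this assumes you are connected as a superuser or as a member of the nagios role):

SET ROLE nagios;
SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';
SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';
RESET ROLE;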

Testing the script

When testing the script from the command line, I did it this way:

[dan@dev-ingress01:~] $ /usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios
All OK

I ran this script as my regular user and I also created a ~/.pgpass file for myself, which looks just like the one for the nagios unix user.

Things to check

If you can’t get this to work, the things to check are:

  • pg_hba.conf file for the PostgreSQL database server
  • permissions on the database tables
  • permissions on the .pgpass files in question
  • make sure it runs fine from the command line before trying nrpe (see the example after this list)
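
For that last item, a quick manual test as the nagios database user looks something like this (substitute your own host and database; the password still comes from your ~/.pgpass):

$ /usr/local/bin/psql --no-password --host=pg03.int.example.org --username=nagios -c 'SELECT 1' freshports.dev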

Monitoring is key

Monitoring is key to early detection of problems. When something should always be in a given state, it should be monitored to ensure it is in that state. Having a monitoring system which provides early detection of anomalies helps me fix problems sooner rather than later.

I try to monitor everything. Connections to expected hosts. Services running. Plenty of disk space.

If you’re not monitoring yet, start slowly and build it up. That time will be well spent.

Hope this helps.

Mar 29 2020
 

The plan for today was not decided until after I got out of bed. I sat down with coffee and looked through the list of tasks. I found one which appealed: monitor the cache_clearing_ports & cache_clearing_dates tables.

This is part one of a two-part story. The first part deals with fixing a database table before doing the planned work in part two, a Nagios check.

In this post:

  1. PostgreSQL 12.2
  2. Nagios 3.5
  3. FreeBSD 12.1

FreshPorts stores a lot of data in the database, but caches pages on disk. Specific events clear that cache, and the two tables listed will contain lists of items to remove from the cache.

Disk? Cache?

FreshPorts was built long before modern cache tools existed. We built our own. They work great for what is needed. The server uses ZFS and anything frequently used from disk is also cached in memory (see ZFS ARC). A page takes longer to load the first time it is built, but afterwards, it is fast enough.

The checking query

Sometimes the cache clearing lags. Sometimes there is a bug in the code. Whatever the reason behind the lag, I want to be notified if there are entries in this table which are more than 60 minutes old.

Here is the query I started with:

freshports.dev=# SELECT count(*) FROM cache_clearing_dates WHERE date_added > now() - interval '1 hour';
ERROR:  operator does not exist: time without time zone > timestamp with time zone
LINE 1: ...unt(*) FROM cache_clearing_dates WHERE date_added > now() - ...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

What? Why doesn’t that work? Look at the caret above. The comparison operator is the issue. I can’t compare those two values.

Which two values?

A time without time zone and a timestamp with time zone.

Where do I have a time field? Let’s check the table.

freshports.dev=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.dev=# 

Oh. The date_added column should be a timestamp without time zone. This table was created incorrectly. I checked test, stage, and prod. All the same.

The cache_clearing_ports table is correct:

freshports.org=# \d public.cache_clearing_ports
                                        Table "public.cache_clearing_ports"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

Fixing cache_clearing_dates

This is the table structure I started with:

freshports.dev=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.dev=#

Ummm, shouldn’t date_added be timestamp without time zone?

I’m sure that field does not have a date value in it. It should.

The tests

I wanted verification this was wrong. I was sure, but I wanted to be positively sure.

The better way to do this, and I thought of this after I started this blog post, is:

freshports.dev=# SELECT now()::time without time zone;
       now       
-----------------
 15:07:35.428659
(1 row)

freshports.dev=# 

To compose this test, take the Default value from line 7 and cast it using the Type on the same line.

What I really want is this:

freshports.dev=# SELECT now()::timestamp without time zone;
            now             
----------------------------
 2020-03-28 15:13:55.946128
(1 row)

freshports.dev=# 

Yes, that’s better. I want to know how long this entry has been in the table. A date is required for that.

Don’t try this at home

What did I try first? I don’t recommend this, but it worked for me.

freshports.dev=# INSERT INTO public.cache_clearing_dates (date_to_clear) VALUES ('2020-01-01');
INSERT 0 1
freshports.dev=# select * from cache_clearing_dates;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 162763 | 2020-01-01    | 14:51:33.617565
(1 row)

freshports.dev=# rollback;

Yeah, that column has to change.

The change

PLEASE NOTE that in my specific case, this table has no data. There is no existing data to convert. I will show you one way to do that, but if you have data to convert, you may need to find answers elsewhere.

Here is a first attempt. Notice the transaction. If you have a lot of data in there, please be aware of how this might be blocking your applications. Mind you, if you have such a table, and applications, I would think you would already be aware of those issues.

First failure

This should be easy. ALTER TABLE…

freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone;
ERROR:  column "date_added" cannot be cast automatically to type timestamp without time zone
HINT:  You might need to specify "USING date_added::timestamp without time zone".
freshports.dev=# rollback;
ROLLBACK

That’s new to me. Let’s read up on that a bit. Oh yes, this is HOW PostgreSQL will convert your existing data.

Overwriting the column

THE FOLLOWING WILL LOSE DATA. What I am doing in the following command will set all rows to now() and existing values will be lost.

freshports.dev=# begin;
BEGIN
freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now();
ALTER TABLE
freshports.dev=# \d cache_clearing_dates
                                          Table "public.cache_clearing_dates"
    Column     |            Type             | Collation | Nullable |                     Default                      
---------------+-----------------------------+-----------+----------+--------------------------------------------------
 id            | integer                     |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                        |           |          | 
 date_added    | timestamp without time zone |           |          | now()

freshports.dev=# rollback;
ROLLBACK
freshports.dev=# 

Prepending a fixed date value

This attempt will prepend the current date to all values in the table.

freshports.dev=# begin;
BEGIN
freshports.dev=# -- put some test data into my table
freshports.dev=# INSERT INTO public.cache_clearing_dates (date_to_clear) VALUES ('2020-01-01');
INSERT 0 1
freshports.dev=# -- test out the prepend code
freshports.dev=# select now()::date + date_added from public.cache_clearing_dates;
          ?column?          
----------------------------
 2020-03-28 15:27:22.119605
(1 row)

freshports.dev=# -- alter the table
freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now()::date + date_added;
ALTER TABLE
freshports.dev=# SELECT * FROM public.cache_clearing_dates;
   id   | date_to_clear |         date_added         
--------+---------------+----------------------------
 162765 | 2020-01-01    | 2020-03-28 15:27:22.119605
(1 row)

freshports.dev=# -- undo my changes
freshports.dev=# rollback;
ROLLBACK

The change I used

This is the command I used to update my table:

freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now()::date + date_added;
ALTER TABLE
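
If you have applications using the table, a variation like this keeps the wait for the table lock bounded (a sketch only; the timeout value is arbitrary):

BEGIN;
-- give up after 5 seconds if the ALTER cannot get its lock
SET LOCAL lock_timeout = '5s';
ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now()::date + date_added;
COMMIT;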

This is the test I used to verify I was getting what I wanted:

freshports.dev=# begin; insert into public.cache_clearing_dates (date_to_clear) values ('2020-01-01');
BEGIN
INSERT 0 1
freshports.dev=# SELECT * FROM public.cache_clearing_dates;
   id   | date_to_clear |         date_added         
--------+---------------+----------------------------
 162766 | 2020-01-01    | 2020-03-28 15:30:22.439249
(1 row)

freshports.dev=# rollback;
ROLLBACK
freshports.dev=# 

The queries for the Nagios check

The queries are straightforward. There should be zero entries more than 60 minutes old:

freshports.dev=# SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';
 count 
-------
     0
(1 row)

freshports.dev=# SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';
 count 
-------
     0
(1 row)

freshports.dev=# 

I combined those two into a script, which appears in part two of this post.

Jan 26 2020
 

FreshPorts had a cache clearing issue recently. It lingered. For a long time. It took me a while to figure it out. It turned out to be a Python 3 compatibility issue.

In this post:

  • PostgreSQL 12.1
  • FreeBSD 12.1

I will outline how database changes invoke external scripts asynchronously which then clear the cache. I will go into the debugging of the issue and how the bug was fixed.

How caching works

FreshPorts caches data on disk. When you go to visit a page, the website checks the cache on disk, and if not found, it runs the queries, creates the HTML then saves it to disk.

Really? You cache to disk? Why?

Q: Why don’t you use memcached or some other caching tool?

A: FreshPorts was created long before such fancy tools existed.

The database side of caching

FreshPorts uses PostgreSQL, an amazing database which just happens to be open source and freely available. The initial work was done on a lesser database system, but then I was introduced to PostgreSQL in July 2000, which had grown-up features I expected, such as stored procedures and relational integrity. I never looked back.

Cache clearing in FreshPorts relies upon asynchronous notification using the LISTEN and NOTIFY commands.

A small Python script (about 220 lines) issues several LISTEN commands, then starts polling.

This is the connection:

import configparser
import psycopg2

config = configparser.ConfigParser()
config.read('/usr/local/etc/freshports/fp-listen.ini')

DSN = 'host=' + config['database']['HOST'] + ' dbname=' + config['database']['DBNAME'] + ' user=' + config['database']['DBUSER'] + ' password=' + config['database']['PASSWORD']

conn = psycopg2.connect(DSN)
curs = conn.cursor()

Next, we read from the database a list of LISTEN channels and then issue a LISTEN command for each channel:

curs.execute("SELECT name, script_name FROM listen_for ORDER BY id")
listens_for = curs.fetchall()

listens = dict()
print ("These are the (event name, script name) pairs we are ready for:")
for listen in listens_for:
  curs.execute("LISTEN %s" % listen[0])
  listens[listen[0]] = listen[1]
  print ("('%s', '%s')" % (listen[0], listen[1]))

In this case, name is a channel and script_name is what the Python script will run when it encounters a NOTIFY for that channel.

This is the main loop of the code:

while 1:
  if select.select([conn],[],[],5)==([],[],[]):
    syslog.syslog(syslog.LOG_NOTICE, 'timeout! *************')
  else:
    conn.poll()
    syslog.syslog(syslog.LOG_NOTICE, 'Just woke up! *************')
    while conn.notifies:
      notify = conn.notifies.pop(0);
      # in real life, do something with each...
      syslog.syslog(syslog.LOG_NOTICE, "Got NOTIFY: %d, %s, %s" % (notify.pid, notify.channel, notify.payload));
      if notify.channel in listens:
        syslog.syslog(syslog.LOG_NOTICE, "found key %s" % (notify.channel));
        clear_cache = True;
        if listens[notify.channel]   == 'listen_port':
          syslog.syslog(syslog.LOG_NOTICE, "invoking RemoveCacheEntry()");
          RemoveCacheEntry()
        elif listens[notify.channel] == 'listen_date_updated':
          syslog.syslog(syslog.LOG_NOTICE, "invoking ClearDateCacheEntries()");
          ClearDateCacheEntries()
        ...
        ...
        else:
          clear_cache = False;
          syslog.syslog(syslog.LOG_ERR, "Code does not know what to do when '%s' is found." % notify.channel)

        if clear_cache:
          ClearMiscCaches()

      else:
        syslog.syslog(syslog.LOG_NOTICE, 'no such key in listens array for %s!' % (notify.channel))

This isn’t great code; patches & suggestions are welcome.

I remember this worked differently in earlier versions of PostgreSQL, but this approach has been in use for several years now. The poll command times out occasionally, and when it does, we just loop again.

When we receive a NOTIFY, we verify it is in the list of what we LISTEN for, then go through the list and invoke the appropriate script.

The database side

Triggers will invoke a NOTIFY when and where relevant. An update on the commit_log table will add an entry to the cache_clearing_dates table and invoke NOTIFY date_updated.
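
The actual FreshPorts triggers are not reproduced here, but the idea is roughly this (a sketch; the commit_date column on commit_log is made up for illustration):

CREATE OR REPLACE FUNCTION commit_log_date_updated() RETURNS TRIGGER AS $$
   BEGIN
      -- queue the affected date for cache clearing
      INSERT INTO cache_clearing_dates (date_to_clear)
         VALUES (NEW.commit_date::date);
      -- wake up any listeners, such as fp-listen.py
      NOTIFY date_updated;
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER commit_log_date_updated
    AFTER UPDATE ON commit_log
    FOR EACH ROW
    EXECUTE PROCEDURE commit_log_date_updated();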

How cache clearing works

With the database issuing the NOTIFY, the script receives the NOTIFY date_updated. It will then query the cache_clearing_dates table to see what dates need to be cleared. On disk, the cached pages look like this:

[dan@dev-nginx01:/var/service/fp-listen/fp-listen] $ sudo ls -l /var/db/freshports/cache/daily/2020/01
total 340
-rw-rw-r--  1 www  freshports  143080 Jan  9 23:26 08.daily.head
-rw-rw-r--  1 www  freshports  107017 Jan 11 04:57 10.daily.head
-rw-rw-r--  1 www  freshports  152478 Jan 21 14:26 17.daily.head
-rw-rw-r--  1 www  freshports  129303 Jan 25 18:12 23.daily.head

Thus, the code just deletes the given file pertaining to the date in question.

Finding the issue

Debugging the caching was challenging in that the first step was to reproduce the issue.

I started the small Python app on my dev FreshPorts jail:

[root@dev-nginx01:/var/service/fp-listen] # echo ./fp-listen/fp-listen.py | su -fm freshports
These are the (event name, script name) pairs we are ready for:
('port_updated', 'listen_port')
('ports_moved', 'listen_ports_moved')
('ports_updating', 'listen_ports_updating')
('vuxml', 'listen_vuxml')
('category_new', 'listen_category_new')
('date_updated', 'listen_date_updated')

The output you see is what the app has read from the listen_for table. The events are data-driven. See lines 14-18 above.

freshports.dev=# SELECT * FROM listen_for;
 id |      name      |      script_name      
----+----------------+-----------------------
  5 | port_updated   | listen_port
  6 | ports_moved    | listen_ports_moved
  7 | ports_updating | listen_ports_updating
  8 | vuxml          | listen_vuxml
 10 | category_new   | listen_category_new
 11 | date_updated   | listen_date_updated
(6 rows)

freshports.dev=# 

Now that the code is running, in a separate ssh session, I connect to the database and manually issue a NOTIFY command:

freshports.dev=# NOTIFY date_updated;
NOTIFY
freshports.dev=# 

BOOM, the Python application dies in a heap:

Traceback (most recent call last):
  File "./fp-listen/fp-listen.py", line 190, in 
    if listens.has_key(notify.channel):
AttributeError: 'dict' object has no attribute 'has_key'
[root@dev-nginx01:/var/service/fp-listen] #

NOTE:

  1. the code you see above has already been patched
  2. line 190 in the code is actually line 29 as shown above

The fix

The fix was:

-      if listens.has_key(notify.channel):
+      if notify.channel in listens:

This fix is seen on line 8 above and is a known and documented Python 2 -> Python 3 migration issue.

Look at all the backed up queues!

When fp-listen gets a NOTIFY for ports or dates which need clearing, that notification can carry a payload, but the payload must be shorter than 8000 bytes. As mentioned above, we store this data in tables instead of sending it directly to the Python app.
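
We do not send a payload here, but for reference the syntax looks like this (the payload string is only an example):

NOTIFY date_updated, '2020-01-22';
-- or, equivalently, from within a function:
SELECT pg_notify('date_updated', '2020-01-22');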

Looking in the tables:

[dan@x8dtu-pg02:~] $ psql freshports.org
Password for user dan: 
psql (12.1)
Type "help" for help.

freshports.org=# select count(*) from cache_clearing_ports;
 count 
-------
 20508
(1 row)

freshports.org=# select count(*) from cache_clearing_dates;
 count 
-------
   290
(1 row)

freshports.org=# 

So how long has this been an issue?

freshports.org=# select * from cache_clearing_dates order by date_added asc limit 5;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 172300 | 2020-01-22    | 00:08:21.443837
 172301 | 2020-01-22    | 00:12:37.328254
 172302 | 2020-01-22    | 00:30:32.244191
 172303 | 2020-01-22    | 00:41:27.386203
 172304 | 2020-01-22    | 00:43:57.517707
(5 rows)

freshports.org=# select * from cache_clearing_dates order by date_added desc limit 5;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 172299 | 2020-01-21    | 23:58:53.74748
 172298 | 2020-01-21    | 23:49:24.205331
 172188 | 2020-01-20    | 23:20:03.807726
 172297 | 2020-01-21    | 23:16:04.518664
 172187 | 2020-01-20    | 23:06:55.273125
(5 rows)

freshports.org=# 

Oh, it seems cache_clearing_dates.date_added should be a timestamp. I’ll fix that issue later.

Let’s try the ports cache queue:

freshports.org=# select * from cache_clearing_ports order by date_added asc limit 5;
    id    | port_id | category |     port      |         date_added         
----------+---------+----------+---------------+----------------------------
 16669514 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
 16669515 |   16343 | devel    | py-setuptools | 2020-01-20 17:23:04.747768
 16669516 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
 16669517 |   49106 | lang     | python37      | 2020-01-20 17:23:04.747768
 16669513 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
(5 rows)

freshports.org=# select * from cache_clearing_ports order by date_added desc limit 5;
    id    | port_id | category |       port       |         date_added         
----------+---------+----------+------------------+----------------------------
 16691617 |   37057 | print    | R-cran-knitr     | 2020-01-22 15:37:13.407464
 16691618 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691615 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691616 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691619 |   37051 | textproc | R-cran-yaml      | 2020-01-22 15:37:13.407464
(5 rows)

freshports.org=# 

From this, the problem arose on January 20th.

FYI, the tables look like this:

freshports.org=# \d cache_clearing_ports 
                                        Table "public.cache_clearing_ports"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.org=# 

Releasing the code

After I fixed the code, packaged it, and installed it on test, I encountered this error:

Jan 22 17:28:32 test-nginx01 fp-listen[62850]: ClearMiscCaches() is clearing '/var/db/freshports/cache/new/*'

Eh? That’s the wrong directory. It should be news, not new. Here’s the code in question:

def ClearMiscCaches():
  syslog.syslog(syslog.LOG_NOTICE, 'invoked: ClearMiscCaches()');
  filenameglob = config['dirs']['NEWS_CACHE_DIR'];
  syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is clearing %s' % (filenameglob));
  for filename in glob.glob(filenameglob):
    syslog.syslog(syslog.LOG_NOTICE, 'removing %s' % (filename))
    try:
      if os.path.isfile(filename):
        os.remove(filename)
      else:
        shutil.rmtree(filename)

    except (OSError, err):
      if err[0] == 2:
        pass  # no file to delete, so no worries

      else:
        syslog.syslog(syslog.LOG_CRIT, 'ERROR: error deleting cache entry.  Error message is %s' % (err))
        continue

Here is the entry from fp-listen.ini (see line 4):

NEWS_CACHE_DIR  = '%(BASEDIR)s/cache/new/*'

That should be:

NEWS_CACHE_DIR  = '%(BASEDIR)s/cache/news/*'

I had to make this fix on each server other than dev; it was incorrect everywhere else.

I restarted the service, which is controlled by daemontools:

sudo svc -t /var/service/fp-listen

I found the code on line 5 was not working as expected. No syslog entries were being recorded. glob.glob(filenameglob) was not enumerating everything. Tests in a small script worked. That same test script when run by the freshports user also yielded expected results. I could not locate a reason why this would fail in fp-listen.

New ClearMiscCaches() code

After reviewing the documentation for glob, I stumbled across Path and used it. I changed the ClearMiscCaches() code as shown below:

def ClearMiscCaches():
  syslog.syslog(syslog.LOG_NOTICE, 'invoked: ClearMiscCaches()');
  
  news_cache_dir = config['dirs']['NEWS_CACHE_DIR'];
  syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is clearing out entries in %s' % (news_cache_dir));

  for filename in Path(news_cache_dir).iterdir():
    syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is removing %s' % (filename))
    try:
      if Path(filename).is_file():
        Path(filename).unlink()
      else:
        shutil.rmtree(filename)

    except OSError as err:
      # handle deletion errors much as the original code did
      syslog.syslog(syslog.LOG_CRIT, 'ERROR: error deleting cache entry.  Error message is %s' % (err))
      continue

The code on line 5 is the key. We are iterating over a path to get the files we need. dotfiles will be overlooked, which is sufficient for our needs. We do not use dotfiles (e.g. .Caching_Not_Done_Like_This).

It was very strange, because glob.glob(filenameglob) worked in other parts of the fp-listen code, but not in this directory.

New package

With that code change, the new package was created and made its way through test, stage, and beta servers before being moved into production.

The cause of the problem seems to be the migration from Python 2 to Python 3. I would like to do more testing on the glob issue, but I have moved on to other issues.

Why LISTEN/NOTIFY?

LISTEN/NOTIFY is a valuable tool when your database needs to invoke external events. Let the transaction[s] complete and then the external code will eventually do your bidding via the LISTEN portion of the event.

Hopefully, this will open up a whole new avenue of solutions for you.

Jan 27 2019
 

Yesterday I copied data from the old production server to the new production server. One thing I missed, but did think about at the time, was updating the sequence used by the table in question. Looking at the table definition:

freshports.org=# \d report_log
                                          Table "public.report_log"
    Column    |           Type           | Collation | Nullable |                  Default                   
--------------+--------------------------+-----------+----------+--------------------------------------------
 id           | integer                  |           | not null | nextval('report_log_id_seq'::regclass)
 report_id    | integer                  |           | not null | 
 frequency_id | integer                  |           |          | 
 report_date  | timestamp with time zone |           | not null | ('now'::text)::timestamp(6) with time zone
 email_count  | integer                  |           | not null | 
 commit_count | integer                  |           | not null | 
 port_count   | integer                  |           | not null | 
Indexes:
    "report_log_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "$1" FOREIGN KEY (frequency_id) REFERENCES report_frequency(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (report_id) REFERENCES reports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

The report_log_id_seq value will be wrong. When the reports run, they will use values for id which are already present in the table. To confirm, I ran this test:

freshports.org=# BEGIN;
BEGIN
freshports.org=# INSERT INTO report_log (report_id, frequency_id, email_count, commit_count, port_count) VALUES (2, 4, 0, 0, 0);
ERROR:  duplicate key value violates unique constraint "report_log_pkey"
DETAIL:  Key (id)=(19074) already exists.
freshports.org=# ROLLBACK;
ROLLBACK
freshports.org=# SELECT max(id) FROM report_log;
  max  
-------
 20144
(1 row)

freshports.org=# 

Historically, I have done this with setval but today I will try ALTER SEQUENCE.
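
For reference, the setval approach looks something like this; the next nextval() call then returns max(id) + 1:

SELECT setval('report_log_id_seq', (SELECT max(id) FROM report_log));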

freshports.org=# BEGIN; ALTER SEQUENCE report_log_id_seq RESTART WITH 20145;
BEGIN
ALTER SEQUENCE
freshports.org=# INSERT INTO report_log (report_id, frequency_id, email_count, commit_count, port_count) VALUES (2, 4, 0, 0, 0);
INSERT 0 1
freshports.org=# ROLLBACK;
ROLLBACK

That worked, so I rolled it back, and this time I’ll save the changes without inserting data:

freshports.org=# BEGIN; ALTER SEQUENCE report_log_id_seq RESTART WITH 20145;
BEGIN
ALTER SEQUENCE
freshports.org=# COMMIT;
COMMIT
freshports.org=# 

I remembered this issue while sorting out a configuration & code error this morning.

Jan 26 2019
 

Ever since the new server went into production, sometime in 2017, the notification emails, which tell you what has changed, have been coming from the old server. I never did that changeover. The old server was still processing incoming commits, so the notifications were accurate. However, new accounts would not get notifications and changes to existing accounts would be ignored. The notifications were getting increasingly inaccurate for your settings.

This is far from ideal and not good at all.

What I have to do now is copy over the notification statistics.

When did this change occur

I was not sure when the new server went into production. I looked in old blog posts, and on social media, and failed to find out.

I resorted to DNS because every change is recorded in subversion, thanks to nsnotifyd (more information on how I did that via this post).

Finding that date took me a while: because I was searching for my name in the logs, I failed. Every commit for the past 17 months has been done by the dns-notify user. Eventually I found it. I can confidently say the changeover occurred at Sun Nov 5 16:53:23 2017 UTC.

This date will be used later in this post.

What needs to be moved over

The following is on the old server.

I wasn’t sure exactly what needed to be moved, but I knew the answer might be in the report-notification.pl scripts. Looking in there, I found this snippet:

                my $dbh = FreshPorts::Database::GetDBHandle();
                if ($dbh->{Active}) {

                        $sql = "select last_sent 
                                  from report_log_latest
                                 where frequency = '$Frequency'
                                   and report_id = $ReportID";
                        if ($FreshPorts::Config::ReportDebugging) {
                                print "sql is $sql\n";
                        }

                        $sth = $dbh->prepare($sql);
                        $sth->execute ||
                                        die "Could not execute SQL $sql ... maybe invalid";

                        @row=$sth->fetchrow_array;
                        my $last_sent = $row[0];

That’s it, I need entries from the report_log_latest table. What does that table look like?

freshports.org=# \d report_log_latest
                     View "public.report_log_latest"
    Column    |           Type           | Collation | Nullable | Default 
--------------+--------------------------+-----------+----------+---------
 report_id    | integer                  |           |          | 
 frequency_id | integer                  |           |          | 
 frequency    | character(1)             |           |          | 
 last_sent    | timestamp with time zone |           |          | 

freshports.org=# 
 

Oh, it’s a view, not a table. Let’s look closer:

freshports.org=# \d+ report_log_latest
                                  View "public.report_log_latest"
    Column    |           Type           | Collation | Nullable | Default | Storage  | Description 
--------------+--------------------------+-----------+----------+---------+----------+-------------
 report_id    | integer                  |           |          |         | plain    | 
 frequency_id | integer                  |           |          |         | plain    | 
 frequency    | character(1)             |           |          |         | extended | 
 last_sent    | timestamp with time zone |           |          |         | plain    | 
View definition:
 SELECT report_log.report_id,
    report_log.frequency_id,
    report_frequency.frequency,
    max(report_log.report_date) AS last_sent
   FROM report_log,
    report_frequency
  WHERE report_log.frequency_id = report_frequency.id
  GROUP BY report_log.report_id, report_log.frequency_id, report_frequency.frequency;

freshports.org=# 

How long has it been since any reports were sent out:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         2 |            4 | F         | 2018-11-09 19:20:00.456444+00
         1 |            4 | F         | 2018-11-09 19:20:00.434312+00
         2 |            2 | D         | 2018-11-09 18:10:00.431505+00
         1 |            2 | D         | 2018-11-09 18:10:00.411285+00
         5 |            2 | D         | 2018-11-09 18:05:00.407087+00
         1 |            3 | W         | 2018-11-06 19:00:01.348784+00
         2 |            3 | W         | 2018-11-06 19:00:00.497571+00
         1 |            5 | M         | 2018-10-23 19:40:00.677857+00
         2 |            5 | M         | 2018-10-23 19:40:00.426695+00
(9 rows)

freshports.org=# 

OK, that’s about 11 weeks ago.

Let’s look at production

This is from the production server. Yes, the database name is the same. I think the database should be freshports.prod on this server. It helps that each of the databases is on a different server. At home, where they are all on one server, I have:

  1. freshports.dev
  2. freshports.stage
  3. freshports.test

On production, it has been 64 weeks since we took a copy of the old database and moved it to the new server:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         1 |            2 | D         | 2017-11-04 18:10:00.440969+00
         2 |            2 | D         | 2017-11-04 18:10:00.425991+00
         5 |            2 | D         | 2017-11-04 18:05:00.453121+00
         2 |            3 | W         | 2017-10-31 19:00:00.802006+00
         1 |            3 | W         | 2017-10-31 19:00:00.740916+00
         1 |            5 | M         | 2017-10-23 19:40:00.540706+00
         2 |            5 | M         | 2017-10-23 19:40:00.535091+00
         1 |            4 | F         | 2017-10-23 19:20:00.614792+00
         2 |            4 | F         | 2017-10-23 19:20:00.592397+00
(9 rows)

To retain history, I would like to copy over everything missing from the new server.

The date from above, when the DNS changed, was Sun Nov 5 16:53:23.

That corresponds nicely to the production data. This increases confidence in what I am about to do.

What’s missing from prod?

On the old server, what is newer than what’s on prod?

freshports.org=# SELECT COUNT(*) FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00';
 count 
-------
  1072
(1 row)

Let’s check the boundary condition there, and see if I can safely use > without missing any data:

freshports.org=# SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC LIMIT 10;
  id   | report_id | frequency_id |          report_date          | email_count | commit_count | port_count 
-------+-----------+--------------+-------------------------------+-------------+--------------+------------
 19073 |         1 |            2 | 2017-11-04 18:10:00.440969+00 |         123 |          396 |        435
 19074 |         5 |            2 | 2018-01-05 18:06:15.429902+00 |           1 |            2 |         88
 19076 |         1 |            2 | 2018-01-05 18:11:15.513286+00 |         459 |        18287 |      18829
 19075 |         2 |            2 | 2018-01-05 18:11:15.515059+00 |         118 |            0 |      44840
 19077 |         5 |            2 | 2018-01-06 18:06:15.501673+00 |           0 |            0 |          0
 19079 |         1 |            2 | 2018-01-06 18:11:15.68547+00  |         153 |          323 |        354
 19078 |         2 |            2 | 2018-01-06 18:11:15.699523+00 |         118 |            0 |        118
 19080 |         5 |            2 | 2018-01-07 18:05:00.545592+00 |           0 |            0 |          0
 19082 |         1 |            2 | 2018-01-07 18:10:00.820374+00 |         114 |          203 |        246
 19081 |         2 |            2 | 2018-01-07 18:10:00.830676+00 |         118 |            0 |        118
(10 rows)

freshports.org=# 

Yes, I can avoid duplicates easily by changing to > instead of >=.

I am confident I can just dump this query to a file, transfer it to the new server, and load it up without overwriting any data.

Let’s do that.

PostgreSQL Copy

Both the production and the old server are using PostgreSQL 11.1 but that should not matter much when using the COPY command. On the other hand, with pg_dump, you should always use pg_dump from the destination version.

Let’s do this!

freshports.org=# COPY (SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports.copy';
ERROR:  could not open file "/usr/home/dan/missing-reports.copy" for writing: Permission denied
HINT:  COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.
freshports.org=#

Ahh, PostgreSQL itself, the database server process, cannot write to my home directory. I have sudo privileges, so I could get around this easily:

$ touch /usr/home/dan/missing-reports.copy
$ sudo chown postgres /usr/home/dan/missing-reports.copy

After re-running the COPY command, we have:

$ head -2 missing-reports.copy 
19073	1	2	2017-11-04 18:10:00.440969+00	123	396	435
19074	5	2	2018-01-05 18:06:15.429902+00	1	2	88

Not everyone can do this; it only works for me because I am on the PostgreSQL server itself and have sudo access.

Instead, let’s try the \copy command.

freshports.org=# \copy (SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports2.copy';
COPY 1072

What did we get?

$ head -2 missing-reports2.copy
19073	1	2	2017-11-04 18:10:00.440969+00	123	396	435
19074	5	2	2018-01-05 18:06:15.429902+00	1	2	88
$ diff -ruN missing-reports.copy missing-reports2.copy 
$

Yes, that looks right and that diff confirms I have the same data as with the COPY command.

Loading up the \copy data

The transfer of the file from one server to another has been done.

Over on the production server, I have taken a backup.

freshports.org=# begin;
BEGIN
freshports.org=# \copy report_log FROM '~/missing-reports2.copy';
ERROR:  duplicate key value violates unique constraint "report_log_pkey"
DETAIL:  Key (id)=(19073) already exists.
CONTEXT:  COPY report_log, line 1
freshports.org=#

What?

Oh. Look back. I used >= instead of > as I planned to do.

The proper thing to do would be to run the query again, but I am confident that deleting that one line will have the same result. I took a copy of the local file and used vi to delete the first line, which contained the 19073 key.
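
vi works fine here; a non-interactive equivalent on FreeBSD would be something like this:

$ sed -i '' '1d' missing-reports2.copy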

Let’s try that again:

freshports.org=# begin;
BEGIN
freshports.org=# \copy report_log FROM '~/missing-reports2.copy';
COPY 1071
freshports.org=# 

That is the expected number of records, 1071.

Let’s check the latest sent dates:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         2 |            4 | F         | 2018-11-09 19:20:00.456444+00
         1 |            4 | F         | 2018-11-09 19:20:00.434312+00
         2 |            2 | D         | 2018-11-09 18:10:00.431505+00
         1 |            2 | D         | 2018-11-09 18:10:00.411285+00
         5 |            2 | D         | 2018-11-09 18:05:00.407087+00
         1 |            3 | W         | 2018-11-06 19:00:01.348784+00
         2 |            3 | W         | 2018-11-06 19:00:00.497571+00
         1 |            5 | M         | 2018-10-23 19:40:00.677857+00
         2 |            5 | M         | 2018-10-23 19:40:00.426695+00
(9 rows)

Yes, that’s precisely what we had on the old server.

One final check:

freshports.org=# SELECT count(*) FROM report_log;
 count 
-------
 19863
(1 row)

Yes, that matches the old server (I have not shown that verification here).

Time to complete this:

freshports.org=# commit;
COMMIT
freshports.org=# 

Done.

Thank you for flying Air FreshPorts.

Dec 27 2018
 

I originally documented this so I would have a reference while tracking down the problem. I have since solved the issue and you can jump to the bottom of this post to find the solution. What I do not know is why this was not an issue under PostgreSQL 10.6 but was under PostgreSQL 11.1.

How did the issue arise

FreshPorts deals with the FreeBSD package tree. One of the things it does is replicate the repository’s directory and file structure. Yesterday, I moved my development database from a PostgreSQL 10.6 server to a PostgreSQL 11.1 server. Now I am getting errors on commits which add a new file to the tree. I think I’ve narrowed it down to this trigger:

CREATE OR REPLACE FUNCTION element_pathname_insert() RETURNS TRIGGER AS $$
   BEGIN
      INSERT INTO element_pathname (element_id, pathname)
         VALUES (NEW.id, element_pathname(NEW.id));
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS element_pathname_insert ON element;
CREATE TRIGGER element_pathname_insert
    AFTER INSERT on element
    FOR EACH ROW
    EXECUTE PROCEDURE element_pathname_insert();

This trigger is invoked each time a new row is added to the element table.

This is where some background may be useful.

element

element is a self-referential table, and implements a directory & file structure.

freshports.dev=# \d element
                                     Table "public.element"
       Column        |     Type     | Collation | Nullable |               Default               
---------------------+--------------+-----------+----------+-------------------------------------
 id                  | integer      |           | not null | nextval('element_id_seq'::regclass)
 name                | text         |           | not null | 
 parent_id           | integer      |           |          | 
 directory_file_flag | character(1) |           | not null | 
 status              | character(1) |           | not null | 
Indexes:
    "element_pkey" PRIMARY KEY, btree (id)
    "element_delete" btree (status) WHERE status = 'D'::bpchar
    "element_name" btree (name)
    "element_parent_id" btree (parent_id)
Check constraints:
    "element_directory_file_flag" CHECK (directory_file_flag = 'F'::bpchar OR directory_file_flag = 'D'::bpchar)
    "element_status" CHECK (status = 'A'::bpchar OR status = 'D'::bpchar)
Foreign-key constraints:
    "$1" FOREIGN KEY (parent_id) REFERENCES element(id) ON UPDATE RESTRICT ON DELETE CASCADE

For simplicity, the Referenced by and Triggers definitions have been omitted.

Given a pathname, we can find the element id:

freshports.dev=# select Pathname_ID('/ports/head/sysutils/bacula-server/Makefile');
 pathname_id 
-------------
      205195
(1 row)

Given an id, we can find the pathname:

freshports.dev=# select element_pathname(205195);
              element_pathname               
---------------------------------------------
 /ports/head/sysutils/bacula-server/Makefile
(1 row)

For historical reasons, element_pathname() is implemented with a WHILE … LOOP structure. I am sure it could be updated to use the element_pathname table.
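
For example, a lookup against that table could be as simple as this (a sketch with a hypothetical name, not what is deployed):

CREATE FUNCTION element_pathname_via_table(integer) RETURNS text
    LANGUAGE sql STABLE
    AS $$
   SELECT pathname FROM element_pathname WHERE element_id = $1;
$$;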

element_pathname

The element_pathname table is a relatively recent addition, whereas element_pathname() dates back about 17 years. The table was added to make some of the more complex queries easier to construct.

The problem

The problem arises when a commit adds a new file to the tree. The code first checks to see if the pathname already exists in the tree:

freshports.dev=# select Pathname_ID('/base/releng/12.0/release/tools/gce.conf');
 pathname_id 
-------------
            
(1 row)

freshports.dev=# 

It finds nothing, so it adds:

freshports.dev=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status)
                            values(element_id, element_parent_id, 'D', element_name, 'A')"
PL/pgSQL function element_add(text,character) line 89 at SQL statement

This is a conflict in the element_pathname table.

I suspect the reason is that element_pathname(NEW.id) is returning an empty string because it cannot yet see the newly inserted value in the element table.

I have not supplied the contents of Element_Add() because it is about 100 lines of code.

This ’empty string’ theory is supported by finding this:

freshports.dev=# select * from element_pathname where pathname = '';
 element_id | pathname 
------------+----------
     910588 | 
(1 row)

What file is that?

freshports.dev=# select element_pathname(910588);
               element_pathname                
-----------------------------------------------
 /base/head/bin/sh/tests/expansion/question2.0
(1 row)

Checking the logs, this new file was recently added, and I suspect, but have not confirmed, that it was the first addition to the tree after moving to PostgreSQL 11.1.

Sometimes it fails, but then it succeeds

I started doing some testing to see what happens and I created a test database for this very purpose. I tried a series of BEGIN / INSERT / ROLLBACK sequences.

Let’s try that now:

[dan@pg03:~] $ psql FPtest
psql (11.1)
Type "help" for help.

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status) 
                            values(element_id, element_parent_id, element_directory_file_flag, element_name, 'A')"
PL/pgSQL function element_add(text,character) line 86 at SQL statement
FPtest=# rollback;
ROLLBACK

OK, let’s try it again, but with something I know will not exist.

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/DELETEME/1/2/3/4', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status)
                            values(element_id, element_parent_id, 'D', element_name, 'A')"
PL/pgSQL function element_add(text,character) line 89 at SQL statement
FPtest=# rollback;
ROLLBACK

Well, there is no reason for that to fail, yet it does. Let’s try this:

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/DELETEME/1', 'F');
 element_add 
-------------
      910593
(1 row)

Woah, that worked. Let’s try the original, or close to it:

FPtest=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
 element_add 
-------------
      910595
(1 row)

FPtest=# rollback;
ROLLBACK
FPtest=# \q

That worked too. What’s up with that?

The IRC help

I got onto IRC and asked what might be up. I included this gist and it was RhodiumToad who pointed out that I must have two functions called element_pathname().

Have a look at line 5 of the stored element_pathname_insert() trigger in that gist (originally presented at the top of this post, but included here so you don’t have to scroll back):

CREATE OR REPLACE FUNCTION element_pathname_insert() RETURNS TRIGGER AS $$
   BEGIN
      RAISE WARNING 'element_pathname_insert says: %', NEW.id;
      INSERT INTO element_pathname (element_id, pathname)
         VALUES (NEW.id, element_pathname(NEW.id));
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

You might notice that this version of the function has a RAISE WARNING added to it. I was trying many things to track this down.

Looking at the output of \df I did find the second function:

freshports.org=# \df Element_Pathname
                              List of functions
 Schema |       Name       | Result data type | Argument data types |  Type  
--------+------------------+------------------+---------------------+--------
 public | element_pathname | text             | integer             | normal
 public | element_pathname | text             | integer, boolean    | normal
(2 rows)

\df+ showed me the two definitions, reproduced here side by side to make it easier to spot the differences. I have added whitespace to make similar code line up.

CREATE FUNCTION public.element_pathname(integer) RETURNS text
    LANGUAGE plpgsql IMMUTABLE
    AS $_$
   DECLARE
      element_id   ALIAS FOR $1;

      my_parent_id int4;
      element_name text;
      pathname     text;

begin
  pathname = '';
  select name, parent_id
    into element_name, my_parent_id 
    from element
   where id = element_id;

  IF FOUND THEN
    pathname := '/' || element_name || pathname;
    WHILE FOUND LOOP
      select name, parent_id 
        into element_name, my_parent_id
        from element
       where id = my_parent_id;

      IF FOUND THEN


        pathname = '/' || element_name || pathname;


      END IF;





    END LOOP;
  END IF;

  return pathname;
END;
$_$;
CREATE FUNCTION public.element_pathname(integer, boolean) RETURNS text
    LANGUAGE plpgsql STABLE
    AS $_$
   DECLARE
      element_id    ALIAS FOR $1;
      leading_slash ALIAS for $2;
      my_parent_id  int4;
      element_name  text;
      pathname      text;

begin
  pathname = '';
  select name, parent_id
    into element_name, my_parent_id
    from element
   where id = element_id;

  IF FOUND THEN
    pathname := '/' || element_name || pathname;
    WHILE FOUND LOOP
      select name, parent_id
        into element_name, my_parent_id
        from element
       where id = my_parent_id;

      IF FOUND THEN
         IF my_parent_id IS NULL THEN
           IF leading_slash THEN
              pathname = '/' || element_name || pathname;
           ELSE
              pathname = element_name || pathname;
           END IF;
           EXIT;
         ELSE
           pathname = '/' || element_name || pathname;
         END IF;
      END IF;
    END LOOP;
  END IF;

  return pathname;
END;
$_$;

These two functions are very similar in both purpose and structure; the primary difference is whether or not the result has a leading slash.

SIDE NOTE: whenever I code this I always consider whether or not to just remove the leading slash at the end of the code instead of testing a condition each time through the loop. The current code is plenty fast.

What stood out when comparing the two functions?

When I looked at the two functions, I noticed the one being invoked by the trigger was declared to be IMMUTABLE, which is defined as:

An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever.

This function does not modify the database, but it could return different results for the same argument if the path were modified. The path is not modified by these functions; any modification would be external to what we are doing here.

Changed to STABLE

I changed the function to STABLE and tried my tests again.

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement.

This is definitely true for this function.

Defined as STABLE, this problem went away.
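
For the record, the volatility can be changed in place without re-creating the whole function; a minimal sketch, assuming the one-argument signature shown above:

-- change the volatility of the existing function from IMMUTABLE to STABLE
ALTER FUNCTION public.element_pathname(integer) STABLE;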

Why 11.1?

I do not know why this function is OK under PostgreSQL 10.6 but causes a problem under 11.1; I can only speculate. I am also sure I will be corrected.

I suspect 11.1 is better at handling immutable functions and started doing optimizations which 10 did not.

Or you could take the view that it got worse.

Yes, I know the function should not have been immutable at all, but I would like to know the explanation behind the change in behavior.

Further function changes

Those two functions were similar enough to be combined. I reduced the element_pathname(integer) function to merely invoke element_pathname(integer, boolean):

CREATE FUNCTION public.element_pathname(integer) RETURNS text
    LANGUAGE plpgsql STABLE
    AS $_$
   DECLARE
      element_id    ALIAS FOR $1;
BEGIN
  return Element_Pathname(element_id, TRUE);
END;
$_$;
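
As a quick sanity check, both signatures should now return the same pathname for any existing element; a minimal sketch (the subquery just grabs an arbitrary element id):

-- the wrapper and the two-argument version should agree
SELECT element_pathname(e.id), element_pathname(e.id, TRUE)
  FROM (SELECT id FROM element LIMIT 1) AS e;
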
Sep 262018
 

This post serves two goals:

  1. document the CONFLICTS implementation within FreshPorts
  2. show how you can use triggers to update a secondary table

The FreeBSD ports tree / package collection has a method for resolving conflicts. This feature allows a port maintainer to specify what, if any, packages conflict with a given port. The reasons why two ports might conflict are outside the scope of this article. For background reading:

Obtaining the values from the Makefile

To get started, let’s look at what I already have in the development instance of the FreshPorts database.

freshports.dev=# SELECT id, packagename(id), conflicts, conflicts_build, conflicts_install FROM ports WHERE coalesce(conflicts, conflicts_build, conflicts_install) IS NOT NULL and packagename(id) like 'bacula%' order by last_commit_id desc limit 10
;
  id   |      packagename      |            conflicts             | conflicts_build | conflicts_install 
-------+-----------------------+----------------------------------+-----------------+-------------------
 45084 | bacula9-docs          | bacula-* bacula5-*               |                 | 
 45086 | bacula9-server        | bacula5-server-* bacula-server-* |                 | 
 45087 | bacula9-bat           | bacula5-server-* bacula-server-* |                 | 
 45083 | bacula9-client        | bacula5-server-* bacula-server-* |                 | 
 45085 | bacula9-client-static | bacula5-server-* bacula-server-* |                 | 
(5 rows)

freshports.dev=# 

The packagename corresponds to the following FreeBSD ports:

  1. bacula9-docs
  2. bacula9-server
  3. bacula9-bat
  4. bacula9-client
  5. bacula9-client-static

In this case, bacula9 ports conflict with older versions of Bacula. How can we get those values from the Makefile? Let’s get the values for bacula9-server. Assume these commands are done in the /var/db/freshports/ports-jail/var/db/repos/PORTS-head/sysutils/bacula9-server directory, which is a non-standard location for ports files, but it suits the needs of the website.

$ grep CONFLICTS *
Makefile:CONFLICTS?=	bacula5-server-* bacula-server-*

NOTE: grep is not an appropriate method for obtaining the values because the values may be contingent upon other configuration settings.

This is the proper method:

$ make -V CONFLICTS -V CONFLICTS_BUILD -V CONFLICTS_INSTALL
bacula5-server-* bacula-server-*


NOTE: those two blank lines correspond to the empty CONFLICTS_BUILD and CONFLICTS_INSTALL values.

globs not regex

CONFLICTS can involve wildcards, such as the * shown above. It is critical to realize that these are globs, meant for matching disk filenames, as opposed to regular expressions.

From the documentation

# CONFLICTS             - A list of package name patterns that the port conflicts
#                                 with, separated by blanks.  The names may include shell
#                                 pattern meta-characters "*", "?", "[", "]", and "!".
#                                 Example: apache*-1.2* apache*-1.3.[012345] apache-*+ssl_*

PostgreSQL has many regex functions. Let’s see about converting globs to regex.

glob to regex

Perl to the rescue. I received some help, as many people do, from RhodiumToad, who supplied this function:

CREATE TRUSTED PROCEDURAL LANGUAGE plperl
    HANDLER plperl_call_handler
    INLINE plperl_inline_handler
    VALIDATOR plperl_validator;
    
COMMENT ON LANGUAGE plperl
    IS 'PL/Perl procedural language';

CREATE FUNCTION public.glob_to_regex(text) RETURNS text
    LANGUAGE plperl
    AS $_$
    my $w = quotemeta shift;
    $w =~ s{\\([*?]|\\(\\?.)|\[(?:\\\!)?(?:\\\])?[^]]*\\\])}
                   {($1 eq '*' ? '.*' :
                         $1 eq '?' ? '.'  :
                         defined($2) ? $2 :
                         do { my $p = $1; $p =~ s/\\([][^-])/$1/g; $p =~ s/^\[\\!/[^/; $p; })}eg;
    return $w;
$_$;

On FreeBSD, I needed to add the databases/postgresql10-server package to provide support for Perl language extensions.

This will take a glob and translate it into a regular expression.

Here are a few examples:

freshports.dev=# select glob_to_regex('bacula-*');
 glob_to_regex 
---------------
 bacula\-.*
(1 row)

freshports.dev=# 


freshports.dev=# select glob_to_regex('postgresql9[^4]*');
   glob_to_regex   
-------------------
 postgresql9[^4].*
(1 row)

freshports.dev=# select glob_to_regex('clang-devel-3.[1234567]*');
        glob_to_regex         
------------------------------
 clang\-devel\-3\.[1234567].*
(1 row)

freshports.dev=# 
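
Later in this post, the generated regex is anchored at both ends and used with the regex match operator; here is a minimal sketch of that usage (the package name is just an illustration):

-- 'bacula-server-9.4.2' matches, while 'bacula9-server-9.4.2' would not
SELECT 'bacula-server-9.4.2' ~ ('^' || glob_to_regex('bacula-server-*') || '$');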

The secondary table

As part of earlier work, the ports table was upgraded to add three new fields:

ALTER TABLE ports ADD COLUMN conflicts text;
ALTER TABLE ports ADD COLUMN conflicts_build text;
ALTER TABLE ports ADD COLUMN conflicts_install text;

These columns will store the raw values as obtained directly from the Makefile. The corresponding regex values will be stored, one per row, in another table. That table looks like this:

CREATE TABLE ports_conflicts
(
    id BIGSERIAL,
    port_id integer          NOT NULL,
    conflicts_type conflicts NOT NULL,
    item_num integer         NOT NULL,
    item_value text          NOT NULL,
    CONSTRAINT ports_conflicts_pkey PRIMARY KEY (id),
    CONSTRAINT ports_conflicts_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

What’s that column type on line 5, you might ask?

CREATE TYPE public.conflicts AS ENUM
    ('conflicts', 'conflicts_build', 'conflicts_install');

The item_num column will help to order the values so they correspond to what we find in the source value. Feedback regarding the accuracy of that claim is most welcome. The ordering is not vital, but it does help to make sure it is consistent when we pull back the values later.

The Trigger

This is the trigger which will populate the ports_conflicts table.

CREATE OR REPLACE FUNCTION ports_conflicts_set() RETURNS TRIGGER AS $$
  BEGIN
    DELETE FROM ports_conflicts WHERE port_id = NEW.id;

    INSERT INTO ports_conflicts(port_id, conflicts_type, item_num, item_value)
      SELECT NEW.id                         AS port_id, 
             'conflicts'::conflicts         AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_build'::conflicts   AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_build, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_install'::conflicts AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_install, E'\\s+') AS R(item_value);

    RETURN NEW;
  END
$$ LANGUAGE 'plpgsql';


  DROP TRIGGER IF EXISTS ports_conflicts_insert ON ports;
CREATE TRIGGER ports_conflicts_insert
    AFTER INSERT on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

  DROP TRIGGER IF EXISTS ports_conflicts_update ON ports;
CREATE TRIGGER ports_conflicts_update
    AFTER UPDATE on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

Line 3 : deletes existing entries for this port

Lines 6-10 : compiles a list of conflicts values

Lines 14-18 : does the same thing for conflicts_build values

Lines 22-26 : same again, for conflicts_install

The UNIONs tie them all together and line 5 puts it into the ports_conflicts table.

Details on the trigger

This section explains the trigger in a bit more detail.

The regexp_split_to_table function is a wonderful solution to break each CONFLICTS value into separate rows, like this:

freshports.dev=# select regexp_split_to_table('bacula5-server-* bacula-server-*',  E'\\s+');
 regexp_split_to_table 
-----------------------
 bacula5-server-*
 bacula-server-*
(2 rows)

freshports.dev=# 

The glob_to_regex(R.item_value) call changes each of the above items from a glob to a regex.

row_number() OVER (ORDER BY item_value) will sort the individual values within a conflict and give us a value for the ports_conflicts.item_num field.
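
Putting those pieces together, here is a standalone sketch of what the trigger does to a single CONFLICTS string, using the bacula9-server value from earlier:

-- split the raw CONFLICTS value, convert each glob to a regex, and number the rows
SELECT 'conflicts'::conflicts                  AS conflicts_type,
       row_number() OVER (ORDER BY item_value) AS item_num,
       glob_to_regex(R.item_value)             AS item_value
  FROM regexp_split_to_table('bacula5-server-* bacula-server-*', E'\\s+') AS R(item_value);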

Matching against the regex

The list of conflicts for a given port will not change until the next commit for a port.

However, what that conflict matches against might change when another port is updated.

Thus, the ports_conflicts table only updates based upon a trigger on the ports table.

We will store the actual matches in the ports_conflicts_matches table.

CREATE TABLE ports_conflicts_matches
(
    ports_conflicts_id bigint NOT NULL,
    port_id integer NOT NULL,
    CONSTRAINT ports_conflicts_matches_pkey PRIMARY KEY (ports_conflicts_id, port_id),
    CONSTRAINT ports_conflicts_matches_conflicts_id FOREIGN KEY (ports_conflicts_id)
        REFERENCES public.ports_conflicts (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT ports_conflicts_matches_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

How do we populate that table? How can it be quickly done?

That is the part which took the longest time to figure out.

Simple population

Here are some sample CONFLICTS values from the early testing (as recorded in an older gist):

freshports.dev=# select * from ports_conflicts;
 id | port_id |  conflicts_type   | item_num |          item_value          
----+---------+-------------------+----------+------------------------------
 21 |   43574 | conflicts_install |        4 | mysqlwsrep.*
 22 |   43574 | conflicts_install |        3 | mysql[0-9].*\-server\-.*
 23 |   43574 | conflicts_install |        1 | mariadb10[013-9]\-server\-.*
 24 |   43574 | conflicts_install |        5 | percona[0-9].*\-server\-.*
 25 |   43574 | conflicts_install |        2 | mariadb5.*\-server\-.*
(5 rows)

freshports.dev=# 

Given the regex in ports_conflicts.item_value, let’s find the matching values from the commit_log_ports table. I can concatenate all the regex values into one search term. This approach works rather well:

freshports.dev=# explain analyse
freshports.dev-# SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
                                                                                     QUERY PLAN                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=19482.04..19482.68 rows=85 width=22) (actual time=497.597..497.856 rows=364 loops=1)
   ->  Sort  (cost=19482.04..19482.25 rows=85 width=22) (actual time=497.596..497.658 rows=852 loops=1)
         Sort Key: port_id, port_name_revision
         Sort Method: quicksort  Memory: 91kB
         ->  Gather  (cost=1000.00..19479.32 rows=85 width=22) (actual time=107.422..497.400 rows=852 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on commit_log_ports clp  (cost=0.00..18470.82 rows=35 width=22) (actual time=108.076..493.249 rows=284 loops=3)
                     Filter: (port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)'::text)
                     Rows Removed by Filter: 307469
 Planning time: 0.417 ms
 Execution time: 498.383 ms
(12 rows)

freshports.dev=# 

The results are something like this:

SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
 
  port_id |      port_name_revision       
---------+-------------------------------
   16965 | mysql51-server-5.1.11
   16965 | mysql51-server-5.1.12
   16965 | mysql51-server-5.1.14
   16965 | mysql51-server-5.1.15
   16965 | mysql51-server-5.1.16
   16965 | mysql51-server-5.1.17
   16965 | mysql51-server-5.1.18
   16965 | mysql51-server-5.1.19
   16965 | mysql51-server-5.1.20
   16965 | mysql51-server-5.1.21
   16965 | mysql51-server-5.1.22
...

Only a few rows of the results are shown above. The full list of matches is in that gist.
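
That search term was pasted together by hand; it could also be generated straight from the ports_conflicts table with string_agg(). A sketch, using the port_id from the sample rows above:

-- build the alternation from the stored regex values, in item_num order
SELECT '^(' || string_agg(item_value, '|' ORDER BY item_num) || ')' AS search_term
  FROM ports_conflicts
 WHERE port_id = 43574
   AND conflicts_type = 'conflicts_install';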

Yes, this works; however, that solution takes nearly half a second and handles just one set of matches. There must be a better way. If you keep reading in that gist, you will see the other methods I tried. I want to thank those on IRC who helped me find different ways to do this.

Next, I tried an SRF (Set Returning Function) and documented that in this gist. It is fast; the test I ran took 7.137 ms, but it did not scale: the time exploded to 4825.074 ms when I tried multiple rows.

SRF within a function

My next attempt involved a plpgsql function which took only 13.180 ms, a huge speed increase. This turned out to be the solution I implemented.

Let’s review, using the final gist. My original function was:

CREATE OR REPLACE FUNCTION port_conflicts4(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  SELECT DISTINCT PC.id, CLP.port_id
    FROM commit_log_ports CLP JOIN ports_conflicts PC 
                              ON (PC.id = $1 AND CLP.port_name_revision ~ $2);
$$ LANGUAGE SQL STRICT;

Then I created an SRF using plpgsql, which is incredibly fast:

CREATE OR REPLACE FUNCTION port_conflicts(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  DECLARE
    a_ports_conflicts_id ALIAS FOR $1;
    a_regex              ALIAS FOR $2;

    l_regex text;
    r       ports_conflicts_matches%rowtype;
  BEGIN
    l_regex := '^(' || a_regex || ')';
    
    FOR r in SELECT distinct a_ports_conflicts_id, port_id FROM commit_log_ports WHERE port_name_revision ~ l_regex
    LOOP
      RETURN NEXT r;
    END LOOP;
  END;

$$ LANGUAGE 'plpgsql' STABLE;

Now let’s try that function out:

freshports.dev=# explain analyse SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts5(PC.id, '^' || PC.item_value || '$')) PC4;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.26..35052.76 rows=1751000 width=12) (actual time=0.594..2208.947 rows=10700 loops=1)
   ->  Seq Scan on ports_conflicts pc  (cost=0.00..32.51 rows=1751 width=26) (actual time=0.015..0.838 rows=1751 loops=1)
   ->  Function Scan on port_conflicts5  (cost=0.26..10.26 rows=1000 width=12) (actual time=1.258..1.259 rows=6 loops=1751)
 Planning time: 0.112 ms
 Execution time: 2209.598 ms
(5 rows)

freshports.dev=# 

2.2 seconds to calculate conflicts for 1751 rows, resulting in 10700 rows. That’s fast enough. We could do that on a periodic basis if required.

As a matter of personal coding style, I don’t like having a function and a table with nearly the same name (the port_conflicts function vs the ports_conflicts table). I might rename the function to find_matches_for_conflict; I think that better describes its purpose.
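
If I do rename it, that is a one-line change; a sketch, using the final function’s signature:

ALTER FUNCTION port_conflicts(bigint, text) RENAME TO find_matches_for_conflict;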

Populate the destination table

freshports.dev=# insert into ports_conflicts_matches SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts(PC.id, '^' || PC.item_value || '$')) PC4;
INSERT 0 10700

Now let’s have a look at some of the results:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
freshports.dev-#   FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
freshports.dev-#  WHERE PC.port_id = 45086;
  packagename   |    ?column?    |     packagename     
----------------+----------------+---------------------
 bacula9-server | conflicts with | bacula5-server
 bacula9-server | conflicts with | bacula-server
 bacula9-server | conflicts with | bacula-server-devel
(3 rows)

freshports.dev=# 

You can find that magical 45086 value near the top of this post.

Let’s try openssl:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 954;
 packagename |    ?column?    |  packagename   
-------------+----------------+----------------
 openssl     | conflicts with | openssl111
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
(14 rows)

Or heimdal:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 934;
 packagename |    ?column?    | packagename  
-------------+----------------+--------------
 heimdal     | conflicts with | wu-ftpd
 heimdal     | conflicts with | wu-ftpd+ipv6
 heimdal     | conflicts with | srp
 heimdal     | conflicts with | krb5-maint
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-114
 heimdal     | conflicts with | krb5-111
 heimdal     | conflicts with | krb5-113
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-112
 heimdal     | conflicts with | krb5-116
 heimdal     | conflicts with | krb5-115
 heimdal     | conflicts with | krb4
(14 rows)

freshports.dev=# 

Now partially in production

The work is partially implemented in production. The idea came from this issue suggested by zi0r.

If you have a look at the Bacula port, you will find the following under Conflicts:

CONFLICTS:

  • bacula5-server-*
  • bacula-server-*

Future work

The next job is to automatically populate the ports_conflicts_matches table. I think that needs to be done on a periodic basis, perhaps every 10 minutes if there has been a recent commit.

Just thinking about that now:

  1. If port A conflicts with port B, then clearly port B conflicts with port A. I am not sure if there is a requirement to specify CONFLICTS in both ports, but that is something easily confirmed by the ports_conflicts table (see the sketch after this list).
  2. I tried to think of commits which would not change ports_conflicts_matches. Since the matches can include PORTVERSION, any given commit which bumps that value can affect matches.
  3. Perhaps I could exclude commits to UPDATING, MOVED, or Mk, but they are relatively few compared to the rest.
  4. I think it’s just safer to recompute all matches periodically. There does not seem to be a safe and straightforward way to do it based upon a smaller unit of work.
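
For point 1, here is a rough sketch of such a check against the two tables defined above; the symmetry rule itself is my assumption, not something the ports framework necessarily requires:

-- conflict pairs declared in one direction only: port A has a conflict entry
-- matching port B, but no conflict entry of port B matches port A back
SELECT PC.port_id  AS port_a,
       PCM.port_id AS port_b
  FROM ports_conflicts         PC
  JOIN ports_conflicts_matches PCM ON PCM.ports_conflicts_id = PC.id
 WHERE PCM.port_id <> PC.port_id
   AND NOT EXISTS (SELECT 1
                     FROM ports_conflicts         PC2
                     JOIN ports_conflicts_matches PCM2 ON PCM2.ports_conflicts_id = PC2.id
                    WHERE PC2.port_id  = PCM.port_id
                      AND PCM2.port_id = PC.port_id);
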
Mar 112018
 

Databases use relational integrity to enforce expectations; a common scenario is preventing duplicates. Case in point, I present the port_dependencies table:

freshports.org=# \d port_dependencies
                    Table "public.port_dependencies"
         Column         |     Type     | Collation | Nullable | Default 
------------------------+--------------+-----------+----------+---------
 port_id                | integer      |           | not null | 
 port_id_dependent_upon | integer      |           | not null | 
 dependency_type        | character(1) |           | not null | 
Indexes:
    "port_dependencies_pkey" PRIMARY KEY, btree (port_id, port_id_dependent_upon, dependency_type)
Foreign-key constraints:
    "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
    port_dependencies_delete_clear_cache AFTER DELETE ON port_dependencies FOR EACH ROW EXECUTE PROCEDURE port_dependencies_delete_clear_cache()
    port_dependencies_insert_clear_cache AFTER INSERT ON port_dependencies FOR EACH ROW EXECUTE PROCEDURE port_dependencies_insert_clear_cache()

freshports.org=# 

For those not familiar with FreeBSD ports, each port (you could also refer to them as a package or application) can have zero or more dependencies. The FreshPorts database extracts and lists these dependencies for convenient viewing.

As you can see in the above table definition, for a specific (port_id), a given dependency type (port_id_dependent_upon, dependency_type) can only occur once. There is no reason for a port to be dependent upon something twice.

Or so you, or I, might think.

Then this happened:

Could not execute SQL SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result ... 
maybe invalid? ERROR:  duplicate key value violates unique constraint "port_dependencies_pkey"

It was part of this commit on net/ceph. Let’s find out where FreshPorts went wrong.

Clearly, something tried to record a duplicate dependency. The code does not check for duplicates, so it did not prevent the violation.

The log file

FreshPorts logs a lot of stuff, because it can. It has always been useful for situations such as this.

Here is what I found:

depends with this: 'libboost_python.so:devel/boost-python-libs@py27 libboost_python.so:devel/boost-python-libs
libboost_thread.so:devel/boost-libs libleveldb.so:databases/leveldb libnss3.so:security/nss
libcryptopp.so:security/cryptopp libsnappy.so:archivers/snappy libcurl.so:ftp/curl libxml2.so:textproc/libxml2
libexpat.so:textproc/expat2 liblz4.so:archivers/liblz4 libplds4.so:devel/nspr libtcmalloc.so:devel/google-perftools
libfuse.so:sysutils/fusefs-libs libintl.so:devel/gettext-runtime libldap-2.4.so.2:net/openldap24-client'

The next entry referred to:

The 'L' depends are: devel/boost-python-libs@py27 - devel/boost-python-libs - devel/boost-libs - 
databases/leveldb - security/nss - security/cryptopp - archivers/snappy - ftp/curl - textproc/libxml2 - 
textproc/expat2 - archivers/liblz4 - devel/nspr - devel/google-perftools - sysutils/fusefs-libs -
devel/gettext-runtime - net/openldap24-client

Then the code starts processing them, one by one:

adding in devel/boost-python-libs@py27 which converts to 'devel/boost-python-libs'
sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
result is 1

As you can see, it strips the flavor (@py27) from the dependency. I think I know where this is going.

And then the next one:

adding in devel/boost-python-libs which converts to 'devel/boost-python-libs'
sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
Could not execute SQL SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result ...
maybe invalid? ERROR:  duplicate key value violates unique constraint "port_dependencies_pkey"
DETAIL:  Key (port_id, port_id_dependent_upon, dependency_type)=(45100, 25346, L) already exists.
CONTEXT:  SQL statement "INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
            VALUES (l_PortID, l_PortIDDependency, p_DependencyType)"
PL/pgSQL function portsdependenciesadd(text,text,text) line 19 at SQL statement
that failed

Ahh, OK, that’s why. I see now.

Let’s go to the source.

The source

This isn’t the source, it’s the Makefile from that commit. Looking at lines 25-28 we see:

25	LIB_DEPENDS=    \
26	        ${PY_BOOST} \
27	        libboost_python.so:devel/boost-python-libs \
28	        libboost_thread.so:devel/boost-libs \

I bet it’s coming from here:

$ make -V PY_BOOST
libboost_python.so:devel/boost-python-libs@py27

Yes, that’s it. Now we know where and why the duplicates are arising.

How to fix this

I could take one of two approaches:

  1. Detect and eliminate duplicates within the code.
  2. Within the database, ignore attempts to add duplicates.

The database is wonderful at doing this. I could write code, and it might be buggy. I trust the database for this, much more than I trust my code. Plus, I’ll have less code to maintain.

I’ll take option #2.

The code

Here is the statement which inserts the tuple into the database:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
      VALUES (l_PortID, l_PortIDDependency, p_DependencyType);

There is no detection of duplicate rows.

In different situations, I have used this approach to detect duplicates during insertion:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
  SELECT l_PortID, l_PortIDDependency, p_DependencyType
   WHERE NOT EXISTS (SELECT port_id, port_id_dependent_upon, dependency_type
                       FROM port_dependencies
                      WHERE port_id                = l_PortID
                        AND port_id_dependent_upon = l_PortIDDependency
                        AND dependency_type        = p_DependencyType);

The above may be easier to follow if I do the variable substitution:

 INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
  SELECT 45100, 25346, 'L'
   WHERE NOT EXISTS (SELECT port_id, port_id_dependent_upon, dependency_type
                       FROM port_dependencies
                      WHERE port_id                = 45100
                        AND port_id_dependent_upon = 25346
                        AND dependency_type        = 'L');

That works well, exactly as expected. It does what I need.

But there is a better way

Then I posted on Twitter about this, with the above example. My cousin, Scott Walsh, mentioned ON CONFLICT. This feature arrived with PostgreSQL 9.5 (see docs), which was released in early 2016. I haven’t kept up. FreshPorts added dependency support in 2011.

I immediately tried this approach:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
freshports.org-#   SELECT 45100, 25346, 'L'
freshports.org-# ON CONFLICT DO NOTHING;
INSERT 0 0

That’s great.

Then I got to thinking… there are other conflicts which could occur, errors which should be brought to my attention. Foreign key violations come to mind.

Reading more of the documentation turned up this solution:

freshports.org=# INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
freshports.org-#   SELECT 45100, 25346, 'L'
freshports.org-# ON CONFLICT ON CONSTRAINT port_dependencies_pkey DO NOTHING;
INSERT 0 0
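
Note that ON CONFLICT only suppresses violations of the named constraint; other errors, such as a foreign key violation, are still raised. A minimal sketch (the -1 port id is made up and does not exist):

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
     VALUES (-1, 25346, 'L')
ON CONFLICT ON CONSTRAINT port_dependencies_pkey DO NOTHING;
-- ERROR:  insert or update on table "port_dependencies" violates foreign key
--         constraint "port_dependencies_port_id_fkey"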

EDIT 2018.03.17

I see I’ve left my SELECT in that INSERT. The following is what I will use in production:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
    VALUES (l_PortID, l_PortIDDependency, p_DependencyType)
    ON CONFLICT ON CONSTRAINT port_dependencies_pkey DO NOTHING;

When running my first test of that code, this is what was found in the logs (two duplicate inserts, no failures):

sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/net/ceph
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/devel/boost-python-libs
result is 1
adding in devel/boost-python-libs which converts to 'devel/boost-python-libs'
sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/net/ceph
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/devel/boost-python-libs
result is 1

That’s exactly what I need. Phew.


This code deals specifically with one constraint, port_dependencies_pkey. If that constraint is violated, we have a duplicate key: do nothing, ignore it, as if nothing happened. Please move along, nothing to see here, thank you.

This, for my situation, is exactly what I want. Instead of removing duplicates in the front end code, I can let the database silently handle it. Score.

Thanks to those who helped me find this solution. I hope it helps you too. This is not a one-size-fits-all solution, but it might be applicable to your needs.

Jul 252016
 

FreshPorts uses a cache for every port page (e.g. https://www.freshports.org/sysutils/bacula-server/). When an update to a port occurs, we must clear the cache for that port. This sounds simple, and it is. The various ways in which updates occur complicate the situation.

This post is all about fixing one edge case: the addition or removal of a port dependency (e.g. RUN_DEPENDS, LIB_DEPENDS).

When port A depends on port B, this fact is listed on both pages, but adding/removing a dependency was not properly handled (see this bug). Tonight, I coded a fix. I would like it to be less complex, but it works.

The table

This is the table in question:

freshports.org=# \d port_dependencies
         Table "public.port_dependencies"
         Column         |     Type     | Modifiers 
------------------------+--------------+-----------
 port_id                | integer      | not null
 port_id_dependent_upon | integer      | not null
 dependency_type        | character(1) | not null
Indexes:
    "port_dependencies_pkey" PRIMARY KEY, btree (port_id, port_id_dependent_upon, dependency_type)
Foreign-key constraints:
    "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

Using the terminology from the first sentence, port A and port B are represented by port_id and port_id_dependent_upon respectively. Any new row or any deleted row must invalidate the cache for both port A and port B.

Keeping track of cache to clear

This is the table we use to clear the ports cache:

freshports.org=# \d cache_clearing_ports
                                     Table "public.cache_clearing_ports"
   Column   |            Type             |                             Modifiers                             
------------+-----------------------------+-------------------------------------------------------------------
 id         | integer                     | not null default nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     | not null
 category   | text                        | not null
 port       | text                        | not null
 date_added | timestamp without time zone | not null default now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

We add rows to this table to indicate that a given port must be cleared from the cache.

On INSERT

On an insert to the port_dependencies table, let’s do this:

CREATE OR REPLACE FUNCTION port_dependencies_insert_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_cache_clearing_ports_id   int8;
      l_port      text;
      l_category  text;
   BEGIN
        --
        -- This function handles the addition of a new dependency.
        -- yes, we need to clear the cache for both port_id and port_id_dependent_upon
        -- from the cache_clearing_ports.  I figure there is a shorter way to this but
        -- I cannot think it through right now.
        --

        IF TG_OP = 'INSERT' THEN
         -- handle port A (port_id)
         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = NEW.port_id;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.port_id;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (NEW.port_id, l_category, l_port);
          END IF;

         -- handle port B (port_id_dependent_upon)
         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = NEW.port_id_dependent_upon;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.port_id_dependent_upon;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (NEW.port_id_dependent_upon, l_category, l_port);
          END IF;

          NOTIFY port_updated;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS port_dependencies_insert_clear_cache ON port_dependencies;
CREATE TRIGGER port_dependencies_insert_clear_cache
    AFTER INSERT on port_dependencies
    FOR EACH ROW
    EXECUTE PROCEDURE port_dependencies_insert_clear_cache();

That function is rather long, and I know it can be simplified, but that’s not on tonight’s agenda. If you are so inclined, I am happy to hear your suggestions. Specifically, I think I can do an INSERT INTO … WHERE NOT EXISTS and do the SELECT & INSERT in one step, without the need for IF FOUND.
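
As a sketch of that idea, the port A block above could collapse into a single statement inside the trigger (untested, just the shape of it):

-- one statement instead of SELECT ... IF NOT FOUND ... INSERT
INSERT INTO cache_clearing_ports (port_id, category, port)
  SELECT NEW.port_id, PA.category, PA.name
    FROM ports_all PA
   WHERE PA.id = NEW.port_id
     AND NOT EXISTS (SELECT 1
                       FROM cache_clearing_ports
                      WHERE port_id = NEW.port_id);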

On DELETE

Here is the code for the DELETE.

CREATE OR REPLACE FUNCTION port_dependencies_delete_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_cache_clearing_ports_id   int8;
      l_port      text;
      l_category  text;
   BEGIN
        --
        -- This function handles the deletion of an existing dependency.
        -- yes, we need to clear the cache for both port_id and port_id_dependent_upon
        -- from the cache_clearing_ports.  I figure there is a shorter way to this but
        -- I cannot think it through right now.
        --

        IF TG_OP = 'DELETE' THEN
         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = OLD.port_id;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = OLD.port_id;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (OLD.port_id, l_category, l_port);
          END IF;

         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = OLD.port_id_dependent_upon;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = OLD.port_id_dependent_upon;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (OLD.port_id_dependent_upon, l_category, l_port);
          END IF;

          NOTIFY port_updated;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN OLD;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS port_dependencies_delete_clear_cache ON port_dependencies;
CREATE TRIGGER port_dependencies_delete_clear_cache
    AFTER DELETE on port_dependencies
    FOR EACH ROW
    EXECUTE PROCEDURE port_dependencies_delete_clear_cache();

It is nearly identical to the first function. Again, those of you who want to suggest improvements there, feel free.

I will run this in dev for a while and see how it goes. Initial testing has been promising.

NOTE: there are no updates to this table, by design. Instead, when we see a new commit to a port, we do a DELETE FROM port_dependencies WHERE port_id = :a.

Jan 292014
 

Yesterday, I wrote about some great progress on getting FreshPorts to work with multiple repositories to cater for the branches of the FreeBSD ports tree. At the end of that post, I talked briefly about wanting global settings or variables for my stored procedures. Shortly after posting that, I found a solution in the form of a post by depesz.

A bit about the problem

The problem I encountered was one of context. Sometimes, you want to use trunk (or head) and sometimes you want to use a branch (e.g. RELENG_9_1_0). A function such as the one shown below has no context. It has ‘head’ hardcoded (see line 8) because the FreeBSD ports tree had no branches when the function was first modified to cater for SVN (FreeBSD previously used CVS).

CREATE OR REPLACE FUNCTION GetPort(text) RETURNS int4 AS $$
   DECLARE
      category_port ALIAS for $1;
      pathname         text;
      port_element_id  int4;
      port_id          int4;
   BEGIN
      pathname        := '/ports/head/' || category_port;
      port_element_id := Pathname_ID(pathname);
      if port_element_id IS NOT NULL THEN
         select id
           into port_id
           from ports
          where element_id = port_element_id;
      END IF;
      return port_id;
   END;
$$ LANGUAGE 'plpgsql';

If you look at yesterday’s post, you’ll see this output:

freshports.org=# select id, name, category, element_id, element_pathname(element_id) from ports_active where name = 'spellathon';
  id   |    name    | category | element_id |               element_pathname                
-------+------------+----------+------------+-----------------------------------------------
 24964 | spellathon | games    |     324899 | /ports/head/games/spellathon
 34159 | spellathon | games    |     559872 | /ports/branches/RELENG_9_1_0/games/spellathon
(2 rows)
 
freshports.org=# 

We need some way to tell this function which branch we are using.

The solution

This is the amended function, which is working just fine, thank you depesz.

CREATE OR REPLACE FUNCTION GetPort(text) RETURNS int4 AS $$
   DECLARE
      category_port ALIAS for $1;
      pathname         text;
      port_element_id  int4;
      port_id          int4;
      l_branch         text;
   BEGIN
      l_branch := freshports_branch_get();

      IF l_branch = 'head' THEN
          pathname := '/ports/'          || l_branch || '/' || category_port;
      ELSE
          pathname := '/ports/branches/' || l_branch || '/' || category_port;
      END IF;

      port_element_id := Pathname_ID(pathname);
      IF port_element_id IS NOT NULL THEN
         SELECT id
           INTO port_id
           FROM ports  
          WHERE element_id = port_element_id;
      END IF;

      RETURN port_id;
   END;
$$ LANGUAGE 'plpgsql';

The new magic code appears on lines 9-15.

  • Line 9 calls a new function (shown later) which pulls back the branch.
  • The IF statement constructs the correct path, based on the differences between trunk and branches in the FreeBSD ports repository.

Show me it working!

I have coded the function to assume head if no branch is specified. That is the default in the current code. Here is what you get when you invoke GetPort without specifying a branch:

freshports.org=# select GetPort('games/spellathon');
 getport 
---------
   24964
(1 row)

freshports.org=# 

Next, I set the branch, and call the same function again:

freshports.org=# select freshports_branch_set('RELENG_9_1_0');
 freshports_branch_set 
-----------------------
 
(1 row)

freshports.org=# select GetPort('games/spellathon');
 getport 
---------
   34159
(1 row)

freshports.org=# 

And here we swap back again:

freshports.org=# select freshports_branch_set('head');
 freshports_branch_set 
-----------------------
 
(1 row)

freshports.org=# select GetPort('games/spellathon');
 getport 
---------
   24964
(1 row)

freshports.org=# 

I was pretty chuffed when I was able to get this working with a minimum of fuss. This bodes well for the rest of this project.

The rest of the code

In addition to the code provided in depesz’s post, I added the following helper functions, specific to FreshPorts.

CREATE OR REPLACE FUNCTION freshports_branch_set( TEXT ) RETURNS void as $$

   SELECT session_variables.set_value('branch', $1);

$$ language sql;


CREATE OR REPLACE FUNCTION freshports_branch_get() RETURNS TEXT as $$

DECLARE
    reply TEXT;

BEGIN

   reply := session_variables.get_value( 'branch' );

   IF reply IS NULL THEN
      reply := 'head';
   END IF;

   RETURN reply;

END;

$$ language plpgsql;

The first function lets my code set the branch. You saw me use that in the examples above. The second function is used by the GetPort() function to retrieve the branch it should use. By default, it will return ‘head’.
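
For what it is worth, PostgreSQL also has built-in per-session settings which could do the same job without a helper schema; a minimal sketch of that alternative (not what FreshPorts uses, and the 'freshports.branch' name is made up):

-- set a custom session variable; FALSE means it lasts for the session, not just the transaction
SELECT set_config('freshports.branch', 'RELENG_9_1_0', FALSE);

-- read it back; current_setting() raises an error if the variable was never set,
-- so a wrapper would still want to default to 'head'
SELECT current_setting('freshports.branch');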

Let’s go!

This session variable stuff is pretty much what I was looking for last night. I’m very pleased that it has been so easy to add to my particular application. Hope it helps you. I’m looking forward to the rest of this upgrade.