Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full time systems administration and now work for very-well known company who has been a big force in the security industry.

Jan 262019
 

Ever since the new server went into production, sometime in 2017, the notification emails, which tell you what has changed, have been coming from the old server. I never did that changeover. The old server was still processing incoming commits, so the notifications were accurate. However, new accounts would not get notifications and changes to existing accounts would be ignored. The notifications were getting increasingly inaccurate for your settings.

This is far from ideal and not good at all.

What I have to do now is copy over the notification statistics.

When did this change occur

I was not sure when the new server went into production. I looked in old blog posts, and on social media, and failed to find out.

I resorted to DNS because every change is recorded in subversion, thanks to nsnotifyd (more information on how I did that via this post).

Finding that date took me a while, because I was searching for my name in the logs, I failed. Every commit for past 17 months has been done by the dns-notify user. Eventually I found it. I can confidently say the change over occurred at Sun Nov 5 16:53:23 2017 UTC.

This date will be used later in this post.

What needs to be moved over

The following is on the old server.

I wasn’t sure exactly what needed to be moved, but I knew the answer might be in the report-notification.pl scripts. Looking in there, I found this snippet:

                my $dbh = FreshPorts::Database::GetDBHandle();
                if ($dbh->{Active}) {

                        $sql = "select last_sent 
                                  from report_log_latest
                                 where frequency = '$Frequency'
                                   and report_id = $ReportID";
                        if ($FreshPorts::Config::ReportDebugging) {
                                print "sql is $sql\n";
                        }

                        $sth = $dbh->prepare($sql);
                        $sth->execute ||
                                        die "Could not execute SQL $sql ... maybe invalid";

                        @row=$sth->fetchrow_array;
                        my $last_sent = $row[0];

That’s it, I need entries from the report_log_latest table. What does that table look like?

freshports.org=# \d report_log_latest
                     View "public.report_log_latest"
    Column    |           Type           | Collation | Nullable | Default 
--------------+--------------------------+-----------+----------+---------
 report_id    | integer                  |           |          | 
 frequency_id | integer                  |           |          | 
 frequency    | character(1)             |           |          | 
 last_sent    | timestamp with time zone |           |          | 

freshports.org=# 
 

Oh, it’s a view, not a table. Let’s look closer:

freshports.org=# \d+ report_log_latest
                                  View "public.report_log_latest"
    Column    |           Type           | Collation | Nullable | Default | Storage  | Description 
--------------+--------------------------+-----------+----------+---------+----------+-------------
 report_id    | integer                  |           |          |         | plain    | 
 frequency_id | integer                  |           |          |         | plain    | 
 frequency    | character(1)             |           |          |         | extended | 
 last_sent    | timestamp with time zone |           |          |         | plain    | 
View definition:
 SELECT report_log.report_id,
    report_log.frequency_id,
    report_frequency.frequency,
    max(report_log.report_date) AS last_sent
   FROM report_log,
    report_frequency
  WHERE report_log.frequency_id = report_frequency.id
  GROUP BY report_log.report_id, report_log.frequency_id, report_frequency.frequency;

freshports.org=# 

How long has it been since any reports were sent out:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         2 |            4 | F         | 2018-11-09 19:20:00.456444+00
         1 |            4 | F         | 2018-11-09 19:20:00.434312+00
         2 |            2 | D         | 2018-11-09 18:10:00.431505+00
         1 |            2 | D         | 2018-11-09 18:10:00.411285+00
         5 |            2 | D         | 2018-11-09 18:05:00.407087+00
         1 |            3 | W         | 2018-11-06 19:00:01.348784+00
         2 |            3 | W         | 2018-11-06 19:00:00.497571+00
         1 |            5 | M         | 2018-10-23 19:40:00.677857+00
         2 |            5 | M         | 2018-10-23 19:40:00.426695+00
(9 rows)

freshports.org=# 

OK, that’s about 11 weeks ago.

Let’s look at production

This is from the production server. Yes, the database name is the same. I think the database should be freshports.prod on this server. It helps that each of the databases is on a different server. At home, where they are all on one server, I have:

  1. freshports.dev
  2. freshports.stage
  3. freshports.test

On production, it has been 64 weeks since we took a copy of the old database and moved it to the new server:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         1 |            2 | D         | 2017-11-04 18:10:00.440969+00
         2 |            2 | D         | 2017-11-04 18:10:00.425991+00
         5 |            2 | D         | 2017-11-04 18:05:00.453121+00
         2 |            3 | W         | 2017-10-31 19:00:00.802006+00
         1 |            3 | W         | 2017-10-31 19:00:00.740916+00
         1 |            5 | M         | 2017-10-23 19:40:00.540706+00
         2 |            5 | M         | 2017-10-23 19:40:00.535091+00
         1 |            4 | F         | 2017-10-23 19:20:00.614792+00
         2 |            4 | F         | 2017-10-23 19:20:00.592397+00
(9 rows)

To retain history, I would like to copy over everything missing from the new server.

The date from above, when the DNS changed, was Sun Nov 5 16:53:23.

That corresponds nicely to the production data. This increases confidence in what I am about to do.

What’s missing from prod?

On the old server, what is newer than what’s on prod?

freshports.org=# SELECT COUNT(*) FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00';
 count 
-------
  1072
(1 row)

Let’s check the boundary condition there, and see if I can safely use > without missing any data:

freshports.org=# SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC LIMIT 10;
  id   | report_id | frequency_id |          report_date          | email_count | commit_count | port_count 
-------+-----------+--------------+-------------------------------+-------------+--------------+------------
 19073 |         1 |            2 | 2017-11-04 18:10:00.440969+00 |         123 |          396 |        435
 19074 |         5 |            2 | 2018-01-05 18:06:15.429902+00 |           1 |            2 |         88
 19076 |         1 |            2 | 2018-01-05 18:11:15.513286+00 |         459 |        18287 |      18829
 19075 |         2 |            2 | 2018-01-05 18:11:15.515059+00 |         118 |            0 |      44840
 19077 |         5 |            2 | 2018-01-06 18:06:15.501673+00 |           0 |            0 |          0
 19079 |         1 |            2 | 2018-01-06 18:11:15.68547+00  |         153 |          323 |        354
 19078 |         2 |            2 | 2018-01-06 18:11:15.699523+00 |         118 |            0 |        118
 19080 |         5 |            2 | 2018-01-07 18:05:00.545592+00 |           0 |            0 |          0
 19082 |         1 |            2 | 2018-01-07 18:10:00.820374+00 |         114 |          203 |        246
 19081 |         2 |            2 | 2018-01-07 18:10:00.830676+00 |         118 |            0 |        118
(10 rows)

freshports.org=# 

Yes, I can avoid duplicates easily by changing to > instead of >=.

I am confident I can just dump this query to a file, transfer it to the new server, and load it up without overwriting any data.

Let’s do that.

PostgreSQL Copy

Both the production and the old server are using PostgreSQL 11.1 but that should not matter much when using the COPY command. On the other hand, with pg_dump, you should always use pg_dump from the destination version.

Let’s do this!

freshports.org=# COPY (SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports.copy';
ERROR:  could not open file "/usr/home/dan/missing-reports.copy" for writing: Permission denied
HINT:  COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.
freshports.org=#

Ahh, PostgreSQL itself, the database server cannot write to my home directory. I have sudo privileges, so I could get around this easily:

$ touch /usr/home/dan/missing-reports.copy
$ sudo chown postgres /usr/home/dan/missing-reports.copy

After running the above command, we have:

$ head -2 missing-reports.copy 
19073	1	2	2017-11-04 18:10:00.440969+00	123	396	435
19074	5	2	2018-01-05 18:06:15.429902+00	1	2	88

Not everyone can do this, especially since I am on the PostgreSQL server with sudo access.

Instead, let’s try the \copy command.

freshports.org=# \copy (SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports2.copy';
COPY 1072

What did we get?

$ head -2 missing-reports2.copy
19073	1	2	2017-11-04 18:10:00.440969+00	123	396	435
19074	5	2	2018-01-05 18:06:15.429902+00	1	2	88
$ diff -ruN missing-reports.copy missing-reports2.copy 
$

Yes, that looks right and that diff confirms I have the same data as with the COPY command.

Loading up the \copy data

The transfer of the file from one server to another has been done.

Over on the production server, I have taken a backup.

freshports.org=# begin;
BEGIN
freshports.org=# \copy report_log FROM '~/missing-reports2.copy';
ERROR:  duplicate key value violates unique constraint "report_log_pkey"
DETAIL:  Key (id)=(19073) already exists.
CONTEXT:  COPY report_log, line 1
freshports.org=#

What?

Oh. Look back. I used >= instead of > as I planned to do.

The proper thing to do would be to run the query again, but I am confident that deleting that one line will be the same result. I took a copy of the local file and used vi to delete the first line, which contained the 19073 key.

Let’s try that again:

freshports.org=# begin;
BEGIN
freshports.org=# \copy report_log FROM '~/missing-reports2.copy';
COPY 1071
freshports.org=# 

That is the expected number of records, 1071.

Let’s check the latest sent dates:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         2 |            4 | F         | 2018-11-09 19:20:00.456444+00
         1 |            4 | F         | 2018-11-09 19:20:00.434312+00
         2 |            2 | D         | 2018-11-09 18:10:00.431505+00
         1 |            2 | D         | 2018-11-09 18:10:00.411285+00
         5 |            2 | D         | 2018-11-09 18:05:00.407087+00
         1 |            3 | W         | 2018-11-06 19:00:01.348784+00
         2 |            3 | W         | 2018-11-06 19:00:00.497571+00
         1 |            5 | M         | 2018-10-23 19:40:00.677857+00
         2 |            5 | M         | 2018-10-23 19:40:00.426695+00
(9 rows)

Yes, that’s precisely what we had on the old server.

One final check:

freshports.org=# SELECT count(*) FROM report_log;
 count 
-------
 19863
(1 row)

Yes, that matches the old server (I have not shown that verification here).

Time to complete this:

freshports.org=# commit;
COMMIT
freshports.org=# 

Done.

Thank you for flying Air FreshPorts.

Jan 042019
 

When the website enters maintenance mode, there are a few steps to ensure the database is not updated while we dump the data, but still allow the website to display pages.

Disable ingress commit processing

In the ingress host (for example, x8dtu-ingress01), stop the FreshPorts daemon:

[dan@x8dtu-ingress01:~] $ sudo svc -d /var/service/freshports/
[dan@x8dtu-ingress01:~] $ 

Ensure the scripts have finished, you should not see any perl:

[dan@x8dtu-ingress01:~] $ ps auwwx | grep perl
dan     73052  0.0  0.0  6660  2524  3  S+J  13:07    0:00.00 grep perl
[dan@x8dtu-ingress01:~] $ 

Set website maintenance mode

On the website host (e.g. x8dtu-nginx01), in /usr/local/etc/freshports/constants.local.php, ensure this constant is set to TRUE:

define('IN_MAINTENCE_MODE', false);

Maintenance mode can be verified by visiting:

  1. https://www.freshports.org/customize.php – should redirect to https://www.freshports.org/now-in-maintenance-mode.php
  2. Visiting https://www.freshports.org/–/status/ where you should see “We are in in maintenance mode”.

Disable logins

On the website host (e.g. x8dtu-nginx01), in /usr/local/etc/freshports/constants.local.php, ensure this constant is defined, the actual value does not mattter.:

define('NO_LOGIN', true);

Confirm logins are disabled by visiting the status page and the login page (which should redirect to the home page).

Reverting maintenance mode

To come back out of maintenance mode, revert your changes, as above.

Jan 042019
 

x8dtu contains two jails for database servers: x8dtu-pg01 and x8dtu-pg02. At any given time, one is the production database server, and contains the IP address 127.1.54.32.

Let’s assume x8dtu-pg02 is on PostgreSQL 10.6 and we are upgrading to PostgreSQL 11.1 on x8dtu-pg01

The new packages would be installed on x8dtu-pg01. pg_dump would be run on x8dtu-pg01 (so we are using the newer code for dumping, standard PostgreSQL procedure) and would connect to the other server to get the data. For example:

[postgres@x8dtu-pg01 /var/db/DUMPING]$ pg_dumpall --globals -h 127.1.0.203 > globals.sql
[postgres@x8dtu-pg01 /var/db/DUMPING]$ psql template1 < /var/db/DUMPING/globals.sql
[postgres@x8dtu-pg01 /var/db/DUMPING]$ time pg_dump -j 14 -Z0 -Fd -h 127.1.0.203 freshports.org -f freshports.dump                                                                 

real    10m14.185s
user    1m21.128s
sys     0m24.430s

We create the new database and restore the database:

[postgres@x8dtu-pg01 /var/db/DUMPING]$ createdb freshports.org -E SQL_ASCII -T template0                                                                                           
[postgres@x8dtu-pg01 /var/db/DUMPING]$ time pg_restore -j 14 -d freshports.org /var/db/DUMPING/freshports.dump                                                                     

real    27m32.382s
user    0m10.793s
sys     0m44.011s
[postgres@x8dtu-pg01 /var/db/DUMPING]$ 

NOTE: the following procedure is untested. When it was written, 127.1.54.32 was not in use by any jail. This procedure will be updated when the next database upgrade occurs.

Add the IP address as an alias on the host:

[dan@x8dtu:~] $ sudo ifconfig lo1 alias 127.1.54.32/32

This is typical:

[dan@x8dtu:~] $ ifconfig lo1
lo1: flags=8049 metric 0 mtu 16384
	options=600003
	inet6 fd00::201 prefixlen 8 
	inet6 fe80::1%lo1 prefixlen 64 scopeid 0x5 
	inet 127.1.0.200 netmask 0xffffffff 
	inet 127.1.0.201 netmask 0xffffffff 
	inet 127.1.0.202 netmask 0xffffffff 
	inet 127.1.0.203 netmask 0xffffffff 
	inet 127.1.0.204 netmask 0xffffffff 
	inet 127.1.54.32 netmask 0xffffffff 
	nd6 options=21
	groups: lo 
[dan@x8dtu:~] $ 

Once the database is ready, we need to move the IP address.

Get the existing IP addresses. We use iocage for managing jails.

[dan@x8dtu:~] $ iocage get ip4_addr x8dtu-pg01
10.100.0.200,127.1.0.200

Add the production IP address:

[dan@x8dtu:~] $ sudo iocage set ip4_addr="10.100.0.200,127.1.0.200,127.1.54.32" x8dtu-pg01
Property: ip4_addr has been updated to 10.100.0.200,127.1.0.200,127.1.54.32

The jail needs to be restarted to grab the new IP address:

[dan@x8dtu:~] $ sudo iocage restart x8dtu-pg01

In ansible, update postgresql_address in these files.

  1. host_vars/x8dtu-pg02.vpn.unixathome.org
  2. host_vars/x8dtu-pg01.vpn.unixathome.org

Deploy those changes:

ansible-playbook jail-postgresql.yml --limit=x8dtu-pg01.vpn.unixathome.org,x8dtu-pg02.vpn.unixathome.org --tags=postgresql.conf,pg_hba

Ensure PostgreSQL is listening on the right IP addresses:

[dan@x8dtu-pg01:~] $ sockstat -p 5432 -4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
postgres postgres   19935 3  tcp4   127.1.0.200:5432      *:*
postgres postgres   19935 5  tcp4   127.1.54.32:5432      *:*

Now that the database server is listening on the new addresses, we can adjust the clients to use that newly-assigned IP address. See Swapping database servers.

Jan 042019
 

This procedure is rarely used, but is documented nonetheless.

A database server change is usually accomplished by moving the IP address from one server to another, not by adjusting the clients. The production database server is usually at 127.1.54.32.

When changing from one database server to another, there are a few configuration files to update:

Ingress host

These files are in /usr/local/etc/freshports:

  • config.pm$FreshPorts::Config::host : should contain the IP address of the PostgreSQL database server.
  • config.shHOST : The IP address of the PostgreSQL server.
  • ~freshports/.pgpass – Usually does not require updating as it contains a wildcard for the hostname.

This needs to be restarted:

svc -t /var/service/freshports

Website host

These files are in /usr/local/etc/freshports:

  • config.shHOST : The IP address of the PostgreSQL server.
  • database.php – The pg_connect call contains a host=IP.ADD.RE.SS clause.
  • forums.php – Contains a $PHORUM[‘DatabaseServer’] entry.
  • fp-listen.ini – Contains a HOST declaration.

These files are in /usr/local/etc/freshsource:

  • database.php – The pg_connect call contains a host=IP.ADD.RE.SS clause.
  • forums.php – Contains a $PHORUM[‘DatabaseServer’] entry.
  • fp-listen.ini – Contains a HOST declaration.

This needs to be restarted:

svc -t /var/service/fp-listen
Dec 272018
 

I originally documented this so I would have a reference while tracking down the problem. I have since solved the issue and you can jump to the bottom of this post to find the solution. What I do not know is why this was not an issue under PostgreSQL 10.6 but was under PostgreSQL 11.1.

How did the issue arise

FreshPorts deals with the FreeBSD package tree. One of the things it does is replicate the directory tree for the directories and files. Yesterday, I moved my development database from a PostgreSQL 10.6 server to a PostgreSQL 11.1 server. Now I am getting errors on commits which add a new file to the tree. I think I’ve narrowed it down to this trigger:

CREATE OR REPLACE FUNCTION element_pathname_insert() RETURNS TRIGGER AS $$
   BEGIN
      INSERT INTO element_pathname (element_id, pathname)
         VALUES (NEW.id, element_pathname(NEW.id));
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS element_pathname_insert ON element;
CREATE TRIGGER element_pathname_insert
    AFTER INSERT on element
    FOR EACH ROW
    EXECUTE PROCEDURE element_pathname_insert();

This trigger is invoked each time a new row is added to the element table.

This is where some background may be useful.

element

element is a self referential table, and implements a directory & file structure.

freshports.dev=# \d element
                                     Table "public.element"
       Column        |     Type     | Collation | Nullable |               Default               
---------------------+--------------+-----------+----------+-------------------------------------
 id                  | integer      |           | not null | nextval('element_id_seq'::regclass)
 name                | text         |           | not null | 
 parent_id           | integer      |           |          | 
 directory_file_flag | character(1) |           | not null | 
 status              | character(1) |           | not null | 
Indexes:
    "element_pkey" PRIMARY KEY, btree (id)
    "element_delete" btree (status) WHERE status = 'D'::bpchar
    "element_name" btree (name)
    "element_parent_id" btree (parent_id)
Check constraints:
    "element_directory_file_flag" CHECK (directory_file_flag = 'F'::bpchar OR directory_file_flag = 'D'::bpchar)
    "element_status" CHECK (status = 'A'::bpchar OR status = 'D'::bpchar)
Foreign-key constraints:
    "$1" FOREIGN KEY (parent_id) REFERENCES element(id) ON UPDATE RESTRICT ON DELETE CASCADE

For simplicity, the the Referenced by and Triggers definitions have been omitted.

Given a pathname, we can find the element id:

freshports.dev=# select Pathname_ID('/ports/head/sysutils/bacula-server/Makefile');
 pathname_id 
-------------
      205195
(1 row)

Given an id, we can find the pathname:

freshports.dev=# select element_pathname(205195);
              element_pathname               
---------------------------------------------
 /ports/head/sysutils/bacula-server/Makefile
(1 row)

For historical reasons, element_pathname() is implemented with a WHILE … LOOP structure. I am sure it could be updated to use the element_pathname table.

element_pathname

The element_pathname table is a relatively recent addition, whereas element_pathname() dates back about 17 years. The table was added to make some of the more complex queries easier to construct.

The problem

The problem arises when a commit adds a new file to the tree. The code first checks to see if the pathname already exists in the tree:

freshports.dev=# select Pathname_ID('/base/releng/12.0/release/tools/gce.conf');
 pathname_id 
-------------
            
(1 row)

freshports.dev=# 

It finds nothing, so it adds:

freshports.dev=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status)
                            values(element_id, element_parent_id, 'D', element_name, 'A')"
PL/pgSQL function element_add(text,character) line 89 at SQL statement

This is a conflict in the element_pathname table.

I suspect the reason is that element_pathname(NEW.id) is returning an empty string because it cannot yet see the newly inserted value in the element table.

I have not supplied the contents of Element_Add() because it is about 100 lines of code.

This ’empty string’ theory is supported by finding this:

freshports.dev=# select * from element_pathname where pathname = '';
 element_id | pathname 
------------+----------
     910588 | 
(1 row)

What file is that?

freshports.dev=# select element_pathname(910588);
               element_pathname                
-----------------------------------------------
 /base/head/bin/sh/tests/expansion/question2.0
(1 row)

Checking the logs, this new file was recently added, and I suspect, but have not confirmed, that it was the first addition to the tree after moving to PostgreSQL 11.1.

Sometimes it fails, but then it succeeds

I started doing some testing to see what happens and I created a test database for this very purpose. I tried a series of COMMIT INSERT ROLLBACK commands.

Let’s try that now:

[dan@pg03:~] $ psql FPtest
psql (11.1)
Type "help" for help.

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status) 
                            values(element_id, element_parent_id, element_directory_file_flag, element_name, 'A')"
PL/pgSQL function element_add(text,character) line 86 at SQL statement
FPtest=# rollback;
ROLLBACK

OK, let’s try it again, but with something I know will not exist.

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/DELETEME/1/2/3/4', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status)
                            values(element_id, element_parent_id, 'D', element_name, 'A')"
PL/pgSQL function element_add(text,character) line 89 at SQL statement
FPtest=# rollback;
ROLLBACK

Well, there is no reason for that to fail, yet it does. Let’s try this:

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/DELETEME/1', 'F');
 element_add 
-------------
      910593
(1 row)

Woah, that worked. Let’s try the original, or close to it:

FPtest=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
 element_add 
-------------
      910595
(1 row)

FPtest=# rollback;
ROLLBACK
FPtest=# \q

That worked too. What’s up with that?

The IRC help

I got onto IRC and asked what might be up. I included this gist and it was RhodiumToad who pointed out that I must have two functions called element_pathname().

Have a look at line 5 of the stored element_pathname_insert() trigger in that gist (originally presented at the top of this post, but included here so you don’t have to scroll back):

CREATE OR REPLACE FUNCTION element_pathname_insert() RETURNS TRIGGER AS $$
   BEGIN
      RAISE WARNING 'element_pathname_insert says: %', NEW.id;
      INSERT INTO element_pathname (element_id, pathname)
         VALUES (NEW.id, element_pathname(NEW.id));
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

You might notice that this version of the function has a NOTICE added to it. I was trying many things to track this down.

Looking at the output of df I did find the second function:

freshports.org=# \df Element_Pathname
                              List of functions
 Schema |       Name       | Result data type | Argument data types |  Type  
--------+------------------+------------------+---------------------+--------
 public | element_pathname | text             | integer             | normal
 public | element_pathname | text             | integer, boolean    | normal
(2 rows)

df+ showed me the differences, reproduced here, side by side to make it easier to spot the differences. I have added whitespace to make similar code line up.

CREATE FUNCTION public.element_pathname(integer) RETURNS text
    LANGUAGE plpgsql IMMUTABLE
    AS $_$
   DECLARE
      element_id   ALIAS FOR $1;

      my_parent_id int4;
      element_name text;
      pathname     text;

begin
  pathname = '';
  select name, parent_id
    into element_name, my_parent_id 
    from element
   where id = element_id;

  IF FOUND THEN
    pathname := '/' || element_name || pathname;
    WHILE FOUND LOOP
      select name, parent_id 
        into element_name, my_parent_id
        from element
       where id = my_parent_id;

      IF FOUND THEN


        pathname = '/' || element_name || pathname;


      END IF;





    END LOOP;
  END IF;

  return pathname;
END;
$_$;
CREATE FUNCTION public.element_pathname(integer, boolean) RETURNS text
    LANGUAGE plpgsql STABLE
    AS $_$
   DECLARE
      element_id    ALIAS FOR $1;
      leading_slash ALIAS for $2;
      my_parent_id  int4;
      element_name  text;
      pathname      text;

begin
  pathname = '';
  select name, parent_id
    into element_name, my_parent_id
    from element
   where id = element_id;

  IF FOUND THEN
    pathname := '/' || element_name || pathname;
    WHILE FOUND LOOP
      select name, parent_id
        into element_name, my_parent_id
        from element
       where id = my_parent_id;

      IF FOUND THEN
         IF my_parent_id IS NULL THEN
           IF leading_slash THEN
              pathname = '/' || element_name || pathname;
           ELSE
              pathname = element_name || pathname;
           END IF;
           EXIT;
         ELSE
           pathname = '/' || element_name || pathname;
         END IF;
      END IF;
    END LOOP;
  END IF;

  return pathname;
END;
$_$;

These two functions are very similar in both purpose and structure, the primary difference: having a leading slash or not.

SIDE NOTE: whenever I code this I always consider whether or not to just remove the leading slash at the end of the code instead of testing a condition each time through the loop. The current code is plenty fast.

What stood out when comparing the two functions?

When I looked at the two functions, I noticed the one being invoked by the trigger was declared to be IMMUTABLE, which is defined as:

An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever.

This function does not modify the database, but it could return different results if the path was modified. The path does not get modified by these functions. Any modification would be external to what we are doing here.

Changed to STABLE

I changed the function to STABLE and tried my tests again.

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement.

This is definitely true for this function.

Defined as STABLE, this problem went away.

Why 11.1?

I do not know why this function is OK under PostgreSQL 10.6, but causes a problem under 11.1 but I can speculate. I am also sure I will be corrected.

I suspect 11.1 is better at handling immutable functions and started doing optimizations which 10 did not.

Or you could take the view that it got worse.

Yes, I know the function should not have been immutable at all, but I would like to know the explanation behind the change in behavior.

Further function changes

Those two functions were similar, but can be combined. I reduced the element_pathname (integer) function to merely invoke element_pathname (integer, boolean):

CREATE FUNCTION public.element_pathname(integer) RETURNS text
    LANGUAGE plpgsql STABLE
    AS $_$
   DECLARE
      element_id    ALIAS FOR $1;
BEGIN
  return Element_Pathname(element_id, TRUE);
END;
$_$;
Oct 062018
 

From IRC:

[Oct 05 22:41] <@dvl> zi: we have conflict matches: https://dev.freshports.org/security/openssl/
[Oct 06 09:08] <@zi> dvl: cute. possible to add in a clickable link to take you to that matching port? :)
[Oct 06 10:50] <@dvl> zi: That matching port, it’s just a package name, sometimes there is a matching port, sometimes there is not… But yeah, let’s see.

So let’s see what happens.

I started off with this:

freshports.dev=# select id, packagename(id), element_pathname(element_id) from ports where package_name = 'libressl';
  id   | packagename |             element_pathname             
-------+-------------+------------------------------------------
 48916 | libressl    | /ports/branches/2018Q2/security/libressl
 40767 | libressl    | /ports/branches/2017Q1/security/libressl
 43361 | libressl    | /ports/branches/2017Q2/security/libressl
 34863 | libressl    | /ports/head/security/libressl
(4 rows)

What about just head?

Let’s join to the element_pathname table and use LIKE to get what we need.

freshports.dev=# SELECT P.id, packagename(P.id), element_pathname(P.element_id)
freshports.dev-#   FROM ports P JOIN element_pathname EP ON P.element_id = EP.element_id
freshports.dev-#  WHERE package_name = 'libressl'
freshports.dev-#    AND EP.pathname LIKE '/ports/head/%';
  id   | packagename |       element_pathname        
-------+-------------+-------------------------------
 34863 | libressl    | /ports/head/security/libressl
(1 row)

That’s better. But we need category and port name too.

freshports.dev=# SELECT P.id AS port_id, C.name AS category, packagename(P.id) as port, element_pathname(P.element_id)
  FROM ports P JOIN element_pathname EP ON P.element_id  = EP.element_id
               JOIN categories       C  ON P.category_id = C.id
 WHERE package_name = 'libressl'
   AND EP.pathname LIKE '/ports/head/%';
 port_id | category |   port   |       element_pathname        
---------+----------+----------+-------------------------------
   34863 | security | libressl | /ports/head/security/libressl
(1 row)

That’s it. Let’s create a datatype for that:

  DROP TYPE IF EXISTS GetPortFromPackageName_record CASCADE;
CREATE TYPE GetPortFromPackageName_record AS (
        port_id                 integer,
        category                text,
        port                    text,
        element_pathname        text
);

And next, the function to get it all:

CREATE OR REPLACE FUNCTION GetPortFromPackageName(text) RETURNS SETOF GetPortFromPackageName_RECORD AS $$
DECLARE
        a_PackageName   ALIAS for $1;

        r               GetPortFromPackageName_RECORD%rowtype;

BEGIN
        FOR r IN
                SELECT P.id AS port_id, C.name AS category, packagename(P.id) as port, element_pathname(P.element_id)
                  FROM ports P JOIN element_pathname EP ON P.element_id  = EP.element_id
                               JOIN categories       C  ON P.category_id = C.id
                 WHERE package_name = a_PackageName
                   AND EP.pathname LIKE '/ports/head/%'

        LOOP
                RETURN NEXT r;
        END LOOP;
END $$ LANGUAGE plpgsql;

Now we can get that information via:

freshports.dev=# select * from GetPortFromPackageName('libressl');
 port_id | category |   port   |       element_pathname        
---------+----------+----------+-------------------------------
   34863 | security | libressl | /ports/head/security/libressl
(1 row)

freshports.dev=#

Next step: alter the website to use this data to create the hyperlink.

The existing website code was:

SELECT DISTINCT PackageName(PCM.port_id) as package_name
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = ' . $this ->{'id'};

The first attempt to use the new function was:

freshports.dev=# SELECT DISTINCT PackageName(PCM.port_id) as package_name, GetPortFromPackageName(PackageName(PCM.port_id))
freshports.dev-#   FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
freshports.dev-#  WHERE PC.port_id = 34863;
  package_name  |                       getportfrompackagename                        
----------------+---------------------------------------------------------------------
 libressl-devel | (37855,security,libressl-devel,/ports/head/security/libressl-devel)
 openssl        | (954,security,openssl,/ports/head/security/openssl)
 openssl-devel  | (38174,security,openssl-devel,/ports/head/security/openssl-devel)
 openssl111     | (49920,security,openssl111,/ports/head/security/openssl111)
(4 rows)

freshports.dev=# 

That’s not quite what I needed. I got on IRC and asked. xocolatl provided.

SELECT DISTINCT PackageName(PCM.port_id) as package_name, f.*
  FROM ports_conflicts_matches PCM 
  JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
  JOIN GetPortFromPackageName(PackageName(PCM.port_id)) AS f ON true
 WHERE PC.port_id = 34863;

That query is now on the dev website.

This is the corresponding change to the display part of the website:

@@ -663,6 +663,12 @@ class port_display {
                        }
 
                        $HTML .= "\n";
+
+                       $HTML .= "Conflicts Matches:\n
    "; + foreach($port->conflicts_matches as $match) { + $HTML .= "
  • conflicts with " . freshports_link_to_port($match['category'], $match['port']) . '
  • '; + } + $HTML .= '
'; }

You can see the results at https://dev.freshports.org/security/openssl/

Soon, this will go into production. Please review any other ports which you know have CONFLICTS.

Oct 042018
 

Today I updated the fp-listen and freshports daemons. These are the scripts which invalidate the cache and process incoming commits, respectively.

No code changes were made, just the installations directories.

Previously, this code when into /var/service but now they go into /usr/local/libexec/freshports-service and I create a symlink from /var/service as as required.

This confirms to the standard method described at https://cr.yp.to/daemontools/faq/create.html for adding/removing services.

Sep 262018
 

This post serves two goals:

  1. document the CONFLICTS implementation within FreshPorts
  2. show how you can use triggers to update a secondary table

The FreeBSD ports tree / package collection has a method for resolving conflicts. This feature allows a port maintainer to specify what, if any, packages conflict with a given port. The reasons why two ports might conflict is outside the scope of this article. For background reading:

Obtaining the values from the Makefile

To get started, let’s look at what I already have in the development instance of the FreshPorts database.

freshports.dev=# SELECT id, packagename(id), conflicts, conflicts_build, conflicts_install FROM ports WHERE coalesce(conflicts, conflicts_build, conflicts_install) IS NOT NULL and packagename(id) like 'bacula%' order by last_commit_id desc limit 10
;
  id   |      packagename      |            conflicts             | conflicts_build | conflicts_install 
-------+-----------------------+----------------------------------+-----------------+-------------------
 45084 | bacula9-docs          | bacula-* bacula5-*               |                 | 
 45086 | bacula9-server        | bacula5-server-* bacula-server-* |                 | 
 45087 | bacula9-bat           | bacula5-server-* bacula-server-* |                 | 
 45083 | bacula9-client        | bacula5-server-* bacula-server-* |                 | 
 45085 | bacula9-client-static | bacula5-server-* bacula-server-* |                 | 
(5 rows)

freshports.dev=# 

The packagename corresponds to the following FreeBSD ports:

  1. bacula9-docs
  2. bacula9-server
  3. bacula9-bat
  4. bacula9-client
  5. bacula9-client-static

In this case, bacula9 ports conflict with older versions of Bacula. How can we get those values from the Makefile? Let’s get the values for bacula9-server. Assume these commands are done in the /var/db/freshports/ports-jail/var/db/repos/PORTS-head/sysutils/bacula9-server directory, which is a non-standard location for ports files, but it suites the needs of the website.

$ grep CONFLICTS *
Makefile:CONFLICTS?=	bacula5-server-* bacula-server-*

NOTE: grep is not an appropriate method for obtaining the values because the values may be contingent upon other configuration settings.

This is the proper method:

$ make -V CONFLICTS -V CONFLICTS_BUILD -V CONFLICTS_INSTALL
bacula5-server-* bacula-server-*

NOTE: those two blank lines correspond to the empty values contained in those files.

globs not regex

CONFLICTS can involve wildcards, such as the * shown above. It is critical to realize that these are globs, meant for matching disk filenames, as opposed to regular expressions.

From the documentation

# CONFLICTS             - A list of package name patterns that the port conflicts
#                                 with, separated by blanks.  The names may include shell
#                                 pattern meta-characters "*", "?", "[", "]", and "!".
#                                 Example: apache*-1.2* apache*-1.3.[012345] apache-*+ssl_*

PostgreSQL has many regex functions. Let’s see about converting globs to regex.

glob to regex

Perl to the rescue. I received some help, as many people do, from RhodiumToad, who supplied this function:

CREATE TRUSTED PROCEDURAL LANGUAGE plperl
    HANDLER plperl_call_handler
    INLINE plperl_inline_handler
    VALIDATOR plperl_validator;
    
COMMENT ON LANGUAGE plperl
    IS 'PL/Perl procedural language';

CREATE FUNCTION public.glob_to_regex(text) RETURNS text
    LANGUAGE plperl
    AS $_$
    my $w = quotemeta shift;
    $w =~ s{\\([*?]|\\(\\?.)|\[(?:\\\!)?(?:\\\])?[^]]*\\\])}
                   {($1 eq '*' ? '.*' :
                         $1 eq '?' ? '.'  :
                         defined($2) ? $2 :
                         do { my $p = $1; $p =~ s/\\([][^-])/$1/g; $p =~ s/^\[\\!/[^/; $p; })}eg;
    return $w;
$_$;

On FreeBSD, I needed to add the databases/postgresql10-server package to provide support for Perl language extensions.

This will take a glob and translate it into a regular expression.

Here are a few examples:

freshports.dev=# select glob_to_regex('bacula-*');
 glob_to_regex 
---------------
 bacula\-.*
(1 row)

freshports.dev=# 


freshports.dev=# select glob_to_regex('postgresql9[^4]*');
   glob_to_regex   
-------------------
 postgresql9[^4].*
(1 row)

freshports.dev=# select glob_to_regex('clang-devel-3.[1234567]*');
        glob_to_regex         
------------------------------
 clang\-devel\-3\.[1234567].*
(1 row)

freshports.dev=# 

The secondary table

As part of earlier work, the ports table was upgraded to add three new fields:

ALTER TABLE ports ADD COLUMN conflicts text;
ALTER TABLE ports ADD COLUMN conflicts_build text;
ALTER TABLE ports ADD COLUMN conflicts_install text;

These columns will store the raw values as obtained directly from the Makefile. The corresponding regex values will be store, one per row, in another table. That table looks like this:

CREATE TABLE ports_conflicts
(
    id BIGSERIAL,
    port_id integer          NOT NULL,
    conflicts_type conflicts NOT NULL,
    item_num integer         NOT NULL,
    item_value text          NOT NULL,
    CONSTRAINT ports_conflicts_pkey PRIMARY KEY (id),
    CONSTRAINT ports_conflicts_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

What’s that column type on line 5 you might ask.

CREATE TYPE public.conflicts AS ENUM
    ('conflicts', 'conflicts_build', 'conflicts_install');

The item_num column will help to order the values so they correspond to what we find in the the source value. Feedback regarding the accuracy of that claim is most welcome. The ordering is not vital, but it does help to make sure it is consistent when we pull back the values later.

The Trigger

This is the trigger which will populate the ports_conflicts table.

CREATE OR REPLACE FUNCTION ports_conflicts_set() RETURNS TRIGGER AS $$
  BEGIN
    DELETE FROM ports_conflicts WHERE port_id = NEW.id;

    INSERT INTO ports_conflicts(port_id, conflicts_type, item_num, item_value)
      SELECT NEW.id                         AS port_id, 
             'conflicts'::conflicts         AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_build'::conflicts   AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_build, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_install'::conflicts AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_install, E'\\s+') AS R(item_value);

    RETURN NEW;
  END
$$ LANGUAGE 'plpgsql';


  DROP TRIGGER IF EXISTS ports_conflicts_insert ON ports;
CREATE TRIGGER ports_conflicts_insert
    AFTER INSERT on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

  DROP TRIGGER IF EXISTS ports_conflicts_update ON ports;
CREATE TRIGGER ports_conflicts_update
    AFTER UPDATE on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

Line 3 : deletes existing entries for this port

Lines 6-10 : compiles a list of conflicts values

Lines 14-18 : does the same thing for conflicts_build values

Lines 22-26 : same again, for conflicts_install

The UNIONs tie them all together and line 5 puts it into the ports_conflicts table.

Details on the trigger

This section explains the trigger in a bit more detail.

The regexp_split_to_table function is a wonderful solution to break each CONFLICTS value into separate rows, like this:

freshports.dev=# select regexp_split_to_table('bacula5-server-* bacula-server-*',  E'\\s+');
 regexp_split_to_table 
-----------------------
 bacula5-server-*
 bacula-server-*
(2 rows)

freshports.dev=# 

The glob_to_regex(R.item_value) call changes each of the above items from a blog to a regex.

row_number() OVER (ORDER BY item_value) will sort individual value within a conflict and give us a value for the ports_conflicts.item_num field.

Matching against the regex

The list of conflicts for a given port will not change until the next commit for a port.

However, what that conflict matches against might change when another port is updated.

Thus, the ports_conflicts table only updates based upon a trigger on the ports table.

We will store the actual matches in the ports_conflicts_matches table.

CREATE TABLE ports_conflicts_matches
(
    ports_conflicts_id bigint NOT NULL,
    port_id integer NOT NULL,
    CONSTRAINT ports_conflicts_matches_pkey PRIMARY KEY (ports_conflicts_id, port_id),
    CONSTRAINT ports_conflicts_matches_conflicts_id FOREIGN KEY (ports_conflicts_id)
        REFERENCES public.ports_conflicts (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT ports_conflicts_matches_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

How do we populate that table? How can it be quickly done?

That is the part which took the longest time to figure out.

Simple population

Here are some sample CONFLICTS values from the early testing (as recorded in an older gist):

freshports.dev=# select * from ports_conflicts;
 id | port_id |  conflicts_type   | item_num |          item_value          
----+---------+-------------------+----------+------------------------------
 21 |   43574 | conflicts_install |        4 | mysqlwsrep.*
 22 |   43574 | conflicts_install |        3 | mysql[0-9].*\-server\-.*
 23 |   43574 | conflicts_install |        1 | mariadb10[013-9]\-server\-.*
 24 |   43574 | conflicts_install |        5 | percona[0-9].*\-server\-.*
 25 |   43574 | conflicts_install |        2 | mariadb5.*\-server\-.*
(5 rows)

freshports.dev=# 

Given the regex in ports_conflicts.item_value, let’s find the matching values from the commit_log_ports table. I can concatenate all the regex values into one search term. This approach works rather well:

freshports.dev=# explain analyse
freshports.dev-# SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
                                                                                     QUERY PLAN                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=19482.04..19482.68 rows=85 width=22) (actual time=497.597..497.856 rows=364 loops=1)
   ->  Sort  (cost=19482.04..19482.25 rows=85 width=22) (actual time=497.596..497.658 rows=852 loops=1)
         Sort Key: port_id, port_name_revision
         Sort Method: quicksort  Memory: 91kB
         ->  Gather  (cost=1000.00..19479.32 rows=85 width=22) (actual time=107.422..497.400 rows=852 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on commit_log_ports clp  (cost=0.00..18470.82 rows=35 width=22) (actual time=108.076..493.249 rows=284 loops=3)
                     Filter: (port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)'::text)
                     Rows Removed by Filter: 307469
 Planning time: 0.417 ms
 Execution time: 498.383 ms
(12 rows)

freshports.dev=# 

The results are something like this:

SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
 
  port_id |      port_name_revision       
---------+-------------------------------
   16965 | mysql51-server-5.1.11
   16965 | mysql51-server-5.1.12
   16965 | mysql51-server-5.1.14
   16965 | mysql51-server-5.1.15
   16965 | mysql51-server-5.1.16
   16965 | mysql51-server-5.1.17
   16965 | mysql51-server-5.1.18
   16965 | mysql51-server-5.1.19
   16965 | mysql51-server-5.1.20
   16965 | mysql51-server-5.1.21
   16965 | mysql51-server-5.1.22
...

Only a few rows of the results are shown above. The full list of matches is in that gist.

Yes, this works, however, that solution takes nearly half a second and caters for just set of one match. There must be a better way. If you keep reading in that gist, you will see the other methods I tried. I want to thank those on IRC who helped me find different ways to do this.

Next, I tried an SRF (Set Returning Function) and documented that in this gist. This one is fast; the test I ran taking 7.137 ms, but it did not scale, and exploded to 4825.074 ms when I tried multiple rows.

SRF within a function

My next attempt involved a plpgsql function which took only 13.180 ms, a huge speed increase. This turned out to be the solution I implemented.

Let’s review slightly using the final gist. My original function was:

CREATE OR REPLACE FUNCTION port_conflicts4(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  SELECT DISTINCT PC.id, CLP.port_id
    FROM commit_log_ports CLP JOIN ports_conflicts PC 
                              ON (PC.id = $1 AND CLP.port_name_revision ~ $2);
$$ LANGUAGE SQL STRICT;

Then I created a SRF using plpgsql, which is incredibly fast:

CREATE OR REPLACE FUNCTION port_conflicts(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  DECLARE
    a_ports_conflicts_id ALIAS FOR $1;
    a_regex              ALIAS FOR $2;

    l_regex text;
    r       ports_conflicts_matches%rowtype;
  BEGIN
    l_regex := '^(' || a_regex || ')';
    
    FOR r in SELECT distinct a_ports_conflicts_id, port_id FROM commit_log_ports WHERE port_name_revision ~ l_regex
    LOOP
      RETURN NEXT r;
    END LOOP;
  END;

$$ LANGUAGE 'plpgsql' STABLE;

Now let’s try that function out:

freshports.dev=# explain analyse SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts5(PC.id, '^' || PC.item_value || '$')) PC4;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.26..35052.76 rows=1751000 width=12) (actual time=0.594..2208.947 rows=10700 loops=1)
   ->  Seq Scan on ports_conflicts pc  (cost=0.00..32.51 rows=1751 width=26) (actual time=0.015..0.838 rows=1751 loops=1)
   ->  Function Scan on port_conflicts5  (cost=0.26..10.26 rows=1000 width=12) (actual time=1.258..1.259 rows=6 loops=1751)
 Planning time: 0.112 ms
 Execution time: 2209.598 ms
(5 rows)

freshports.dev=# 

2.2 seconds, to calculate conflicts for 1751 rows, resulting in 10700 rows. That’s fast enough. We could do that on a periodic basis if required.

From my personal development ethics, I don’t like the idea of having a function and a table with the same / very similar name (port_conflicts function vs ports_conflicts table). I might rename the function to find_matches_for_conflict. I think that better describes the purpose of the function.

Populate the destination table

freshports.dev=# insert into ports_conflicts_matches SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts(PC.id, '^' || PC.item_value || '$')) PC4;
INSERT 0 10700

Now let’s have a look at some of the results:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
freshports.dev-#   FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
freshports.dev-#  WHERE PC.port_id = 45086;
  packagename   |    ?column?    |     packagename     
----------------+----------------+---------------------
 bacula9-server | conflicts with | bacula5-server
 bacula9-server | conflicts with | bacula-server
 bacula9-server | conflicts with | bacula-server-devel
(3 rows)

freshports.dev=# 

You can find that magical 45086 value near the top of this post.

Let’s try openssl:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 954;
 packagename |    ?column?    |  packagename   
-------------+----------------+----------------
 openssl     | conflicts with | openssl111
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
(14 rows)

Or heimdal:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 934;
 packagename |    ?column?    | packagename  
-------------+----------------+--------------
 heimdal     | conflicts with | wu-ftpd
 heimdal     | conflicts with | wu-ftpd+ipv6
 heimdal     | conflicts with | srp
 heimdal     | conflicts with | krb5-maint
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-114
 heimdal     | conflicts with | krb5-111
 heimdal     | conflicts with | krb5-113
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-112
 heimdal     | conflicts with | krb5-116
 heimdal     | conflicts with | krb5-115
 heimdal     | conflicts with | krb4
(14 rows)

freshports.dev=# 

Now partially in production

The work is partially implemented in production. The idea came from this issue suggested by zi0r.

If you have a look at the Bacula port, you will find the following under Conflicts:

CONFLICTS:

  • bacula5-server-*
  • bacula-server-*

Future work

The next job is to automatically populate the ports_conflicts_matches table. I think that needs to be done on a periodic basis, perhaps every 10 minutes if there has been a recent commit.

Just thinking about that now:

  1. If port A conflicts with port B, then clearly port B conflicts with port A. I am not sure if there is a requirement to specify CONFLICTS in both ports, but that is something easily confirmed by the ports_conlicts table.
  2. I tried to think of commits which would not change ports_conflicts_matches. Since the matches can include PORTVERSION, any given commit which bumps that value can affect matches.
  3. Perhaps I could exclude commits to UPDATING, MOVED, or Mk, but they are relatively few compared to the rest.
  4. I think it’s just safer to recompute all matches periodically. There does not seem to be a safe and straight forward way to do it based upon a smaller unit of work.
Aug 122018
 

Today, FreshPorts broke. No commits were being processed. nullfs mounts were disappearing. No idea why.

Let’s follow the clues.

All times are local to me (Philadelphia).

12:46 pm

Rene contacted me on IRC, presumably about a failed sanity test I had seen earlier in the day:

To: rene@FreeBSD.org
From: FreshPorts Sanity Daemon <FreshPorts@FreshPorts.org>
Cc: dan@langille.org
Subject: FreshPorts sanity checking
Date: Fri, 10 Aug 2018 16:39:24 +0000
Message-ID: <20180810_163924_037160.FreshPorts@FreshPorts.org>

This message was generated by the FreshPorts Daemon.

You recently made this commit, which FreshPorts FreshPorts had trouble 
processing:

MessageID: 201808101638.w7AGcsUL078691@repo.freebsd.org
Subject  : svn commit: r476841 - head/www/feedjack

The following is a list of the ports which had errors:

http://www.freshports.org/www/feedjack/


The exact errors appear below.

www/feedjack:

This command (FreshPorts code 1):

/usr/local/bin/sudo /usr/sbin/chroot -u freshports
/var/db/freshports/ports-jail /make-port.sh /var/db/repos/PORTS-head
www/feedjack
2>/tmp/FreshPorts.www.feedjack.make-error.2018.8.10.16.39.24.59977

produced this error:

Error message is: chroot: /make-port.sh: No such file or directory



You are receiving this message as you are a FreeBSD
committer who has opted into this service.  Please
see https://www.FreshPorts.org/committer-opt-in.php
for more information.

-- 
FreshPorts Daemon

You can look at Rene’s commit.

Looking back, Rene’s was not the first commit to encounter this issue, but I didn’t notice the other failures. It wasn’t until I got a “FAILED: MASTER_PORT” notice that I knew something was wrong. MASTER_PORT is a routine test to exercise the code which determines the MASTER_PORT value. If it gets the wrong answer for sysutils/bacula-server, I know something is broken.

Now let’s examine the command mentioned in the sanity test failure email:

/usr/local/bin/sudo /usr/sbin/chroot -u freshports /var/db/freshports/ports-jail \
    /make-port.sh /var/db/repos/PORTS-head www/feedjack

A chroot is invoked on the /var/db/freshports/ports-jail directory as the freshports user. The command /make-port.sh /var/db/repos/PORTS-head www/feedjack is run.

If you want to read more about ports-jail and it’s script, try this search.

I tried running the command manually:

$ /usr/local/bin/sudo /usr/sbin/chroot -u freshports /var/db/freshports/ports-jail 
chroot: /bin/csh: No such file or directory

I thought: Why does it care about /bin/csh? I’m not running that, nor do I need it.

What’s in the jail?

$ ls -l /var/db/freshports/ports-jail
total 50
drwxr-xr-x  2 root        freshports     2 Oct 30  2017 bin
-r-xr-xr-x  1 root        wheel        617 Aug  3 04:31 cat-descr.sh
drwxr-xr-x  2 root        freshports     2 Oct 30  2017 dev
drwxr-xr-x  2 root        wheel          3 Aug  7 13:57 etc
drwxr-xr-x  2 root        freshports     2 Oct 30  2017 lib
drwxr-xr-x  2 root        freshports     2 Oct 30  2017 libexec
-r-xr-xr-x  1 root        wheel        593 Aug  3 04:31 make-category-comment.sh
-r-xr-xr-x  1 root        wheel        707 Aug  3 04:31 make-generate-plist.sh
-r-xr-xr-x  1 root        wheel        640 Aug  3 04:31 make-master-port-test.sh
-r-xr-xr-x  1 root        wheel        627 Aug  3 04:31 make-master-sites-all.sh
-r-xr-xr-x  1 root        wheel       1536 Aug  3 04:31 make-port.sh
-r-xr-xr-x  1 root        wheel        606 Aug  3 04:31 make-showconfig.sh
-r-xr-xr-x  1 root        wheel        601 Aug  3 04:31 realpath.sh
drwxr-xr-x  2 root        freshports     2 Oct 30  2017 sbin
drwxr-xr-x  7 root        wheel          7 Oct 30  2017 usr
drwxr-xr-x  3 freshports  freshports     3 Nov  1  2017 var
drwxr-xr-x  3 root        wheel          3 Oct  6  2017 var.original
-r-xr-xr-x  1 root        wheel        196 Aug  3 04:31 vars.sh
-r-xr-xr-x  1 root        wheel        196 Aug  3 04:31 vars.sh.sample

OK, make-port.sh is there. What about the bin directory?

$ ls -l /var/db/freshports/ports-jail/bin
total 0

Oh. That should not be empty. This is the problem.

iocage

The FreshPorts production server uses iocage for jail management. The extra nullfs mounts are lists below and are explained in this blog post.

[dan@x8dtu:~] $ sudo iocage fstab -l x8dtu-ingress01
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| INDEX |                                                                                  FSTAB ENTRY                                                                                  |
+=======+===============================================================================================================================================================================+
| 0     |                                                                                                                                                                               |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1     | /iocage/jails/x8dtu-ingress01/root/bin                          /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/bin           nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2     | none                                                            /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/dev           devfs   rw                0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3     | /iocage/jails/x8dtu-ingress01/root/lib                          /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/lib           nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 4     | /iocage/jails/x8dtu-ingress01/root/libexec                      /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/libexec       nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5     | /iocage/jails/x8dtu-ingress01/root/sbin                         /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/sbin          nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 6     | /iocage/jails/x8dtu-ingress01/root/usr/share/mk                 /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/share/mk  nullfs  ro,nosuid,noexec  0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 7     | /iocage/jails/x8dtu-ingress01/root/usr/sbin                     /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/sbin      nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 8     | /iocage/jails/x8dtu-ingress01/root/usr/bin                      /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/bin       nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 9     | /iocage/jails/x8dtu-ingress01/root/usr/lib                      /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/lib       nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 10    | /iocage/jails/x8dtu-ingress01/root/usr/libexec                  /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/libexec   nullfs  ro,nosuid         0   0 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[dan@x8dtu:~] $

The above is what should be mounted. Compare to below:

[dan@x8dtu:~] $ mount | grep x8dtu-ingress01
main_tank/iocage/jails/x8dtu-ingress01 on /iocage/jails/x8dtu-ingress01 (zfs, local, noatime, nfsv4acls)
main_tank/iocage/jails/x8dtu-ingress01/root on /iocage/jails/x8dtu-ingress01/root (zfs, local, noatime, nfsv4acls)
zroot/data/ingress01-testing on /iocage/jails/x8dtu-ingress01/root/usr/home/dan/tmp-fast (zfs, local, noatime, nfsv4acls)
main_tank/data/freshports/backend/cache on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/cache (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/backend/cache/html on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/cache/html (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/backend/cache/spooling on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/cache/spooling (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/backend/queues/archive on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/message-queues/archive (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/backend/queues/recent on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/message-queues/recent (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/backend/queues/retry on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/message-queues/retry (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail (zfs, local, noatime, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db/repos on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/data/freshports/repo/PORTS-head on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos/PORTS-head (zfs, local, noatime, nfsv4acls)
main_tank/data/freshports/backend/queues/incoming on /iocage/jails/x8dtu-ingress01/root/var/db/ingress/message-queues/incoming (zfs, local, noatime, noexec, nosuid, nfsv4acls)
devfs on /iocage/jails/x8dtu-ingress01/root/dev (devfs, local, multilabel)
fdescfs on /iocage/jails/x8dtu-ingress01/root/dev/fd (fdescfs)

Yeah, that’s hard to read.

How about this instead:

[dan@x8dtu:~] $ mount | grep /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail
main_tank/data/freshports/ports-jail on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail (zfs, local, noatime, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db/repos on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/data/freshports/repo/PORTS-head on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos/PORTS-head (zfs, local, noatime, nfsv4acls)
[dan@x8dtu:~] $

Yeah, stuff is missing. Let’s just restart the jail

[dan@x8dtu:~] $ sudo iocage restart x8dtu-ingress01
* Stopping x8dtu-ingress01
  + Running prestop OK
  + Stopping services OK
  + Removing jail process OK
  + Running poststop OK
* Starting x8dtu-ingress01
  + Started OK
  + Starting services OK

Now we see them all:

[dan@x8dtu:~] $ mount | grep /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail
main_tank/data/freshports/ports-jail on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail (zfs, local, noatime, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db/repos on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/data/freshports/repo/PORTS-head on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos/PORTS-head (zfs, local, noatime, nfsv4acls)
/iocage/jails/x8dtu-ingress01/root/bin on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/bin (nullfs, local, nosuid, read-only)
devfs on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/dev (devfs, local, multilabel)
/iocage/jails/x8dtu-ingress01/root/lib on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/lib (nullfs, local, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/libexec on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/libexec (nullfs, local, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/sbin on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/sbin (nullfs, local, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/usr/share/mk on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/share/mk (nullfs, local, noexec, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/usr/sbin on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/sbin (nullfs, local, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/usr/bin on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/bin (nullfs, local, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/usr/lib on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/lib (nullfs, local, nosuid, read-only)
/iocage/jails/x8dtu-ingress01/root/usr/libexec on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/libexec (nullfs, local, nosuid, read-only)
[dan@x8dtu:~] $

I went to check on the jail. It was empty

[dan@x8dtu:~] $ ls /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/bin
[dan@x8dtu:~] $

What? It’s mounted.

No, it’s not mounted:

[dan@x8dtu:/var/log] $ mount | grep /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail
main_tank/data/freshports/ports-jail on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail (zfs, local, noatime, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db (zfs, local, noatime, noexec, nosuid, nfsv4acls)
main_tank/data/freshports/ports-jail/var/db/repos on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/data/freshports/repo/PORTS-head on /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos/PORTS-head (zfs, local, noatime, nfsv4acls)
[dan@x8dtu:/var/log] $

In the short time between restarting the jail and checking the directory, the nullfs mounts were gone.

What is doing this? Nothing in logs, nothing to be found anywhere.

The story so far

  • The FreshPorts server consists of several jails, one of which uses a chroot, which I call a ports-jail, to differentiate it from a FreeBSD jail.
  • The ports-jail uses 10 nullfs mounts.
  • The nullfs mounts are present when the jail starts.
  • The nullfs mounts disappear sometime later.
  • What is removing these mounts? There was nothing in any logs.

This whole situation had me very confused.

2:20 pm

I amended /etc/fstab to contain all the mounts required by the x8dtu-ingress01 jail. This allowed me to easily test the issue by getting everything remounted easily via mount -a.

I kept monitoring the number of mountpoints via mount | wc -l. 82 was wrong. 92 was right.

I was looking at my script and trying to figure out what step was causing the issue. It didn’t happen every time. I was playing around with sleep 5 and other variations, trying to figure out if it was a timing issue.

I didn’t get anywhere with that.

2:45 pm

By this time, I was convinced something odd was going on with the server, so I rebooted it. Nothing was making sense.

2:59 pm

I open a ticket with the ISP because I can’t get to the FreshPorts server and my IPMI link is not working.

I was thinking I had message up the bootcode when I recently upgraded the zpool. I thought I had missed doing:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot …

3:31 pm

After a few short discussions, my ISP showed me this screen shot of the FreshPorts server.

/etc/fstab issues

/etc/fstab issues

OK, so that’s my fault. My edits at 2:20 pm to /etc/fstab have come back to haunt me.

3:41 pm

IPMI access has been restored. I remove my modification from /etc/fstab and reboot the server.

3:45 pm

The server is back online, but FreshPorts is still not healthy. The nullfs mounts are still disappearing and I have no idea why.

4:03 pm

A suggestion on IRC arrives:

[Aug 10 16:03] <@markj> dvl: if you have root access, you can try running 
"dtrace -n 'fbt::nullfs_unmount:entry {printf("%s (ppid %d)", curpsinfo->pr_psargs, curpsinfo->pr_ppid);}'"
to get some idea of where the unmount is coming from

I run the command. A short time later:

[dan@x8dtu:~] $ sudo dtrace -n 'fbt::nullfs_unmount:entry {printf("%s (ppid %d)", curpsinfo->pr_psargs, curpsinfo->pr_ppid);}'
dtrace: description 'fbt::nullfs_unmount:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 23475)
 13  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-nginx01/fstab (ppid 23492)
 13  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-nginx01/fstab (ppid 23492)

While markj and I are discussing this, more entries appear:

  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
  6  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-ingress01/fstab (ppid 24858)
 15  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-nginx01/fstab (ppid 24882)
 15  66394             nullfs_unmount:entry umount -a -F /iocage/jails/x8dtu-nginx01/fstab (ppid 24882)

I am really suspecting iocage now. The fstab files mentioned above are present on disk:

[dan@x8dtu:/var/log] $ ls -l  /iocage/jails/*/fstab
-rw-r--r--  1 root  wheel     0 Oct  3  2017 /iocage/jails/mx-ingress01/fstab
-rw-r--r--  1 root  wheel  1768 Aug 10 18:20 /iocage/jails/x8dtu-ingress01/fstab
-rw-r--r--  1 root  wheel   368 Oct 29  2017 /iocage/jails/x8dtu-nginx01/fstab
-rw-r--r--  1 root  wheel     0 Sep  9  2017 /iocage/jails/x8dtu-pg01/fstab
-rw-r--r--  1 root  wheel     0 Aug  7 21:03 /iocage/jails/x8dtu-pg02.vpn.unixathome.org/fstab

I can see they are used by both x8dtu-ingress01 and x8dtu-nginx01. The file contents is what you saw for iocage fstab -l earlier in this post.

NOTE: pid 23475 was no longer around, but knowing what that was would have helped us.

4:14pm

markj has another idea:

[Aug 10 16:11] <@markj> oh, dwatch can get the process tree. try "dwatch -R fbt::nullfs_mount:entry'" instead
[Aug 10 16:14] <@markj> dvl: the dwatch invocation should show more info about the ancestors of the umount process. hopefully 
enough to figure out what's going on

A short while later:

[dan@x8dtu:~] $ sudo dwatch -R fbt::nullfs_mount:entry
INFO Watching 'fbt::nullfs_mount:entry' ...
2018 Aug 10 20:16:48 0.0 mount_nullfs[26186]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/bin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/bin
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26186 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/bin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/bin
2018 Aug 10 20:16:48 0.0 mount_nullfs[26187]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/lib /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/lib
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26187 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/lib /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/lib
2018 Aug 10 20:16:48 0.0 mount_nullfs[26188]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/libexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/libexec
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26188 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/libexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/libexec
2018 Aug 10 20:16:48 0.0 mount_nullfs[26189]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/sbin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/sbin
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26189 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/sbin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/sbin
2018 Aug 10 20:16:48 0.0 mount_nullfs[26190]: mount_nullfs -o ro -o nosuid -o noexec /iocage/jails/x8dtu-ingress01/root/usr/share/mk /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/share/mk
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26190 0.0 mount_nullfs -o ro -o nosuid -o noexec /iocage/jails/x8dtu-ingress01/root/usr/share/mk /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/share/mk
2018 Aug 10 20:16:48 0.0 mount_nullfs[26191]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/sbin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/sbin
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26191 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/sbin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/sbin
2018 Aug 10 20:16:48 0.0 mount_nullfs[26192]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/bin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/bin
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26192 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/bin /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/bin
2018 Aug 10 20:16:48 0.0 mount_nullfs[26193]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/lib /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/lib
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26193 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/lib /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/lib
2018 Aug 10 20:16:48 0.0 mount_nullfs[26194]: mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/libexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/libexec
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26175 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-ingress01 file /bin/sh
    \-+= 26185 0.0 mount -a -F /iocage/jails/x8dtu-ingress01/fstab
      \-+= 26194 0.0 mount_nullfs -o ro -o nosuid /iocage/jails/x8dtu-ingress01/root/usr/libexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/usr/libexec
2018 Aug 10 20:16:49 0.0 mount_nullfs[26209]: mount_nullfs -o ro -o nosuid -o noexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/cache/html /iocage/jails/x8dtu-nginx01/root/var/db/freshports/cache/html
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26198 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-nginx01 file /bin/sh
    \-+= 26208 0.0 mount -a -F /iocage/jails/x8dtu-nginx01/fstab
      \-+= 26209 0.0 mount_nullfs -o ro -o nosuid -o noexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/cache/html /iocage/jails/x8dtu-nginx01/root/var/db/freshports/cache/html
2018 Aug 10 20:16:49 0.0 mount_nullfs[26210]: mount_nullfs -o ro -o nosuid -o noexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/signals /iocage/jails/x8dtu-nginx01/root/var/db/freshports/signals
 -+= 26131 0.0 /bin/sh /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh
  \-+= 26198 0.0 /usr/local/bin/python3.6 /usr/local/bin/iocage chroot x8dtu-nginx01 file /bin/sh
    \-+= 26208 0.0 mount -a -F /iocage/jails/x8dtu-nginx01/fstab
      \-+= 26210 0.0 mount_nullfs -o ro -o nosuid -o noexec /iocage/jails/x8dtu-ingress01/root/var/db/freshports/signals /iocage/jails/x8dtu-nginx01/root/var/db/freshports/signals

Ahah! It’s one of my Nagios check scripts, /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh

The script

This was the script in use. Do not use it. You’ll find out why later.

[dan@x8dtu:/usr/local/libexec/nagios-custom] $ cat check_py_iocage_host_vs_jails.sh
#!/bin/sh

HOSTVERSION=`/usr/bin/file /bin/sh`

JAILS=`/usr/local/bin/iocage list -H | /usr/bin/cut -f 2 -w`
ERRORS=''
for jail in ${JAILS}
do
  JAILVERSION=`/usr/local/bin/iocage chroot ${jail} 'file /bin/sh'`
  if [ "${JAILVERSION}" != "${HOSTVERSION}" ]
  then
    ERRORS="jail '${jail}' is ${JAILVERSION}"
  fi
done

if [ "${ERRORS}" == "" ]
then
  echo 'All jails match the host'
  exit 0
else
  echo "WARNING: HOST and jails are NOT in sync: host = ${HOSTVERSION} but ${ERRORS}"
  exit 2
fi
[dan@x8dtu:/usr/local/libexec/nagios-cuFalstom] $

This script is used to verify that the jail and the host are running the same version of world.

It was the iocage chroot command which was running umount.

I tried to track down why this was suddenly happening.

What changed?

I know Ansible was run earlier in the day. It was installing new scripts on each host for nrpe.

What happened? check_py_iocage_host_vs_jails.sh got updated.

My conclusion: local changes to the file on the x8dtu-ingress01 never pushed upstream.

Find the original

I checked ZFS snapshots. None.

I checked Bacula. I didn’t backup /usr/local/libexec/nagios-custom, presumably because it is in Ansible, but I do now.

Fail. No original script found.

4:30 pm Stop the bleeding

Now that I knew the cause, I commented out that entry in /usr/local/etc/nrpe.cfg and restarted nrpe3.

I restarted the two jails to fix their mount points.

The messed up commits, three or four of them, were rerun.

Success. The website is caught up and online.

Tweet sent.

Status page updated.

Original script found

I found the original script. Ansible backed it up, as instructed.

It was at /usr/local/libexec/nagios-custom/check_py_iocage_host_vs_jails.sh.39690.2018-08-10@15:59:42~

What was the difference?

# svn di
Index: check_py_iocage_host_vs_jails.sh
===================================================================
--- check_py_iocage_host_vs_jails.sh	(revision 1197)
+++ check_py_iocage_host_vs_jails.sh	(working copy)
@@ -2,11 +2,11 @@

 HOSTVERSION=`/usr/bin/file /bin/sh`

-JAILS=`/usr/local/bin/iocage list -H | /usr/bin/cut -f 2 -w`
+JAILS=`/usr/local/bin/iocage list | /usr/bin/egrep -v '^\| JID' | grep '^|' | /usr/bin/cut -f 4 -w`
 ERRORS=''
 for jail in ${JAILS}
 do
-  JAILVERSION=`/usr/local/bin/iocage chroot ${jail} 'file /bin/sh'`
+  JAILVERSION=`/usr/local/bin/iocage exec ${jail} 'file /bin/sh'`
   if [ "${JAILVERSION}" != "${HOSTVERSION}" ]
   then
     ERRORS="jail '${jail}' is ${JAILVERSION}"

I’ll keep the line 10 and ignore line 11. Line 10 is better.

Line 16 is the faulty one. exec is better than chroot.

What does the man page say?

     chroot        Chroot into a jail without actually starting the jail
                   itself.  Useful for initial setup like setting a root
                   password or configuring networking.  A command can be
                   specified as with the normal system, see chroot(8).
     exec          Execute a command inside the specified jail.  This is an
                   iocage UUID/NAME wrapper for jexec(8).  After invoking
                   exec, specify the jail, any commands to run inside that
                   jail, and any arguments for those commands.

To me, they should both work, but they do not. Clearly, chroot was mounting and umounting the fstab contents, by design.

I prefer chroot because it means the script can test non-running jails, but obviously I had to use exec instead.

Notes

  • I committed that ‘new’ script into the repo used by Ansible
  • Don’t work local, work on Ansible.
  • If you do work local, commit it back.
  • Backup your important files, even if they are supposed to be in Ansible
  • When things went wrong, I first suspected something Ansible did. I commented out the two new commands added to nrpe.cfg.
  • When I suspected Ansible, I should have looked around the /usr/local/libexec/nagios-custom directory and I’d have noticed the updated files.
  • I should learn more about dtrace and dwatch, especially on how to form a simple query.
  • Perhaps I should be monitoring mount points. At least then I’d have had more clues.

6:49 pm

My thanks to markj who provided the dtrace and dwatch queries which helped track down my mistake.

It’s pub time. Good night.