Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full-time systems administration and now work for a very well-known company that has been a big force in the security industry.

Jul 13, 2019

Today I updated the test website with two changes:

  1. use of dd, dt, and dl tags in the details section of the ports page
  2. Three new graphs:
    1. doc
    2. ports
    3. src

The tags change was simply the result of reading up on them and concluding they could be useful.

The graphs were swills’ fault. They took about an hour to do, and most of that was figuring out the required changes.

I started with www/graphs2.php

Also involved are www/generate_content.php and www/graphs.js, but you can see the whole commit if you want.

Not included in the code are some SQL queries, which were saved in the issue.

Enjoy.

May 25, 2019

I’m writing this post just to keep things straight in my head so I can decide how best to resolve this issue.

FreshPorts uses /var/db/freshports/cache/spooling on both the ingress jail and the nginx jail.

The nginx jail uses it for caching content. Page details are first spooled into /var/db/freshports/cache/spooling before being moved to /var/db/freshports/cache/ports.

The ingress jail uses this for refreshing various cached items.

This directory is configured by the FreshPorts-Scripts package, which is installed in both jails.

The problem: this directory is created owned by freshports:freshports, but it needs to be owned by www:freshports in the jail.

My first question is: why does the nginx jail need the FreshPorts-Scripts package? It contains ingress-related scripts. By that, I mean scripts related to incoming commits and the code to get them into the FreshPorts database.

How does it get into the jail?

[dan@x8dtu-nginx01:~] $ sudo pkg delete FreshPorts-Scripts
Checking integrity... done (0 conflicting)
Deinstallation has been requested for the following 3 packages (of 0 packages in the universe):

Installed packages to be REMOVED:
	FreshPorts-Scripts-1.1.15
	py27-freshports-fp-listen-1.0.10_3
	freshports-www-1.2.6

Number of packages to be removed: 3

The operation will free 4 MiB.

Proceed with deinstalling packages? [y/N]: n

Two other ports require it.

Ahh, yes, the fp-listen daemon needs the scripts:

[dan@x8dtu-nginx01:~] $ ps auwwx | grep fp-listen
root       35775  0.0  0.0   4244  1944  -  IJ   17:58   0:00.00 supervise fp-listen
freshports 35777  0.0  0.0  21076 16392  -  SJ   17:58   0:00.43 /usr/local/bin/python2.7 /usr/local/lib/python2.7/site-packages/fp-listen/fp-listen.pyc
dan        74034  0.0  0.0   6660  2532  2  S+J  18:57   0:00.00 grep fp-listen
[dan@x8dtu-nginx01:~] $ 

That’s going to be running on nginx regardless. That daemon listens to the PostgreSQL database for updates and clears the relevant portions of on-disk cache.
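That daemon relies on PostgreSQL's LISTEN/NOTIFY. A minimal sketch of the mechanism, with an invented channel name and payload (the real names used by fp-listen are not shown here):

-- What a daemon like fp-listen does on its database connection:
LISTEN port_updated;

-- What the database side would send when a port's on-disk cache must be cleared:
NOTIFY port_updated, 'ports/dns/odsclient';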

At first, I was trying to figure out what was installing the www user on the nginx jail. Then I realized, with help, that the www user is there by default, having been added to the base system back in 2001.

I see a solution:

  • chown www:freshports
  • chmod 775

That translates to this entry in the pkg-plist file:

@dir(www,freshports,775) %%FP_DATADIR%%/cache/spooling

That seems to fix the rename errors I was seeing:

2019/05/25 18:32:33 [error] 35875#100912: *4277 FastCGI sent in stderr: "PHP message: PHP Warning:  
rename(/tmp/ports.dns.odsclient.Detail.head.PageSize100.PageNum1.html.tmpmuB0Ah,/var/db/freshports/cache/ports/dns/odsclient/Detail.h
ead.PageSize100.PageNum1.html): Operation not permitted in /usr/local/www/freshports/classes/cache.php on line 83" while reading 
response header from upstream, client: 64.233.172.83, server: www.freshports.org, request: "GET /dns/odsclient HTTP/1.1", upstream: 
"fastcgi://unix:/var/run/php-fpm.sock:", host: "www.freshports.org"

Thanks for coming to my TED talk.

Jan 27, 2019

Yesterday I copied data from the old production server to the new production server. One thing I missed, although I did think about it at the time, was updating the sequence used by the table in question. Looking at the table definition:

freshports.org=# \d report_log
                                          Table "public.report_log"
    Column    |           Type           | Collation | Nullable |                  Default                   
--------------+--------------------------+-----------+----------+--------------------------------------------
 id           | integer                  |           | not null | nextval('report_log_id_seq'::regclass)
 report_id    | integer                  |           | not null | 
 frequency_id | integer                  |           |          | 
 report_date  | timestamp with time zone |           | not null | ('now'::text)::timestamp(6) with time zone
 email_count  | integer                  |           | not null | 
 commit_count | integer                  |           | not null | 
 port_count   | integer                  |           | not null | 
Indexes:
    "report_log_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "$1" FOREIGN KEY (frequency_id) REFERENCES report_frequency(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (report_id) REFERENCES reports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

The report_log_id_seq value will be wrong. When the reports run, they will use values for id which are already present in the table. To confirm, I ran this test:

freshports.org=# BEGIN;
BEGIN
freshports.org=# INSERT INTO report_log (report_id, frequency_id, email_count, commit_count, port_count) VALUES (2, 4, 0, 0, 0);
ERROR:  duplicate key value violates unique constraint "report_log_pkey"
DETAIL:  Key (id)=(19074) already exists.
freshports.org=# ROLLBACK;
ROLLBACK
freshports.org=# SELECT max(id) FROM report_log;
  max  
-------
 20144
(1 row)

freshports.org=# 
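Another way to see the mismatch (not part of the original session) is to compare the sequence with the table directly:

-- What the sequence last handed out, and whether that value has been used yet:
SELECT last_value, is_called FROM report_log_id_seq;

-- The highest id already present in the table:
SELECT max(id) FROM report_log;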

Historically, I have done this with setval but today I will try ALTER SEQUENCE.
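For reference, the setval approach would look something like this (a sketch, not run here):

-- Set the sequence to the current maximum id; the next nextval() returns max(id) + 1.
SELECT setval('report_log_id_seq', (SELECT max(id) FROM report_log));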

freshports.org=# BEGIN; ALTER SEQUENCE report_log_id_seq RESTART WITH 20145;
BEGIN
ALTER SEQUENCE
freshports.org=# INSERT INTO report_log (report_id, frequency_id, email_count, commit_count, port_count) VALUES (2, 4, 0, 0, 0);
INSERT 0 1
freshports.org=# ROLLBACK;
ROLLBACK

That worked, so I rolled it back. This time I'll commit the change without inserting any data:

freshports.org=# BEGIN; ALTER SEQUENCE report_log_id_seq RESTART WITH 20145;
BEGIN
ALTER SEQUENCE
freshports.org=# COMMIT;
COMMIT
freshports.org=# 

I remembered this issue while sorting out a configuration & code error this morning.

Jan 27, 2019

After I enabled the report notifications yesterday, they failed to go out. Why? A hardcoded hostname in a Perl module.

Here are the errors I found this morning.

from='FreshPorts Watch Daemon <FreshPorts-Watch@FreshPorts.org>' to='dvl@example.org' subject='FreshPorts daily new ports'
could not open Email::Sender. from='FreshPorts Watch Daemon <FreshPorts-Watch@FreshPorts.org>' to='dvl@example.org' subject='FreshPorts daily new ports' errorcode='unable to establish SMTP connection to cliff.int.example.net port 25
Trace begun at /usr/local/lib/perl5/site_perl/Email/Sender/Transport/SMTP.pm line 193
Email::Sender::Transport::SMTP::_throw('Email::Sender::Transport::SMTP=HASH(0x806f2ea68)', 'unable to establish SMTP connection to cliff.int.example.net port 25') called at /usr/local/lib/perl5/site_perl/Email/Sender/Transport/SMTP.pm line 143
Email::Sender::Transport::SMTP::_smtp_client('Email::Sender::Transport::SMTP=HASH(0x806f2ea68)') called at /usr/local/lib/perl5/site_perl/Email/Sender/Transport/SMTP.pm line 202

The interesting part, to me, was the host it was trying to contact: cliff.int.example.net

That is an internal host, here in my home network. Do I have my configuration wrong?

Let’s check:

$ sudo grep cliff -r /usr/local/etc/freshports/*
$ sudo grep cliff -r /usr/local/libexec/freshports/*
$ sudo grep -r cliff /usr/local/lib/perl5/site_perl/FreshPorts/*
/usr/local/lib/perl5/site_perl/FreshPorts/email.pm:		host => 'cliff.int.example.net', # $FreshPorts::Config::email_server,

Oh, there it is, in the email module, along with the commented-out value it should be using.

I suspect I used that for testing at home, then checked it in without seeing what was there.

Fixing it

The host in question is a jail without any public IP addresses. Other jails communicate with this jail via a localhost address:

lo1: flags=8049 metric 0 mtu 16384
	options=600003
	inet 127.1.0.202 netmask 0xffffffff 
	groups: lo 

Note that this is lo1, not lo0. It is a clone of lo0. Note also the address in use. I like using addresses in the 127.0.0.0/8 block because it is assigned for use as the Internet host loopback address.

The configuration I had was:

$ sudo grep FreshPorts::Config::email_server *
config.pm:$FreshPorts::Config::email_server			= '127.1.0.202';

I modified the code in production (yes, testing in prod we are) to use the supplied configuration value:

$ cd /usr/local/lib/perl5/site_perl/FreshPorts
$ grep email_server email.pm 
		host => $FreshPorts::Config::email_server,

I tried the email testing code, specifically designed to test sending of email. I wonder why I had not done this before.

$ cd /usr/local/libexec/freshports
$ echo ./test-sending-email.pl | sudo su -fm freshports
from='FreshPorts Watch Daemon ' to='dan@langille.org'
subject='FreshPorts test email - x8dtu-ingress01.int.unixathome.org'
could not open Email::Sender.  from='FreshPorts Watch Daemon ' 
to='dan@langille.org' subject='FreshPorts test email - x8dtu-ingress01.int.unixathome.org' errorcode='can't 
STARTTLS: 2.0.0 Ready to start TLS

What does the mail log say?

Jan 27 15:04:17 x8dtu-ingress01 postfix/smtpd[14533]: connect from unknown[127.1.0.202]
Jan 27 15:04:17 x8dtu-ingress01 postfix/smtpd[14533]: SSL_accept error from unknown[127.1.0.202]: 0
Jan 27 15:04:17 x8dtu-ingress01 postfix/smtpd[14533]: warning: TLS library problem: error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error:s3_pkt.c:1498:SSL alert number 80:
Jan 27 15:04:17 x8dtu-ingress01 postfix/smtpd[14533]: lost connection after STARTTLS from unknown[127.1.0.202]

Umm, what key is being used by Postfix?

$ postconf -n | grep key
smtp_tls_key_file = /usr/local/etc/ssl/x8dtu-ingress01.int.unixathome.org.key
smtpd_tls_key_file = /usr/local/etc/ssl/x8dtu-ingress01.int.unixathome.org.key

Ahh, I cannot specify the IP address; I must use the hostname, otherwise TLS will fail because the certificate does not match.

I changed the entry in the configuration file:

$ cd /usr/local/etc/freshports/
$ sudo grep email_server *
config.pm:$FreshPorts::Config::email_server			= 'x8dtu-ingress01.int.unixathome.org';

And added this entry to the hosts file:

$ grep x8dtu /etc/hosts
127.1.0.202	x8dtu-ingress01.int.unixathome.org

The entry is required because this hostname is not present in DNS.

Now the email goes out:

[dan@x8dtu-ingress01:/usr/local/libexec/freshports] $ echo ./test-sending-email.pl | sudo su -fm freshports
from='FreshPorts Watch Daemon ' to='dan@example.org' subject='FreshPorts test email - x8dtu-ingress01.int.unixathome.org'
finish 2019-01-27 15:12:17

I then went back to my development server and fed those code changes back into the repository. Testing in dev showed a problem with my Let’s Encrypt certificate, which was not being refreshed on this host: it was being renewed, but not installed.

Further tests in test and stage resulted in changes to $FreshPorts::Config::email_server on those hosts, because localhost and IP addresses were in use. They were changed to hostnames.

Eventually, the code was installed in production. It seems I spent more time getting things working in dev, test, and staging than it took to fix production.

Let’s see if the report notifications go out tonight.

For the record, I did check report_log_latest and confirmed that the latest entries were still back in November. Thus, the reports compiled tonight will cover the correct period.
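That check is just the last_sent query against the report_log_latest view (the same one shown in the Jan 26 entry below):

SELECT * FROM report_log_latest ORDER BY last_sent DESC;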

Jan 26, 2019

Earlier today I copied data from the old server (supernews) to the new server (x8dtu). Now that the database has the correct information regarding when reports were last sent out, we can begin to enable those reports.

The work has already been done to move the reports from cronjobs into periodic scripts. For our purposes, three new periodic categories have been added:

  • everythreeminutes
  • hourly
  • fortnightly

I also created the corresponding directories:

[dan@x8dtu-ingress01:/usr/local/etc/periodic] $ ls
daily             fortnightly       monthly           weekly
everythreeminutes hourly            security

These contain the FreshPorts scripts, installed via pkg:

[dan@x8dtu-ingress01:/usr/local/etc/periodic] $ ls everythreeminutes hourly fortnightly
everythreeminutes:
100.fp_system_status

fortnightly:
310.send-report-notices-fortnightly   320.send-report-new-ports-fortnightly

hourly:
120.fp_test_master_port_make       180.fp_stats_hourly                260.fp_refresh_various_cache_items
140.fp_test_master_port_db         240.fp_missing_port_categories
[dan@x8dtu-ingress01:/usr/local/etc/periodic] $

Instructing periodic to run those scripts looks something like this:

$ grep periodic /etc/crontab 
*/3	*	*	*	*	root	periodic everythreeminutes
0	*	*	*	*	root	periodic hourly
1	3	*	*	*	root	periodic daily
15	4	*	*	6	root	periodic weekly
20	3	9,23	*	*	root	periodic fortnightly
30	5	1	*	*	root	periodic monthly

I knew that the script which does the reporting is report-notification.pl, so I went looking to see what uses it:

[dan@x8dtu-ingress01:/usr/local/etc/periodic] $ grep -r report-notification.pl *
daily/310.send-report-notices-daily:	echo "cd $fp_scripts_dir && /usr/local/bin/perl report-notification.pl D" | su -fm $fp_freshports_user 2>&1 >> ${DIRLOG}/report-notification.daily || rc=3
fortnightly/310.send-report-notices-fortnightly:	echo "cd $fp_scripts_dir && /usr/local/bin/perl report-notification.pl F" | su -fm $fp_freshports_user 2>&1 >> ${DIRLOG}/report-notification.fortnightly || rc=3
monthly/310.send-report-notices-monthly:	echo "cd $fp_scripts_dir && /usr/local/bin/perl report-notification.pl M" | su -fm $fp_freshports_user 2>&1 >> ${DIRLOG}/report-notification.monthly || rc=3
weekly/310.send-report-notices-weekly:	echo "cd $fp_scripts_dir && /usr/local/bin/perl report-notification.pl W" | su -fm $fp_freshports_user 2>&1 >> ${DIRLOG}/report-notification.weekly || rc=3
[dan@x8dtu-ingress01:/usr/local/etc/periodic] $

Ahh, yes, it does look like I’ve already done this work for each reporting time period.

Next, let’s see what knobs we must enable.

[dan@x8dtu-ingress01:/usr/local/etc/periodic] $ grep enable `grep -rl report-notification.pl *`
daily/310.send-report-notices-daily:case "$fp_send_report_notices_daily_enable" in
fortnightly/310.send-report-notices-fortnightly:case "$fp_send_report_notices_fortnightly_enable" in
monthly/310.send-report-notices-monthly:case "$fp_send_report_notices_monthly_enable" in
weekly/310.send-report-notices-weekly:case "$fp_send_report_notices_weekly_enable" in

Those are the four flags I have to enable to get this working.

  1. fp_send_report_notices_daily_enable
  2. fp_send_report_notices_fortnightly_enable
  3. fp_send_report_notices_monthly_enable
  4. fp_send_report_notices_weekly_enable

Let’s check the configuration file:

$ grep report /etc/periodic.conf
# reports
#fp_send_report_notices_daily_enable="YES"
#fp_send_report_new_ports_daily_enable="YES"
#fp_send_report_notices_fortnightly_enable="YES"
#fp_send_report_new_ports_fortnightly_enable="YES"
#fp_send_report_notices_weekly_enable="YES"
#fp_send_report_new_ports_weekly_enable="YES"
#fp_send_report_notices_monthly_enable="YES"
#fp_send_report_new_ports_monthly_enable="YES"

With a quick sudoedit, I enabled all of those entries. I could have also used sysrc, but I figured sudoedit would be fine.

Now we wait.

Jan 26, 2019

Ever since the new server went into production, sometime in 2017, the notification emails, which tell you what has changed, have been coming from the old server. I never did that changeover. The old server was still processing incoming commits, so the notifications were accurate. However, new accounts would not get notifications and changes to existing accounts would be ignored. The notifications were based on increasingly stale account settings.

This is far from ideal and not good at all.

What I have to do now is copy over the notification statistics.

When did this change occur?

I was not sure when the new server went into production. I looked in old blog posts, and on social media, and failed to find out.

I resorted to DNS because every change is recorded in subversion, thanks to nsnotifyd (more information on how I did that via this post).

Finding that date took me a while: I was searching for my name in the logs and failed, because every commit for the past 17 months has been made by the dns-notify user. Eventually I found it. I can confidently say the changeover occurred at Sun Nov 5 16:53:23 2017 UTC.

This date will be used later in this post.

What needs to be moved over

The following is on the old server.

I wasn’t sure exactly what needed to be moved, but I knew the answer might be in the report-notification.pl script. Looking in there, I found this snippet:

                my $dbh = FreshPorts::Database::GetDBHandle();
                if ($dbh->{Active}) {

                        $sql = "select last_sent 
                                  from report_log_latest
                                 where frequency = '$Frequency'
                                   and report_id = $ReportID";
                        if ($FreshPorts::Config::ReportDebugging) {
                                print "sql is $sql\n";
                        }

                        $sth = $dbh->prepare($sql);
                        $sth->execute ||
                                        die "Could not execute SQL $sql ... maybe invalid";

                        @row=$sth->fetchrow_array;
                        my $last_sent = $row[0];

That’s it, I need entries from the report_log_latest table. What does that table look like?

freshports.org=# \d report_log_latest
                     View "public.report_log_latest"
    Column    |           Type           | Collation | Nullable | Default 
--------------+--------------------------+-----------+----------+---------
 report_id    | integer                  |           |          | 
 frequency_id | integer                  |           |          | 
 frequency    | character(1)             |           |          | 
 last_sent    | timestamp with time zone |           |          | 

freshports.org=# 
 

Oh, it’s a view, not a table. Let’s look closer:

freshports.org=# \d+ report_log_latest
                                  View "public.report_log_latest"
    Column    |           Type           | Collation | Nullable | Default | Storage  | Description 
--------------+--------------------------+-----------+----------+---------+----------+-------------
 report_id    | integer                  |           |          |         | plain    | 
 frequency_id | integer                  |           |          |         | plain    | 
 frequency    | character(1)             |           |          |         | extended | 
 last_sent    | timestamp with time zone |           |          |         | plain    | 
View definition:
 SELECT report_log.report_id,
    report_log.frequency_id,
    report_frequency.frequency,
    max(report_log.report_date) AS last_sent
   FROM report_log,
    report_frequency
  WHERE report_log.frequency_id = report_frequency.id
  GROUP BY report_log.report_id, report_log.frequency_id, report_frequency.frequency;

freshports.org=# 

How long has it been since any reports were sent out:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         2 |            4 | F         | 2018-11-09 19:20:00.456444+00
         1 |            4 | F         | 2018-11-09 19:20:00.434312+00
         2 |            2 | D         | 2018-11-09 18:10:00.431505+00
         1 |            2 | D         | 2018-11-09 18:10:00.411285+00
         5 |            2 | D         | 2018-11-09 18:05:00.407087+00
         1 |            3 | W         | 2018-11-06 19:00:01.348784+00
         2 |            3 | W         | 2018-11-06 19:00:00.497571+00
         1 |            5 | M         | 2018-10-23 19:40:00.677857+00
         2 |            5 | M         | 2018-10-23 19:40:00.426695+00
(9 rows)

freshports.org=# 

OK, that’s about 11 weeks ago.

Let’s look at production

This is from the production server. Yes, the database name is the same. I think the database should be freshports.prod on this server. It helps that each of the databases is on a different server. At home, where they are all on one server, I have:

  1. freshports.dev
  2. freshports.stage
  3. freshports.test

On production, it has been 64 weeks since we took a copy of the old database and moved it to the new server:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         1 |            2 | D         | 2017-11-04 18:10:00.440969+00
         2 |            2 | D         | 2017-11-04 18:10:00.425991+00
         5 |            2 | D         | 2017-11-04 18:05:00.453121+00
         2 |            3 | W         | 2017-10-31 19:00:00.802006+00
         1 |            3 | W         | 2017-10-31 19:00:00.740916+00
         1 |            5 | M         | 2017-10-23 19:40:00.540706+00
         2 |            5 | M         | 2017-10-23 19:40:00.535091+00
         1 |            4 | F         | 2017-10-23 19:20:00.614792+00
         2 |            4 | F         | 2017-10-23 19:20:00.592397+00
(9 rows)

To retain history, I would like to copy over everything missing from the new server.

The date from above, when the DNS changed, was Sun Nov 5 16:53:23.

That corresponds nicely to the production data. This increases confidence in what I am about to do.

What’s missing from prod?

On the old server, what is newer than what’s on prod?

freshports.org=# SELECT COUNT(*) FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00';
 count 
-------
  1072
(1 row)

Let’s check the boundary condition there, and see if I can safely use > without missing any data:

freshports.org=# SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC LIMIT 10;
  id   | report_id | frequency_id |          report_date          | email_count | commit_count | port_count 
-------+-----------+--------------+-------------------------------+-------------+--------------+------------
 19073 |         1 |            2 | 2017-11-04 18:10:00.440969+00 |         123 |          396 |        435
 19074 |         5 |            2 | 2018-01-05 18:06:15.429902+00 |           1 |            2 |         88
 19076 |         1 |            2 | 2018-01-05 18:11:15.513286+00 |         459 |        18287 |      18829
 19075 |         2 |            2 | 2018-01-05 18:11:15.515059+00 |         118 |            0 |      44840
 19077 |         5 |            2 | 2018-01-06 18:06:15.501673+00 |           0 |            0 |          0
 19079 |         1 |            2 | 2018-01-06 18:11:15.68547+00  |         153 |          323 |        354
 19078 |         2 |            2 | 2018-01-06 18:11:15.699523+00 |         118 |            0 |        118
 19080 |         5 |            2 | 2018-01-07 18:05:00.545592+00 |           0 |            0 |          0
 19082 |         1 |            2 | 2018-01-07 18:10:00.820374+00 |         114 |          203 |        246
 19081 |         2 |            2 | 2018-01-07 18:10:00.830676+00 |         118 |            0 |        118
(10 rows)

freshports.org=# 

Yes, I can avoid duplicates easily by changing to > instead of >=.

I am confident I can just dump this query to a file, transfer it to the new server, and load it up without overwriting any data.

Let’s do that.

PostgreSQL Copy

Both the production server and the old server are running PostgreSQL 11.1, but the version should not matter much when using the COPY command. With pg_dump, on the other hand, you should always use the pg_dump from the destination version.

Let’s do this!

freshports.org=# COPY (SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports.copy';
ERROR:  could not open file "/usr/home/dan/missing-reports.copy" for writing: Permission denied
HINT:  COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.
freshports.org=#

Ahh, PostgreSQL itself (the database server process) cannot write to my home directory. I have sudo privileges, so I could get around that easily:

$ touch /usr/home/dan/missing-reports.copy
$ sudo chown postgres /usr/home/dan/missing-reports.copy

After re-running the COPY command, we have:

$ head -2 missing-reports.copy 
19073	1	2	2017-11-04 18:10:00.440969+00	123	396	435
19074	5	2	2018-01-05 18:06:15.429902+00	1	2	88

Not everyone can do this; it only works because I am on the PostgreSQL server and have sudo access.

Instead, let’s try the \copy command.

freshports.org=# \copy (SELECT * FROM report_log WHERE report_date >= '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports2.copy';
COPY 1072

What did we get?

$ head -2 missing-reports2.copy
19073	1	2	2017-11-04 18:10:00.440969+00	123	396	435
19074	5	2	2018-01-05 18:06:15.429902+00	1	2	88
$ diff -ruN missing-reports.copy missing-reports2.copy 
$

Yes, that looks right and that diff confirms I have the same data as with the COPY command.

Loading up the \copy data

The file has been transferred from the old server to the production server.

Over on the production server, I have taken a backup.

freshports.org=# begin;
BEGIN
freshports.org=# \copy report_log FROM '~/missing-reports2.copy';
ERROR:  duplicate key value violates unique constraint "report_log_pkey"
DETAIL:  Key (id)=(19073) already exists.
CONTEXT:  COPY report_log, line 1
freshports.org=#

What?

Oh. Look back: I used >= instead of the > I had planned to use.

The proper thing to do would be to run the query again, but I am confident that deleting that one line will give the same result. I took a copy of the local file and used vi to delete the first line, which contained the 19073 key.
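For the record, re-running the export with a strict comparison would have looked like this (a sketch; I deleted the line instead):

\copy (SELECT * FROM report_log WHERE report_date > '2017-11-04 18:10:00.440969+00' ORDER BY report_date ASC) TO '/usr/home/dan/missing-reports2.copy';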

Let’s try that again:

freshports.org=# begin;
BEGIN
freshports.org=# \copy report_log FROM '~/missing-reports2.copy';
COPY 1071
freshports.org=# 

That is the expected number of records, 1071.

Let’s check the latest sent dates:

freshports.org=# SELECT * FROM report_log_latest ORDER BY last_sent DESC;
 report_id | frequency_id | frequency |           last_sent           
-----------+--------------+-----------+-------------------------------
         2 |            4 | F         | 2018-11-09 19:20:00.456444+00
         1 |            4 | F         | 2018-11-09 19:20:00.434312+00
         2 |            2 | D         | 2018-11-09 18:10:00.431505+00
         1 |            2 | D         | 2018-11-09 18:10:00.411285+00
         5 |            2 | D         | 2018-11-09 18:05:00.407087+00
         1 |            3 | W         | 2018-11-06 19:00:01.348784+00
         2 |            3 | W         | 2018-11-06 19:00:00.497571+00
         1 |            5 | M         | 2018-10-23 19:40:00.677857+00
         2 |            5 | M         | 2018-10-23 19:40:00.426695+00
(9 rows)

Yes, that’s precisely what we had on the old server.

One final check:

freshports.org=# SELECT count(*) FROM report_log;
 count 
-------
 19863
(1 row)

Yes, that matches the old server (I have not shown that verification here).

Time to complete this:

freshports.org=# commit;
COMMIT
freshports.org=# 

Done.

Thank you for flying Air FreshPorts.

Jan 04, 2019

When the website enters maintenance mode, there are a few steps to ensure the database is not updated while we dump the data, while still allowing the website to display pages.

Disable ingress commit processing

On the ingress host (for example, x8dtu-ingress01), stop the FreshPorts daemon:

[dan@x8dtu-ingress01:~] $ sudo svc -d /var/service/freshports/
[dan@x8dtu-ingress01:~] $ 

Ensure the scripts have finished; you should not see any perl processes:

[dan@x8dtu-ingress01:~] $ ps auwwx | grep perl
dan     73052  0.0  0.0  6660  2524  3  S+J  13:07    0:00.00 grep perl
[dan@x8dtu-ingress01:~] $ 

Set website maintenance mode

On the website host (e.g. x8dtu-nginx01), in /usr/local/etc/freshports/constants.local.php, ensure this constant is set to TRUE:

define('IN_MAINTENCE_MODE', true);

Maintenance mode can be verified by visiting:

  1. https://www.freshports.org/customize.php – should redirect to https://www.freshports.org/now-in-maintenance-mode.php
  2. Visiting https://www.freshports.org/--/status/, where you should see “We are in maintenance mode”.

Disable logins

On the website host (e.g. x8dtu-nginx01), in /usr/local/etc/freshports/constants.local.php, ensure this constant is defined; the actual value does not matter:

define('NO_LOGIN', true);

Confirm logins are disabled by visiting the status page and the login page (which should redirect to the home page).

Reverting maintenance mode

To come back out of maintenance mode, revert your changes, as above.

Jan 04, 2019

x8dtu contains two jails for database servers: x8dtu-pg01 and x8dtu-pg02. At any given time, one is the production database server and is assigned the IP address 127.1.54.32.

Let’s assume x8dtu-pg02 is on PostgreSQL 10.6 and we are upgrading to PostgreSQL 11.1 on x8dtu-pg01.

The new packages would be installed on x8dtu-pg01. pg_dump would be run on x8dtu-pg01 (so we are using the newer code for dumping, standard PostgreSQL procedure) and would connect to the other server to get the data. For example:

[postgres@x8dtu-pg01 /var/db/DUMPING]$ pg_dumpall --globals -h 127.1.0.203 > globals.sql
[postgres@x8dtu-pg01 /var/db/DUMPING]$ psql template1 < /var/db/DUMPING/globals.sql
[postgres@x8dtu-pg01 /var/db/DUMPING]$ time pg_dump -j 14 -Z0 -Fd -h 127.1.0.203 freshports.org -f freshports.dump                                                                 

real    10m14.185s
user    1m21.128s
sys     0m24.430s

We create the new database and restore into it:

[postgres@x8dtu-pg01 /var/db/DUMPING]$ createdb freshports.org -E SQL_ASCII -T template0                                                                                           
[postgres@x8dtu-pg01 /var/db/DUMPING]$ time pg_restore -j 14 -d freshports.org /var/db/DUMPING/freshports.dump                                                                     

real    27m32.382s
user    0m10.793s
sys     0m44.011s
[postgres@x8dtu-pg01 /var/db/DUMPING]$ 

NOTE: the following procedure is untested. When it was written, 127.1.54.32 was not in use by any jail. This procedure will be updated when the next database upgrade occurs.

Add the IP address as an alias on the host:

[dan@x8dtu:~] $ sudo ifconfig lo1 alias 127.1.54.32/32

This is typical:

[dan@x8dtu:~] $ ifconfig lo1
lo1: flags=8049 metric 0 mtu 16384
	options=600003
	inet6 fd00::201 prefixlen 8 
	inet6 fe80::1%lo1 prefixlen 64 scopeid 0x5 
	inet 127.1.0.200 netmask 0xffffffff 
	inet 127.1.0.201 netmask 0xffffffff 
	inet 127.1.0.202 netmask 0xffffffff 
	inet 127.1.0.203 netmask 0xffffffff 
	inet 127.1.0.204 netmask 0xffffffff 
	inet 127.1.54.32 netmask 0xffffffff 
	nd6 options=21
	groups: lo 
[dan@x8dtu:~] $ 

Once the database is ready, we need to move the IP address.

Get the existing IP addresses. We use iocage for managing jails.

[dan@x8dtu:~] $ iocage get ip4_addr x8dtu-pg01
10.100.0.200,127.1.0.200

Add the production IP address:

[dan@x8dtu:~] $ sudo iocage set ip4_addr="10.100.0.200,127.1.0.200,127.1.54.32" x8dtu-pg01
Property: ip4_addr has been updated to 10.100.0.200,127.1.0.200,127.1.54.32

The jail needs to be restarted to grab the new IP address:

[dan@x8dtu:~] $ sudo iocage restart x8dtu-pg01

In Ansible, update postgresql_address in these files:

  1. host_vars/x8dtu-pg02.vpn.unixathome.org
  2. host_vars/x8dtu-pg01.vpn.unixathome.org

Deploy those changes:

ansible-playbook jail-postgresql.yml --limit=x8dtu-pg01.vpn.unixathome.org,x8dtu-pg02.vpn.unixathome.org --tags=postgresql.conf,pg_hba

Ensure PostgreSQL is listening on the right IP addresses:

[dan@x8dtu-pg01:~] $ sockstat -p 5432 -4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
postgres postgres   19935 3  tcp4   127.1.0.200:5432      *:*
postgres postgres   19935 5  tcp4   127.1.54.32:5432      *:*

Now that the database server is listening on the new addresses, we can adjust the clients to use that newly-assigned IP address. See Swapping database servers.
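Once a client has been pointed at the new address, one quick sanity check from psql is to ask the server which local address the connection landed on (not part of the original procedure):

-- Shows the server-side IP address and port for the current connection.
SELECT inet_server_addr(), inet_server_port();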

Jan 04, 2019

This procedure is rarely used, but is documented nonetheless.

A database server change is usually accomplished by moving the IP address from one server to another, not by adjusting the clients. The production database server is usually at 127.1.54.32.

When changing from one database server to another, there are a few configuration files to update:

Ingress host

These files are in /usr/local/etc/freshports:

  • config.pm – $FreshPorts::Config::host should contain the IP address of the PostgreSQL database server.
  • config.sh – HOST is the IP address of the PostgreSQL server.
  • ~freshports/.pgpass – Usually does not require updating as it contains a wildcard for the hostname.

This needs to be restarted:

svc -t /var/service/freshports

Website host

These files are in /usr/local/etc/freshports:

  • config.sh – HOST is the IP address of the PostgreSQL server.
  • database.php – The pg_connect call contains a host=IP.ADD.RE.SS clause.
  • forums.php – Contains a $PHORUM['DatabaseServer'] entry.
  • fp-listen.ini – Contains a HOST declaration.

These files are in /usr/local/etc/freshsource:

  • database.php – The pg_connect call contains a host=IP.ADD.RE.SS clause.
  • forums.php – Contains a $PHORUM['DatabaseServer'] entry.
  • fp-listen.ini – Contains a HOST declaration.

This needs to be restarted:

svc -t /var/service/fp-listen

Dec 27, 2018

I originally documented this so I would have a reference while tracking down the problem. I have since solved the issue and you can jump to the bottom of this post to find the solution. What I do not know is why this was not an issue under PostgreSQL 10.6 but was under PostgreSQL 11.1.

How did the issue arise?

FreshPorts deals with the FreeBSD package tree. One of the things it does is replicate the tree of directories and files. Yesterday, I moved my development database from a PostgreSQL 10.6 server to a PostgreSQL 11.1 server. Now I am getting errors on commits which add a new file to the tree. I think I’ve narrowed it down to this trigger:

CREATE OR REPLACE FUNCTION element_pathname_insert() RETURNS TRIGGER AS $$
   BEGIN
      INSERT INTO element_pathname (element_id, pathname)
         VALUES (NEW.id, element_pathname(NEW.id));
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS element_pathname_insert ON element;
CREATE TRIGGER element_pathname_insert
    AFTER INSERT on element
    FOR EACH ROW
    EXECUTE PROCEDURE element_pathname_insert();

This trigger is invoked each time a new row is added to the element table.

This is where some background may be useful.

element

element is a self-referential table which implements a directory and file structure.

freshports.dev=# \d element
                                     Table "public.element"
       Column        |     Type     | Collation | Nullable |               Default               
---------------------+--------------+-----------+----------+-------------------------------------
 id                  | integer      |           | not null | nextval('element_id_seq'::regclass)
 name                | text         |           | not null | 
 parent_id           | integer      |           |          | 
 directory_file_flag | character(1) |           | not null | 
 status              | character(1) |           | not null | 
Indexes:
    "element_pkey" PRIMARY KEY, btree (id)
    "element_delete" btree (status) WHERE status = 'D'::bpchar
    "element_name" btree (name)
    "element_parent_id" btree (parent_id)
Check constraints:
    "element_directory_file_flag" CHECK (directory_file_flag = 'F'::bpchar OR directory_file_flag = 'D'::bpchar)
    "element_status" CHECK (status = 'A'::bpchar OR status = 'D'::bpchar)
Foreign-key constraints:
    "$1" FOREIGN KEY (parent_id) REFERENCES element(id) ON UPDATE RESTRICT ON DELETE CASCADE

For simplicity, the Referenced by and Triggers definitions have been omitted.

Given a pathname, we can find the element id:

freshports.dev=# select Pathname_ID('/ports/head/sysutils/bacula-server/Makefile');
 pathname_id 
-------------
      205195
(1 row)

Given an id, we can find the pathname:

freshports.dev=# select element_pathname(205195);
              element_pathname               
---------------------------------------------
 /ports/head/sysutils/bacula-server/Makefile
(1 row)

For historical reasons, element_pathname() is implemented with a WHILE … LOOP structure. I am sure it could be updated to use the element_pathname table.
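If it were updated, the lookup could reduce to something like this (a hypothetical sketch, not what is deployed):

-- Hypothetical replacement: read the cached path from the element_pathname table
-- instead of walking parent_id links in a loop.
CREATE OR REPLACE FUNCTION element_pathname_via_table(p_element_id integer) RETURNS text
    LANGUAGE sql STABLE
    AS $$
   SELECT pathname FROM element_pathname WHERE element_id = p_element_id;
$$;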

element_pathname

The element_pathname table is a relatively recent addition, whereas element_pathname() dates back about 17 years. The table was added to make some of the more complex queries easier to construct.

The problem

The problem arises when a commit adds a new file to the tree. The code first checks to see if the pathname already exists in the tree:

freshports.dev=# select Pathname_ID('/base/releng/12.0/release/tools/gce.conf');
 pathname_id 
-------------
            
(1 row)

freshports.dev=# 

It finds nothing, so it adds:

freshports.dev=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status)
                            values(element_id, element_parent_id, 'D', element_name, 'A')"
PL/pgSQL function element_add(text,character) line 89 at SQL statement

This is a conflict in the element_pathname table.

I suspect the reason is that element_pathname(NEW.id) is returning an empty string because it cannot yet see the newly inserted value in the element table.

I have not supplied the contents of Element_Add() because it is about 100 lines of code.

This 'empty string' theory is supported by finding this:

freshports.dev=# select * from element_pathname where pathname = '';
 element_id | pathname 
------------+----------
     910588 | 
(1 row)

What file is that?

freshports.dev=# select element_pathname(910588);
               element_pathname                
-----------------------------------------------
 /base/head/bin/sh/tests/expansion/question2.0
(1 row)

Checking the logs, this new file was recently added, and I suspect, but have not confirmed, that it was the first addition to the tree after moving to PostgreSQL 11.1.
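Cleaning up that bad row is straightforward once element_pathname() returns the correct value (a sketch, not shown in the original session):

-- Recompute the cached pathname for any rows that were stored empty.
UPDATE element_pathname
   SET pathname = element_pathname(element_id)
 WHERE pathname = '';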

Sometimes it fails, but then it succeeds

I started doing some testing to see what happens, and I created a test database for this very purpose. I tried a series of BEGIN / INSERT / ROLLBACK commands.

Let’s try that now:

[dan@pg03:~] $ psql FPtest
psql (11.1)
Type "help" for help.

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status) 
                            values(element_id, element_parent_id, element_directory_file_flag, element_name, 'A')"
PL/pgSQL function element_add(text,character) line 86 at SQL statement
FPtest=# rollback;
ROLLBACK

OK, let’s try it again, but with something I know will not exist.

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/DELETEME/1/2/3/4', 'F');
ERROR:  duplicate key value violates unique constraint "element_pathname_pathname"
DETAIL:  Key (pathname)=() already exists.
CONTEXT:  SQL statement "INSERT INTO element_pathname (element_id, pathname)
			VALUES (NEW.id, element_pathname(NEW.id))"
PL/pgSQL function element_pathname_insert() line 3 at SQL statement
SQL statement "insert into element(id, parent_id, directory_file_flag, name, status)
                            values(element_id, element_parent_id, 'D', element_name, 'A')"
PL/pgSQL function element_add(text,character) line 89 at SQL statement
FPtest=# rollback;
ROLLBACK

Well, there is no reason for that to fail, yet it does. Let’s try this:

FPtest=# begin;
BEGIN
FPtest=# select Element_Add('/DELETEME/1', 'F');
 element_add 
-------------
      910593
(1 row)

Woah, that worked. Let’s try the original, or close to it:

FPtest=# select Element_Add('/base/releng/12.0/release/tools/gce.conf', 'F');
 element_add 
-------------
      910595
(1 row)

FPtest=# rollback;
ROLLBACK
FPtest=# \q

That worked too. What’s up with that?

The IRC help

I got onto IRC and asked what might be up. I included this gist and it was RhodiumToad who pointed out that I must have two functions called element_pathname().

Have a look at line 5 of the stored element_pathname_insert() trigger in that gist (originally presented at the top of this post, but included here so you don’t have to scroll back):

CREATE OR REPLACE FUNCTION element_pathname_insert() RETURNS TRIGGER AS $$
   BEGIN
      RAISE WARNING 'element_pathname_insert says: %', NEW.id;
      INSERT INTO element_pathname (element_id, pathname)
         VALUES (NEW.id, element_pathname(NEW.id));
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

You might notice that this version of the function has a RAISE WARNING added to it. I was trying many things to track this down.

Looking at the output of \df, I did find the second function:

freshports.org=# \df Element_Pathname
                              List of functions
 Schema |       Name       | Result data type | Argument data types |  Type  
--------+------------------+------------------+---------------------+--------
 public | element_pathname | text             | integer             | normal
 public | element_pathname | text             | integer, boolean    | normal
(2 rows)

\df+ showed me the definitions of both. They are reproduced below, one after the other, to make it easier to spot the differences.

CREATE FUNCTION public.element_pathname(integer) RETURNS text
    LANGUAGE plpgsql IMMUTABLE
    AS $_$
   DECLARE
      element_id   ALIAS FOR $1;

      my_parent_id int4;
      element_name text;
      pathname     text;

begin
  pathname = '';
  select name, parent_id
    into element_name, my_parent_id 
    from element
   where id = element_id;

  IF FOUND THEN
    pathname := '/' || element_name || pathname;
    WHILE FOUND LOOP
      select name, parent_id 
        into element_name, my_parent_id
        from element
       where id = my_parent_id;

      IF FOUND THEN
        pathname = '/' || element_name || pathname;
      END IF;
    END LOOP;
  END IF;

  return pathname;
END;
$_$;
CREATE FUNCTION public.element_pathname(integer, boolean) RETURNS text
    LANGUAGE plpgsql STABLE
    AS $_$
   DECLARE
      element_id    ALIAS FOR $1;
      leading_slash ALIAS for $2;
      my_parent_id  int4;
      element_name  text;
      pathname      text;

begin
  pathname = '';
  select name, parent_id
    into element_name, my_parent_id
    from element
   where id = element_id;

  IF FOUND THEN
    pathname := '/' || element_name || pathname;
    WHILE FOUND LOOP
      select name, parent_id
        into element_name, my_parent_id
        from element
       where id = my_parent_id;

      IF FOUND THEN
         IF my_parent_id IS NULL THEN
           IF leading_slash THEN
              pathname = '/' || element_name || pathname;
           ELSE
              pathname = element_name || pathname;
           END IF;
           EXIT;
         ELSE
           pathname = '/' || element_name || pathname;
         END IF;
      END IF;
    END LOOP;
  END IF;

  return pathname;
END;
$_$;

These two functions are very similar in both purpose and structure; the primary difference is whether the result has a leading slash.

SIDE NOTE: whenever I code this I always consider whether or not to just remove the leading slash at the end of the code instead of testing a condition each time through the loop. The current code is plenty fast.

What stood out when comparing the two functions?

When I looked at the two functions, I noticed the one being invoked by the trigger was declared to be IMMUTABLE, which is defined as:

An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever.

This function does not modify the database, but it could return different results if the path was modified. The path does not get modified by these functions. Any modification would be external to what we are doing here.

Changed to STABLE

I changed the function to STABLE and tried my tests again.

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement.

This is definitely true for this function.

Defined as STABLE, this problem went away.
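The volatility can also be changed in place, without re-issuing the whole function definition (a sketch, assuming nothing else in the definition needs to change):

ALTER FUNCTION element_pathname(integer) STABLE;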

Why 11.1?

I do not know why this function was fine under PostgreSQL 10.6 but causes a problem under 11.1, though I can speculate. I am also sure I will be corrected.

I suspect 11.1 is better at handling immutable functions and started doing optimizations which 10 did not.

Or you could take the view that it got worse.

Yes, I know the function should not have been immutable at all, but I would like to know the explanation behind the change in behavior.

Further function changes

Those two functions were similar, but can be combined. I reduced the element_pathname (integer) function to merely invoke element_pathname (integer, boolean):

CREATE FUNCTION public.element_pathname(integer) RETURNS text
    LANGUAGE plpgsql STABLE
    AS $_$
   DECLARE
      element_id    ALIAS FOR $1;
BEGIN
  return Element_Pathname(element_id, TRUE);
END;
$_$;