Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full time systems administration and now work for very-well known company who has been a big force in the security industry.

May 302020
 

I noticed this today:

root       58697  0.0  0.0 10680  2180  -  IJ   12May20  0:00.00 /usr/local/bin/readproctitle service errors: .../site_perl/FreshPorts/vuxml_parsing.pm line 232, <> chunk 1.\nWide character in print at /usr/local/lib/perl5/site_perl/FreshPorts/vuxml_parsing.pm line 232, <> chunk 1.\nWide character in print at /usr/local/lib/perl5/site_perl/FreshPorts/vuxml_parsing.pm line 232, <> chunk 1.\nUndefined subroutine &FreshPorts::CommitterOptIn::RecordErrorDetails called at ./process_vuxml.pl line 124, <> chunk 1.\n

I should monitor that process better.

But first, let’s find the vuxml entry which causes this. We know it’s after 12 May because that’s the date on the process.

Testing

To test the processing of a single vuxml entry, I delete it from the database.

freshports.dev=# begin;
BEGIN
freshports.dev=# delete from vuxml where vid = '0bfcae0b-947f-11ea-92ab-00163e433440';
DELETE 1
freshports.dev=# commit;
COMMIT
freshports.dev=# 

To initiate the processing of the vuxml file, I enter this command:

echo touch ~freshports/signals/vuxml ~freshports/signals/job_waiting | sudo su -fm freshports

This will invoke the vuml processing during the normal course of events. If a commit is being processed when the command is entered, the vuxml will not be processed until after that commit.

I could run the commands manually from the command line, but that would not properly exercise the code.

Found it!

I found 59fabdf2-9549-11ea-9448-08002728f74c produces the issue:

[dan@dev-ingress01:/var/service/freshports] $ ps auwwx | grep proc
root       92487  0.0  0.0 10680  2180  -  IJ   13:07    0:00.00 /usr/local/bin/readproctitle service errors: 
........Wide character in print at /usr/local/lib/perl5/site_perl/FreshPorts/vuxml_parsing.pm line 234, <> chunk 1.\n

Reproducing the issue is always a good first goal.

This is line 234:

        print "description:", $self->description(), "\n";

perldoc.perl.org says:

To avoid this warning and to avoid having different output encodings in a single stream, always specify an encoding explicitly, for example with a PerlIO layer:

    binmode STDOUT, ":encoding(UTF-8)";

That code goes into process_vuxml.pl which uses the vuxml_parsing.pmmodule.

Let’s try that. The steps are:

  1. freshports.dev=# begin; delete from vuxml where vid = '59fabdf2-9549-11ea-9448-08002728f74c'; commit;
    BEGIN
    DELETE 1
    COMMIT
    freshports.dev=# 
    
  2. $ echo touch ~freshports/signals/vuxml ~freshports/signals/job_waiting | sudo su -fm freshports
    
  3. $ ps auwwx | grep proc
    root       34401  0.0  0.0 10680  2180  -  IJ   13:25    0:00.00 /usr/local/bin/readproctitle service errors: 
    ........................................................................................................
    dan        35363  0.0  0.0 11324  2648  4  S+J  13:26    0:00.00 grep proc
    

No errors!

May 182020
 

On Sunday, the packages project was moved into production. This project was both fun and challenging.

The goal: within 1 hour of the repo being updated, FreshPorts has the packages imported and updated for your viewing pleasure.

Today, I went checking to see if a repo update had occurred. An issue was found and corrected. This post contains the details.

Why post this stuff?

My writing helps me remember and understand what went wrong and helps me avoid similar oversights in the future. It also helps document how the system works and will be useful when someone else takes over FreshPorts.

The repo update

Looking at the backend status page I saw that an update had occurred today at 1800 UTC:

FreeBSD:13:amd64 updated

FreeBSD:13:amd64 updated

Browsing over to the sysutils/bacula9-server page, I could see it was still listing the packages from the 2020-05-16 build:

sysutils/bacula9-server 202-05-16 build

sysutils/bacula9-server 202-05-16 build

Ideas

What follows are my investigations, writtten after the fact of where I looked and what I found.

Was it imported?

First, I checked the logs for events around 1800 UTC. I found plenty of evidence that the repo update had been detected and imported:

May 18 18:00:07 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: Starting up
May 18 18:00:07 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:11:aarch64/latest : 2018-Sep-29 04:24
May 18 18:00:07 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:11:aarch64/quarterly : 2020-May-05 00:02
May 18 18:00:07 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:11:amd64/latest : 2020-May-13 21:25
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:11:amd64/quarterly : 2020-May-12 05:56
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:11:i386/latest : 2020-May-13 18:34
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:11:i386/quarterly : 2020-May-12 05:56
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:12:aarch64/latest : 2018-Oct-11 05:47
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:12:aarch64/quarterly : 2020-May-09 17:08
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:12:amd64/latest : 2020-May-13 20:01
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:12:amd64/quarterly : 2020-May-12 05:30
May 18 18:00:08 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:12:i386/latest : 2020-May-13 17:58
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:12:i386/quarterly : 2020-May-12 05:41
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:13:aarch64/latest : 2020-May-16 06:24
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:13:aarch64/quarterly : 
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:13:amd64/latest : 2020-May-18 04:59
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:13:amd64/quarterly : 
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:13:i386/latest : 2020-May-18 05:40
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: checking: FreeBSD:13:i386/quarterly : 
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: There are 2 new repos ready for import: ['FreeBSD:13:amd64/latest : 2020-May-18 04:59', 'FreeBSD:13:i386/latest : 2020-May-18 05:40']
May 18 18:00:09 x8dtu-ingress01 check_repos_for_new_stuff.py[46975]: finishes
May 18 18:00:10 x8dtu-ingress01 fp-daemon[47094]: yes, there is a job waiting
May 18 18:00:10 x8dtu-ingress01 FreshPorts[47095]: flag '/var/db/freshports/signals/vuxml' not set.  no work for process_vuxml.sh (/usr/local/libexec/freshports) 
May 18 18:00:10 x8dtu-ingress01 FreshPorts[47095]: /var/db/freshports/signals/new_repo_ready_for_import exists.  About to run import_packagesite.py (/usr/local/libexec/freshports) 
May 18 18:00:10 x8dtu-ingress01 import_packagesite.py[47105]: Starting up
May 18 18:00:10 x8dtu-ingress01 import_packagesite.py[47105]: we have 2 repos to process
May 18 18:00:10 x8dtu-ingress01 import_packagesite.py[47105]: importing FreeBSD:13:amd64/latest
May 18 18:00:10 x8dtu-ingress01 import_packagesite.py[47105]: command is: /usr/local/libexec/freshports/fetch-extract-parse-import-one-abi.sh FreeBSD:13:amd64 latest
May 18 18:00:10 x8dtu-ingress01 freshports[47108]: got into /usr/local/libexec/freshports/fetch-extract-parse-import-one-abi.sh
May 18 18:00:10 x8dtu-ingress01 dan[47109]: starting /usr/local/libexec/freshports/fetch-extract-parse-import-one-abi.sh
May 18 18:00:25 x8dtu-ingress01 import-via-copy-packagesite-all-raw-fields.py[47420]: Starting up
May 18 18:00:25 x8dtu-ingress01 syslogd: last message repeated 1 times
May 18 18:00:25 x8dtu-ingress01 import-via-copy-packagesite-all-raw-fields.py[47420]: copying in from packagesite.tsv
May 18 18:00:27 x8dtu-ingress01 import-via-copy-packagesite-all-raw-fields.py[47420]: Finishing
May 18 18:00:27 x8dtu-ingress01 dan[47428]: finished importing FreeBSD:13:amd64/latest
May 18 18:00:27 x8dtu-ingress01 import_packagesite.py[47105]: calling PackagesLastCheckedSetImportDate()
May 18 18:00:27 x8dtu-ingress01 import_packagesite.py[47105]: importing FreeBSD:13:i386/latest
May 18 18:00:27 x8dtu-ingress01 import_packagesite.py[47105]: command is: /usr/local/libexec/freshports/fetch-extract-parse-import-one-abi.sh FreeBSD:13:i386 latest
May 18 18:00:27 x8dtu-ingress01 freshports[47430]: got into /usr/local/libexec/freshports/fetch-extract-parse-import-one-abi.sh
May 18 18:00:27 x8dtu-ingress01 dan[47431]: starting /usr/local/libexec/freshports/fetch-extract-parse-import-one-abi.sh
May 18 18:00:30 x8dtu-ingress01 import-via-copy-packagesite-all-raw-fields.py[47465]: Starting up
May 18 18:00:30 x8dtu-ingress01 syslogd: last message repeated 1 times
May 18 18:00:30 x8dtu-ingress01 import-via-copy-packagesite-all-raw-fields.py[47465]: copying in from packagesite.tsv
May 18 18:00:31 x8dtu-ingress01 import-via-copy-packagesite-all-raw-fields.py[47465]: Finishing
May 18 18:00:31 x8dtu-ingress01 dan[47480]: finished importing FreeBSD:13:i386/latest
May 18 18:00:31 x8dtu-ingress01 import_packagesite.py[47105]: calling PackagesLastCheckedSetImportDate()
May 18 18:00:31 x8dtu-ingress01 import_packagesite.py[47105]: There are 2 repos which need post-import processing: ['FreeBSD:13:amd64:latest', 'FreeBSD:13:i386:latest']
May 18 18:00:31 x8dtu-ingress01 import_packagesite.py[47105]: Finishing
May 18 18:00:31 x8dtu-ingress01 FreshPorts[47095]: Finished running import_packagesite.py (/usr/local/libexec/freshports) 
May 18 18:00:31 x8dtu-ingress01 FreshPorts[47095]: flag '/var/db/freshports/signals/refresh_moved' not set.  no work for process_moved.sh (/usr/local/libexec/freshports) 
May 18 18:00:31 x8dtu-ingress01 FreshPorts[47095]: flag '/var/db/freshports/signals/refresh_updating' not set.  no work for process_updating.sh (/usr/local/libexec/freshports) 
May 18 18:00:31 x8dtu-ingress01 FreshPorts[47095]: /var/db/freshports/signals/new_repo_imported exists.  About to run UpdatePackagesFromRawPackages.py (/usr/local/libexec/freshports) 
May 18 18:00:31 x8dtu-ingress01 UpdatePackagesFromRawPackages.py[47481]: Starting up
May 18 18:00:31 x8dtu-ingress01 UpdatePackagesFromRawPackages.py[47481]: updating packages table for FreeBSD:13:amd64/latest
May 18 18:00:37 x8dtu-ingress01 UpdatePackagesFromRawPackages.py[47481]: updating packages table for FreeBSD:13:i386/latest
May 18 18:02:16 x8dtu-ingress01 UpdatePackagesFromRawPackages.py[47481]: There were 2 repos updated in packages.
May 18 18:02:16 x8dtu-ingress01 UpdatePackagesFromRawPackages.py[47481]: Finishing
May 18 18:02:16 x8dtu-ingress01 FreshPorts[47095]: Finished running UpdatePackagesFromRawPackages.py (/usr/local/libexec/freshports) 

Conclusion: data import is working properly.

Was the cache cleared

For cache clearing to occur, the above process issues a PostgreSQL NOTIFY.

My next idea was fp-listen on the x8dtu-nginx01 host was not listening for the new events for clearing the package cache. The logs indicated it was not:

I went looking for the list output upon startup:

May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: Starting up - this should not occur often
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: These are the (event name, script name) pairs we are ready for:
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: ('port_updated', 'listen_port')
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: ('ports_moved', 'listen_ports_moved')
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: ('ports_updating', 'listen_ports_updating')
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: ('vuxml', 'listen_vuxml')
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: ('category_new', 'listen_category_new')
May 17 01:38:07 x8dtu-nginx01 fp-listen[42345]: ('date_updated', 'listen_date_updated')

Comparing it to dev, I found packages_imported was missing.

May 12 20:36:53 dev-nginx01 fp-listen[59516]: Starting up - this should not occur often
May 12 20:36:53 dev-nginx01 fp-listen[59516]: These are the (event name, script name) pairs we are ready for:
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('port_updated', 'listen_port')
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('ports_moved', 'listen_ports_moved')
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('ports_updating', 'listen_ports_updating')
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('vuxml', 'listen_vuxml')
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('category_new', 'listen_category_new')
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('date_updated', 'listen_date_updated')
May 12 20:36:53 dev-nginx01 fp-listen[59516]: ('packages_imported', 'listen_packages_imported')

Checking the webserver

Checking the webserver, I was about to restart the fp-listen process when I thought to check this output:

$ ps auwwx | grep readproc
root       49860  0.0  0.0  10680  2180  -  IJ   Sun14   0:00.00 /usr/local/bin/readproctitle service errors: ................................................................................................................
..................................................................................cannot open 
'main_tank/data/freshports/backend/cache/packages': missing '@' delimiter in snapshot name\ncannot open 
'main_tank/data/freshports/backend/cache/packages': missing '@' delimiter in snapshot name\n

HAH! So the NOTIFY was received.

The issue was configuration as shown by my change here:

[dan@x8dtu-nginx01:/usr/local/etc/freshports] $ sudo diff -ruN fp-listen.ini~ fp-listen.ini
--- fp-listen.ini~	2020-05-17 13:25:50.000000000 +0000
+++ fp-listen.ini	2020-05-18 19:35:58.964021000 +0000
@@ -18,7 +18,7 @@
 NEWS_CACHE_DIR  = %(BASEDIR)s/cache/news/
 
 # this is the name of the zfs filesystem to rollback, including the snapshot name
-PKG_ZFS_SNAPSHOT    = main_tank/data/freshports/backend/cache/packages
+PKG_ZFS_SNAPSHOT    = main_tank/data/freshports/backend/cache/packages@empty
 
 [flags]
 # used for processing category updates
[dan@x8dtu-nginx01:/usr/local/etc/freshports] $ 

The snapshot name was missing.

Clearing the cache

I cleared the cache by entering this command:

[dan@x8dtu-nginx01:~] $ echo zfs rollback main_tank/data/freshports/backend/cache/packages@empty | sudo su -fm freshports

Checking the webpage

Now the sysutils/bacula9-server page shows:

sysutils/bacula9-server 18:00

sysutils/bacula9-server 18:00

Now that’s the refreshed data.

What about the missing LISTEN for information?

Remember when I said the host was not listening for the new events? Let’s check farther into the logs:

May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: Starting up - this should not occur often
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: These are the (event name, script name) pairs we are ready for:
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('port_updated', 'listen_port')
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('ports_moved', 'listen_ports_moved')
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('ports_updating', 'listen_ports_updating')
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('vuxml', 'listen_vuxml')
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('category_new', 'listen_category_new')
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('date_updated', 'listen_date_updated')
May 17 12:59:32 x8dtu-nginx01 fp-listen[14300]: ('packages_imported', 'listen_packages_imported')

Yeah, there it is. It was listening. I was looking too early in the logs

A lingering issue

While inspecting the logs, I noticed that fp-listen was restarting often, sometimes hundreds of times a day.

[dan@x8dtu-nginx01:~] $ sudo grep -c 'Starting up - this should not occur often' /var/log/freshports/freshports.log*
/var/log/freshports/freshports.log:0
/var/log/freshports/freshports.log.0:85
/var/log/freshports/freshports.log.1:255
/var/log/freshports/freshports.log.10:0
/var/log/freshports/freshports.log.11:0
/var/log/freshports/freshports.log.12:0
/var/log/freshports/freshports.log.13:0
/var/log/freshports/freshports.log.2:159
/var/log/freshports/freshports.log.3:119
/var/log/freshports/freshports.log.4:236
/var/log/freshports/freshports.log.5:129
/var/log/freshports/freshports.log.6:0
/var/log/freshports/freshports.log.7:0
/var/log/freshports/freshports.log.8:0
/var/log/freshports/freshports.log.9:0

This is an issue I noticed earlier and fixed. I also think it was related to a recent issue.

The problem occurs when trying to clear the cache for for a port. This appears in the logs as:

May 17 12:59:32 x8dtu-nginx01 fp-listen[2856]: removing glob /var/db/freshports/cache/ports/devel/py-setuptools/*

There have been no restarted since May 17 17:01:28. Checking the logs, I see this was intentional, because I restarted the fp-listen process at that time.

May 17 17:01:28 x8dtu-nginx01 sudo[38174]:      dan : TTY=pts/2 ; PWD=/var/db/freshports/cache/packages ; USER=root ; COMMAND=/usr/local/bin/svc -t /var/service/fp-listen

I’m guessing I did that right after adding the required new entry to the database, something like this:

-- insert this so fp-listen knows to listen
freshports.org=# insert into listen_for ( name, script_name ) values ('listen_for', 'ClearPackagesCache');

The original is found in my git repo

Welcome to packages

I’m glad that work is now in production, not just because it was complex, but also because of how seamlessly the data is imported and presented to the user.

If you notice issues, please let me know. If you are a maintainer and you know the repo has a lagged version, please check to see if FreshPorts detects them.

Thank you.

Edit: 2020-05-22

fp-listen was not restarted after the update to fp-listen.ini:

[dan@x8dtu-nginx01:/usr/local/etc/freshports] $ ps auwwx | grep fp-listen
freshports 38176  0.0  0.0  29348  19668  -  SJ   Sun17    0:58.23 /usr/local/bin/python3.7 /usr/local/lib/python3.7/site-packages/fp-listen/fp-listen.pyc
root       49862  0.0  0.0  10696   2192  -  IJ   Sun14    0:00.02 supervise fp-listen
dan        66966  0.0  0.0  11176   2676  0  R+J  20:40    0:00.00 grep fp-listen
[dan@x8dtu-nginx01:/usr/local/etc/freshports] $ ls -l fp-listen.ini
-rw-r-----  1 root  freshports  813 May 18 19:35 fp-listen.ini
[dan@x8dtu-nginx01:/usr/local/etc/freshports] $    

The date on the file is after the date on the process. Rookie error.

fp-listen has been restarted.

Edit: 2020-05-22

That’s what we want to see!

May 24 01:00:28 x8dtu-nginx01 fp-listen[67261]: Just woke up! *************
May 24 01:00:28 x8dtu-nginx01 fp-listen[67261]: Got NOTIFY: pid='71564', channel='packages_imported', payload=''
May 24 01:00:28 x8dtu-nginx01 fp-listen[67261]: found key packages_imported
May 24 01:00:28 x8dtu-nginx01 fp-listen[67261]: invoking PackagesCacheClear()
May 24 01:00:28 x8dtu-nginx01 fp-listen[67261]: Time to rollback main_tank/data/freshports/backend/cache/packages@empty
May 24 01:00:30 x8dtu-nginx01 fp-listen[67261]: zfs rollback succeeded.
May 24 01:00:30 x8dtu-nginx01 fp-listen[67261]: Done with PackagesCacheClear()

Done. Closing this issue.

Apr 292020
 

When a new repo is imported, the packages cache needs to be deleted. The cache is on the front-end, the web server. Cache clearing on the web server is handled by the fp-listen daemon.

It does this by using the LISTEN command.

The list of channels it listens to is controlled by the listen_for table:

freshports.test=# \d listen_for
                               Table "public.listen_for"
   Column    |  Type   | Collation | Nullable |                Default                 
-------------+---------+-----------+----------+----------------------------------------
 id          | integer |           | not null | nextval('listen_for_id_seq'::regclass)
 name        | text    |           | not null | 
 script_name | text    |           | not null | 
Indexes:
    "listen_for_pkey" PRIMARY KEY, btree (id)
    "listen_for_name_idx" UNIQUE, btree (name)

freshports.test=# 

The values now in place are:

freshports.test=# SELECT name, script_name FROM listen_for ORDER BY name;
      name      |      script_name      
----------------+-----------------------
 category_new   | listen_category_new
 date_updated   | listen_date_updated
 port_updated   | listen_port
 ports_moved    | listen_ports_moved
 ports_updating | listen_ports_updating
 vuxml          | listen_vuxml
(6 rows)

freshports.test=# 

I’m planning to add a new one: packages_cache_clear

freshports.dev=# INSERT INTO listen_for (name, script_name) values ('packages_cache_clear', 'ClearPackagesCache');
INSERT 0 1

EDIT 2020-05-01 : This was implemented using zfs rollback for cache clearing.

Apr 282020
 

Today, while I wasn’t looking, both dev and test located and imported new package information for:

  • FreeBSD:11:amd64
  • FreeBSD:11:i386
  • FreeBSD:12:amd64
  • FreeBSD:12:i386
  • FreeBSD:13:amd64
  • FreeBSD:13:i386

However, the packages table was not correctly updated.

Why? Because I lost some function code I wrote.

This is the code which was in use, as mentioned in Using DEFERRABLE INITIALLY DEFERRED on constraints:

    UPDATE packages_raw PR
       SET abi_id  = abi.id,
           port_id = P.id
      FROM abi, ports P, element_pathname EP
     WHERE PR.abi         = 'FreeBSD:13:aarch64'
       AND PR.package_set = 'latest'
       AND PR.abi         = abi.name
       AND EP.pathname    = '/ports/head/' || PR.package_origin
       AND P.element_id   = EP.element_id;

As you can see, this will only update the FreeBSD:13:aarch64 ABI, for both latest and quarterly repos.

I needed the parameter driven version. Sadly, the above was in the repo, as expected. Looking back through the change log, it had never been saved.

I started thinking about backups I had taken of the database. As I was getting ready to restore old versions, I thought about where I might have saved this away during the work-in-progress phase.

I found this in an old explain:

 UPDATE packages_raw PR
       SET abi_id  = abi.id,
           port_id = PO.port_id
      FROM abi, ports_origin PO, ports P, element_pathname EP
     WHERE PR.abi         = 'FreeBSD:13:aarch64'
       AND PR.package_set = 'latest'
       AND PR.abi         = abi.name
       AND PO.port_origin = PR.package_origin
       AND PO.port_id     = P.id
       AND P.element_id   = EP.element_id
       AND EP.pathname like '/ports/head/%';

I remembered removing the ports_origin table from the query. This is what I massaged the above into:

    -- set abi_id and port_id before updating the pacakges table.
    UPDATE packages_raw PR
       SET abi_id  = abi.id,
           port_id = P.id
      FROM abi, ports P, element_pathname EP
     WHERE abi.id         = l_abi_id
       AND abi.name       = PR.abi
       AND PR.package_set = a_package_set
       AND EP.pathname    = '/ports/head/' || PR.package_origin
       AND P.element_id   = EP.element_id;

While hunting around old auto-saved TextEdit files, I found something which was identical to the above.

Here’s the final diff:

[dan@pg02:~/src/freshports/database-schema] $ svn di
Index: sp.txt
===================================================================
--- sp.txt	(revision 5312)
+++ sp.txt	(working copy)
@@ -3715,9 +3715,9 @@
        SET abi_id  = abi.id,
            port_id = P.id
       FROM abi, ports P, element_pathname EP
-     WHERE PR.abi         = 'FreeBSD:13:aarch64'
-       AND PR.package_set = 'latest'
-       AND PR.abi         = abi.name
+     WHERE abi.id         = l_abi_id
+       AND abi.name       = PR.abi
+       AND PR.package_set = a_package_set
        AND EP.pathname    = '/ports/head/' || PR.package_origin
        AND P.element_id   = EP.element_id;
 
[dan@pg02:~/src/freshports/database-schema] $ 

But wait, there’s more

While proof-reading this, I realized that I can remove abi from the above query.

We already have l_abi_id (line 13) and abi.name (line 12). Let’s try this instead:

    UPDATE packages_raw PR
       SET abi_id  = l_abi_id,
           port_id = P.id
      FROM ports P, element_pathname EP
     WHERE PR.abi          = a_abi_name
       AND PR.package_set = a_package_set
       AND EP.pathname    = '/ports/head/' || PR.package_origin
       AND P.element_id   = EP.element_id;

The explain analyse for this query is still good.

What’s next?

The packages cache needs to be automatically cleared after importing. At present, that’s a manual command:

[dan@test-nginx01:~] $ sudo rm -rf ~freshports/cache/packages/*
[dan@test-nginx01:~] $ 

The UpdatePackagesFromRawPackages.py script should raise a new signal.

No, it can’t be a signal. Signals are only within a given host. I has to be through the database because that’s the only real communication between the ingress host and the website.

Todays’s lessons

When you’ve got something good, make sure it’s saved to the repo.

Apr 182020
 

A new FreshPorts features is being developed. I’m trying to determine if packages for a port can be cached separately from the port itself. I will explain.

When you visit a port page, if it’s not cached, the page is built and then cached. Not the whole page, just the parts pertaining to that port. The right hand column is not cached.

If you look at https://dev.freshports.org/editors/joe you’ll see a table listing the available packages, as shown here:

Packages for joe

Packages for joe

At present, this table is cached with the rest of the port information.

My thinking: ports and packages are updated independently. It would be nice/cool/pretty damn eloquent to cache them separately.

PROBLEM: The ports cache is static HTML. How do you toss another bit of static HTML into the middle of that?

How caching works

When an update to a port occurs, the cache for that port is cleared. The next web request for that port will repopulate that cache.

While generating the HTML for that port, if the package HTML is not found, it can also be generated.

However, packages and port are updated via separate mechanisms:

  • port – via an incoming commit
  • packages – a new package repo build appears

PROBLEM: the packages cache can disappear right after you look to see if it’s there; it’s a race condition.

Conclusion

It will be far easier to cache the packages for a port with the port itself. To do otherwise creates a mess of problems to solve.

This means each time a new repo is build, the entire ports cache must be cleared. That’s just a bunch of files on disk. mv * ../DELETEME/ && rm -rf ../DELETEME/ will fix that.

Any other ideas

Do you have any ideas? All welcome. Thank you.

Apr 172020
 

In the previous post I documented how the packages_raw table is updated to populate the abi_id and port_id columns. This post will discuss how that data is then pushed into the packages table.

Thoughts

When updates arrived, they arrive in quantity. The latest update was 30474 records. From that, we will have some new package, some will go away, and some will be updates. Many will have no change. I see two approaches:

  1. UPDATE where found and nothing to change, INSERT where not found, DELETE where no longer present.
  2. DELETE everything related to this import, INSERT everything imported.

Option 1 will have far less database churn. Option 2 will maximize it.

Let’s look at option 1 mostly because I like that approach best and it should be straight forward to implement.

That approach also makes me want to keep statistics on package building changes.

UPDATE

I want to update only records which have changes. This works:

SELECT *
  FROM packages_raw PR, packages P
 WHERE P.abi_id           = PR.abi_id
   AND P.package_set      = PR.package_set
   AND P.package_name     = PR.package_name
   AND P.package_version != PR.package_version
   AND PR.abi             = 'FreeBSD:13:amd64'
   AND PR.package_set     = 'latest';

Therefore I only want to update those rows. Let’s try this query:

UPDATE packages p
   SET package_version = PR.package_version
  FROM packages_raw PR
 WHERE P.abi_id           = PR.abi_id
   AND P.package_set      = PR.package_set
   AND P.package_name     = PR.package_name
   AND P.package_version != PR.package_version;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages p  (cost=168509.43..183856.43 rows=392902 width=70) (actual time=11500.885..11500.885 rows=0 loops=1)
   ->  Merge Join  (cost=168509.43..183856.43 rows=392902 width=70) (actual time=7492.742..10230.503 rows=2562 loops=1)
         Merge Cond: ((p.abi_id = pr.abi_id) AND (p.package_set = pr.package_set) AND (p.package_name = pr.package_name))
         Join Filter: (p.package_version <> pr.package_version)
         Rows Removed by Join Filter: 455631
         ->  Sort  (cost=82431.21..83603.67 rows=468984 width=48) (actual time=2378.262..3650.466 rows=459029 loops=1)
               Sort Key: p.abi_id, p.package_set, p.package_name
               Sort Method: external merge  Disk: 26824kB
               ->  Seq Scan on packages p  (cost=0.00..9400.84 rows=468984 width=48) (actual time=0.277..1327.490 rows=459029 loops=1)
         ->  Materialize  (cost=86078.22..88376.89 rows=459733 width=36) (actual time=5114.403..6312.587 rows=459496 loops=1)
               ->  Sort  (cost=86078.22..87227.55 rows=459733 width=36) (actual time=5114.397..6247.665 rows=459496 loops=1)
                     Sort Key: pr.abi_id, pr.package_set, pr.package_name
                     Sort Method: external merge  Disk: 21408kB
                     ->  Seq Scan on packages_raw pr  (cost=0.00..17695.33 rows=459733 width=36) (actual time=0.283..3710.428 rows=459733 loops=1)
 Planning Time: 4.093 ms
 Execution Time: 11501.204 ms
(16 rows)

freshports.dev=# 

By this time it was late and I finished this for the day. Good thing I did…

Let ZFS ARC cache all the database

The next morning, I started looking at shared_buffers and random_page_cost.

I was worried about this update taking 10 seconds, but it does a join across two 450,000 row tables to then update 2500 rows. See https://explain.depesz.com/s/xlWa. I knew I had set this before, so I went searching for values. I checked my production server, where I found it shared_buffers = 128MB. Eventually I found my blog post about caching in ZFS vs caching in PostgreSQL: Website slowness – the investigation.

After setting zfs set primarycache=all, I did the steps found in that post to pre-cache the database:

[dan@x8dtu:~] $ sudo zfs snapshot zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ sudo zfs send zroot/data/pg01-postgres@for.caching > /dev/null
[dan@x8dtu:~] $ sudo zfs destroy zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ 

This server also runs off SSD for which random_page_cost=1.1 is appropriate.

Don’t process the whole table

For all of these operations we must be cautious regarding the contents of packages_raw.
We can never assume that packages_raw contains entries for all ABI and package_set combination already present in the packages table. Rather, it will contain only those sets which were recently imported. If the check and import implemented with sufficient frequency, it will detect new repos immediately and packages_raw will contain data from just one repo.

For this testing, we are using ABI FreeBSD:13:amd64 and package set latest.

The UPDATE: testing with a smaller set and more ZFS caching

Now the results have gone from about 7 or 11 seconds to about 1.5 seconds:

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) 
UPDATE packages p
   SET package_version = PR.package_version
  FROM packages_raw PR
 WHERE P.abi_id           = PR.abi_id
   AND P.package_set      = PR.package_set
   AND P.package_name     = PR.package_name
   AND P.package_version != PR.package_version
   AND PR.abi             = 'FreeBSD:13:amd64'
   AND PR.package_set     = 'latest';
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages p  (cost=15321.52..26841.38 rows=17989 width=69) (actual time=305.336..305.336 rows=0 loops=1)
   Buffers: shared hit=9313 read=7201 dirtied=1105 written=10
   I/O Timings: read=72.845
   ->  Hash Join  (cost=15321.52..26841.38 rows=17989 width=69) (actual time=234.095..264.130 rows=469 loops=1)
         Hash Cond: ((pr.abi_id = p.abi_id) AND (pr.package_name = p.package_name))
         Join Filter: (p.package_version <> pr.package_version)
         Rows Removed by Join Filter: 30880
         Buffers: shared hit=154 read=6374 dirtied=135
         I/O Timings: read=48.906
         ->  Bitmap Heap Scan on packages_raw pr  (cost=480.97..11900.62 rows=19088 width=36) (actual time=17.957..34.032 rows=31401 loops=1)
               Recheck Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
               Heap Blocks: exact=530
               Buffers: shared read=1668
               I/O Timings: read=17.758
               ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..476.20 rows=19088 width=0) (actual time=17.832..17.832 rows=62784 loops=1)
                     Index Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
                     Buffers: shared read=773
                     I/O Timings: read=10.732
         ->  Hash  (cost=10655.10..10655.10 rows=279030 width=47) (actual time=210.372..210.373 rows=275766 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 26085kB
               Buffers: shared hit=154 read=4706 dirtied=135
               I/O Timings: read=31.148
               ->  Seq Scan on packages p  (cost=0.00..10655.10 rows=279030 width=47) (actual time=0.023..113.982 rows=275766 loops=1)
                     Filter: (package_set = 'latest'::package_sets)
                     Rows Removed by Filter: 183263
                     Buffers: shared hit=154 read=4706 dirtied=135
                     I/O Timings: read=31.148
 Planning Time: 0.583 ms
 Execution Time: 308.639 ms
(29 rows)

freshports.dev=# 

See https://explain.depesz.com/s/N6KT for details.

The INSERT

We have to rows in packages_raw which are not found in packages.

How do we find that data?

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) 
SELECT *
  FROM packages_raw PR
 WHERE NOT EXISTS (
 SELECT *
   FROM packages P
  WHERE P.abi_id           = PR.abi_id
    AND P.package_set      = PR.package_set
    AND P.package_name     = PR.package_name
    AND P.package_version  = PR.package_version
    AND PR.abi             = 'FreeBSD:13:amd64'
    AND PR.package_set     = 'latest')
 AND PR.abi_id  IS NOT NULL
 AND PR.port_id IS NOT NULL
 AND PR.abi             = 'FreeBSD:13:amd64'
 AND PR.package_set     = 'latest';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1480.76..21125.96 rows=15908 width=79) (actual time=21.782..137.166 rows=503 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=127168
   ->  Nested Loop Anti Join  (cost=480.76..18535.16 rows=6628 width=79) (actual time=13.563..122.878 rows=168 loops=3)
         Join Filter: ((pr.abi = 'FreeBSD:13:amd64'::text) AND (pr.package_set = 'latest'::package_sets))
         Buffers: shared hit=127168
         ->  Parallel Bitmap Heap Scan on packages_raw pr  (cost=480.34..11732.96 rows=6892 width=79) (actual time=9.732..21.140 rows=10461 loops=3)
               Recheck Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
               Filter: ((abi_id IS NOT NULL) AND (port_id IS NOT NULL))
               Rows Removed by Filter: 6
               Heap Blocks: exact=201
               Buffers: shared hit=1668
               ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..476.20 rows=19088 width=0) (actual time=16.749..16.749 rows=62784 loops=1)
                     Index Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
                     Buffers: shared hit=773
         ->  Index Only Scan using packages_all_idx on packages p  (cost=0.42..0.97 rows=1 width=29) (actual time=0.008..0.008 rows=1 loops=31383)
               Index Cond: ((abi_id = pr.abi_id) AND (package_name = pr.package_name) AND (package_set = pr.package_set) AND (package_set = 'latest'::package_sets))
               Filter: (package_version = pr.package_version)
               Rows Removed by Filter: 0
               Heap Fetches: 31349
               Buffers: shared hit=125500
 Planning Time: 1.609 ms
 Execution Time: 137.356 ms
(24 rows)

freshports.dev=# 

The two IS NOT NULL clauses avoids inserting package data for which we did not find a matching port. This is not uncommon on the development server, but will be flagged on the production host.

The INSERT then becomes:

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) 
INSERT INTO packages (abi_id, port_id, package_version, package_name, package_set)
SELECT abi_id, port_id, package_version, package_name, package_set
  FROM packages_raw PR
 WHERE NOT EXISTS (
 SELECT *
   FROM packages P
  WHERE P.abi_id           = PR.abi_id
    AND P.package_set      = PR.package_set
    AND P.package_name     = PR.package_name
    AND P.package_version  = PR.package_version
    AND PR.abi             = 'FreeBSD:13:amd64'
    AND PR.package_set     = 'latest')
 AND PR.abi_id  IS NOT NULL
 AND PR.port_id IS NOT NULL
 AND PR.abi             = 'FreeBSD:13:amd64'
 AND PR.package_set     = 'latest';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on packages  (cost=480.76..28278.42 rows=15908 width=58) (actual time=341.049..341.050 rows=0 loops=1)
   Buffers: shared hit=133908 read=1 dirtied=828 written=9
   I/O Timings: read=0.048
   ->  Nested Loop Anti Join  (cost=480.76..28278.42 rows=15908 width=58) (actual time=21.860..313.677 rows=503 loops=1)
         Join Filter: ((pr.abi = 'FreeBSD:13:amd64'::text) AND (pr.package_set = 'latest'::package_sets))
         Buffers: shared hit=127747 dirtied=24
         ->  Bitmap Heap Scan on packages_raw pr  (cost=480.34..11899.98 rows=16541 width=51) (actual time=16.643..48.263 rows=31383 loops=1)
               Recheck Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
               Filter: ((abi_id IS NOT NULL) AND (port_id IS NOT NULL))
               Rows Removed by Filter: 18
               Heap Blocks: exact=530
               Buffers: shared hit=1668
               ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..476.20 rows=19088 width=0) (actual time=16.354..16.354 rows=62784 loops=1)
                     Index Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
                     Buffers: shared hit=773
         ->  Index Only Scan using packages_all_idx on packages p  (cost=0.42..0.97 rows=1 width=29) (actual time=0.007..0.007 rows=1 loops=31383)
               Index Cond: ((abi_id = pr.abi_id) AND (package_name = pr.package_name) AND (package_set = pr.package_set) AND (package_set = 'latest'::package_sets))
               Filter: (package_version = pr.package_version)
               Rows Removed by Filter: 0
               Heap Fetches: 31450
               Buffers: shared hit=125576 dirtied=24
 Planning Time: 1.607 ms
 Execution Time: 341.237 ms
(23 rows)

freshports.dev=# 

See https://explain.depesz.com/s/ylCc.

The DELETE

We must be cautious with this query. The goal is to remove entries from the packages table which are not available from the repo.

We want to find entries in the packages table which are not found in the package_raw table, the inverse of what we do for INSERT.

Originally, I thought package_version was required in this SELECT, but that will pick up entries which exist, but only the package version has changed. If we always did the DELETE after the UPDATE, the inclusion of package_version in the comparison would still work, but it is better to not include it.

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM packages P
 WHERE P.abi_id      = 11
   AND P.package_set = 'latest'
   AND NOT EXISTS (
     SELECT *
       FROM packages_raw PR
      WHERE P.abi_id       = PR.abi_id
        AND P.package_set  = PR.package_set
        AND P.package_name = PR.package_name
        AND PR.abi         = 'FreeBSD:13:amd64'
        AND PR.package_set = 'latest');
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on packages p  (cost=3.08..6108.16 rows=19353 width=12) (actual time=52.662..52.662 rows=0 loops=1)
   Buffers: shared hit=35880
   ->  Merge Anti Join  (cost=3.08..6108.16 rows=19353 width=12) (actual time=23.933..52.586 rows=36 loops=1)
         Merge Cond: (p.package_name = pr.package_name)
         Join Filter: ((p.abi_id = pr.abi_id) AND (p.package_set = pr.package_set))
         Buffers: shared hit=35844
         ->  Index Scan using packages_all_idx on packages p  (cost=0.42..6057.09 rows=19354 width=28) (actual time=0.018..15.736 rows=31384 loops=1)
               Index Cond: ((abi_id = 11) AND (package_set = 'latest'::package_sets))
               Buffers: shared hit=27257
         ->  Sort  (cost=2.66..2.66 rows=1 width=29) (actual time=23.518..25.581 rows=31383 loops=1)
               Sort Key: pr.package_name
               Sort Method: quicksort  Memory: 3278kB
               Buffers: shared hit=8587
               ->  Index Scan using packages_raw_all on packages_raw pr  (cost=0.42..2.65 rows=1 width=29) (actual time=0.021..14.889 rows=31383 loops=1)
                     Index Cond: ((abi_id = 11) AND (package_set = 'latest'::package_sets))
                     Filter: (abi = 'FreeBSD:13:amd64'::text)
                     Buffers: shared hit=8587
 Planning Time: 0.421 ms
 Execution Time: 52.712 ms
(19 rows)

See https://explain.depesz.com/s/ysWG for details.

Putting it all into one function

This is the function, complete with stats updates.

CREATE OR REPLACE FUNCTION UpdatePackagesFromRawPackages( IN a_abi_name text, IN a_package_set package_sets)
    RETURNS TABLE(inserts integer, updates integer, deletes integer)
    LANGUAGE plpgsql VOLATILE
    AS $$
  DECLARE
    l_abi_id  integer;
    l_inserts integer;
    l_deletes integer;
    l_updates integer;
  BEGIN
    SELECT id
      INTO l_abi_id
      FROM abi
     WHERE name = a_abi_name;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'cannot find id for ABI %', a_abi_name;
    END IF;

    UPDATE packages p
       SET package_version = PR.package_version
      FROM packages_raw PR
     WHERE P.abi_id           = PR.abi_id
       AND P.package_set      = PR.package_set
       AND P.package_name     = PR.package_name
       AND P.package_version != PR.package_version
       AND PR.abi             = a_abi_name
       AND PR.package_set     = a_package_set;

    GET DIAGNOSTICS l_updates = ROW_COUNT;

    INSERT INTO packages (abi_id, port_id, package_version, package_name, package_set)
    SELECT abi_id, port_id, package_version, package_name, package_set
      FROM packages_raw PR
     WHERE NOT EXISTS (
     SELECT *
       FROM packages P
      WHERE P.abi_id           = PR.abi_id
        AND P.package_set      = PR.package_set
        AND P.package_name     = PR.package_name
        AND P.package_version  = PR.package_version
        AND PR.abi             = a_abi_name
        AND PR.package_set     = a_package_set)
     AND PR.abi_id  IS NOT NULL
     AND PR.port_id IS NOT NULL
     AND PR.abi             = a_abi_name
     AND PR.package_set     = a_package_set;

    GET DIAGNOSTICS l_inserts = ROW_COUNT;

    DELETE FROM packages P
     WHERE P.abi_id      = l_abi_id
       AND P.package_set = a_package_set
       AND NOT EXISTS (
         SELECT *
           FROM packages_raw PR
          WHERE P.abi_id       = PR.abi_id
            AND P.package_set  = PR.package_set
            AND P.package_name = PR.package_name
            AND PR.abi         = a_abi_name
            AND PR.package_set = a_package_set);

    GET DIAGNOSTICS l_deletes = ROW_COUNT;

    INSERT INTO package_imports (abi_id, package_set, date, inserts, updates, deletes)
                         values (l_abi_id, a_package_set, CURRENT_TIMESTAMP, l_inserts, l_updates, l_deletes);

    RETURN QUERY (SELECT l_inserts, l_updates, l_deletes);

  END;
$$;


And it seems to work as expected. More testing to follow.

freshports.dev=# begin; select * from UpdatePackagesFromRawPackages('FreeBSD:13:amd64', 'latest');
BEGIN
 inserts | updates | deletes 
---------+---------+---------
      35 |     469 |      36
(1 row)

freshports.dev=# select * from package_imports;
 id | abi_id | package_set |        date        | inserts | updates | deletes 
----+--------+-------------+--------------------+---------+---------+---------
  1 |     11 | latest      | 20:43:46.194598+00 |      35 |     469 |      36
(1 row)

freshports.dev=# rollback;

That is a total of 540 records updates, out of 30,474 records imported. That’s far better than deleting and inserting all that data.

More work on this over the weekend.

Apr 162020
 

This post is another in the packages series. It documents how the data is transformed from the raw form (pacakges_raw) into normalized data (packages).

The Packages – how is this data stored? post may be useful reading.

The primary purpose of this blog post is documentation of the above mentioned transformation process. Along the way I found a performance enhancement owing to a deferred constraint. That’s why this post is tagged with PostgreSQL.

Some data background

When data is copied into the package_raw table, two columns are left empty:

  1. abi_id
  2. port_id

Those values can be determined based on columns which are supplied:

  1. abi
  2. package_origin

Here is some sample data:

freshports.dev=# SELECT * FROM packages_raw WHERE abi = 'FreeBSD:13:i386' AND package_set = 'latest' ORDER BY package_origin LIMIT 10;
   id    |         package_origin         |   package_name   | package_version |       abi       | abi_id | port_id | package_set 
---------+--------------------------------+------------------+-----------------+-----------------+--------+---------+-------------
 2749964 | accessibility/accerciser       | accerciser       | 3.22.0          | FreeBSD:13:i386 |        |         | latest
 2749124 | accessibility/at-spi2-atk      | at-spi2-atk      | 2.26.2          | FreeBSD:13:i386 |        |         | latest
 2749119 | accessibility/at-spi2-core     | at-spi2-core     | 2.28.0          | FreeBSD:13:i386 |        |         | latest
 2749003 | accessibility/atk              | atk              | 2.28.1          | FreeBSD:13:i386 |        |         | latest
 2748993 | accessibility/atkmm            | atkmm            | 2.24.2_4        | FreeBSD:13:i386 |        |         | latest
 2747796 | accessibility/caribou          | caribou          | 0.4.21_2        | FreeBSD:13:i386 |        |         | latest
 2760503 | accessibility/eflite           | eflite           | 0.4.1           | FreeBSD:13:i386 |        |         | latest
 2744976 | accessibility/kdeaccessibility | kdeaccessibility | 19.12.3         | FreeBSD:13:i386 |        |         | latest
 2756892 | accessibility/kmag             | kmag             | 19.12.3         | FreeBSD:13:i386 |        |         | latest
 2756793 | accessibility/kmousetool       | kmousetool       | 19.12.3_1       | FreeBSD:13:i386 |        |         | latest
(10 rows)

freshports.dev=# 

This date will eventually be copied into the packages table, but we need to populate the two empty columns first.

What should the data look like?

My first goal was to create a result set resembled the desired outcome.

        SELECT P.id,
               abi.name,
               abi.id AS abi_id,
               P.package_origin,
               PO.port_id,
               P.package_name,
               P.package_version,
               P.package_set
          FROM packages_raw P 
          JOIN abi on P.abi = abi.name
          JOIN ports_origin PO on PO.port_origin = P.package_origin
         WHERE P.abi         = 'FreeBSD:13:i386'
           AND P.package_set = 'latest'
      ORDER BY package_origin 
         LIMIT 10;
   id    |      name       | abi_id |         package_origin         | port_id |   package_name   | package_version | package_set 
---------+-----------------+--------+--------------------------------+---------+------------------+-----------------+-------------
 2749964 | FreeBSD:13:i386 |     10 | accessibility/accerciser       |   21760 | accerciser       | 3.22.0          | latest
 2749124 | FreeBSD:13:i386 |     10 | accessibility/at-spi2-atk      |   26871 | at-spi2-atk      | 2.26.2          | latest
 2749119 | FreeBSD:13:i386 |     10 | accessibility/at-spi2-core     |   26868 | at-spi2-core     | 2.28.0          | latest
 2749003 | FreeBSD:13:i386 |     10 | accessibility/atk              |   11601 | atk              | 2.28.1          | latest
 2748993 | FreeBSD:13:i386 |     10 | accessibility/atkmm            |   27818 | atkmm            | 2.24.2_4        | latest
 2747796 | FreeBSD:13:i386 |     10 | accessibility/caribou          |   35704 | caribou          | 0.4.21_2        | latest
 2760503 | FreeBSD:13:i386 |     10 | accessibility/eflite           |   22197 | eflite           | 0.4.1           | latest
 2744976 | FreeBSD:13:i386 |     10 | accessibility/kdeaccessibility |   11726 | kdeaccessibility | 19.12.3         | latest
 2756892 | FreeBSD:13:i386 |     10 | accessibility/kmag             |   31155 | kmag             | 19.12.3         | latest
 2756793 | FreeBSD:13:i386 |     10 | accessibility/kmousetool       |   31156 | kmousetool       | 19.12.3_1       | latest
(10 rows)

freshports.dev=# 

The ports_orgin is a new table created specifically for this query. Until now, there has been no need to obtain a port_id based upon its origin. This table will be updated via triggers upon the ports table. That bit of work has been deferred for now.

freshports.dev=# \d ports_origin
              Table "public.ports_origin"
   Column    |  Type   | Collation | Nullable | Default 
-------------+---------+-----------+----------+---------
 port_id     | integer |           | not null | 
 port_origin | text    |           | not null | 
Indexes:
    "port_origin_pkey" PRIMARY KEY, btree (port_id)
    "ports_origin_port_id_idx" btree (port_id)
    "ports_origin_port_origin_idx" btree (port_origin)
Foreign-key constraints:
    "ports_origin_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE NOT VALID

The original test

The original test took 18 seconds to populate about 30,000 rows:

explain analyse
UPDATE packages_raw PR
   SET abi_id  = pkg.abi_id,
       port_id = pkg.port_id
  FROM (SELECT P.id,
               abi.name,
               abi.id AS abi_id,
               P.package_origin,
               PO.port_id,
               P.package_name,
               P.package_version,
               P.package_set
          FROM packages_raw P 
          JOIN abi on P.abi = abi.name
          JOIN ports_origin PO on PO.port_origin = P.package_origin
         WHERE P.abi        = 'FreeBSD:13:i386'
           AND P.package_set = 'latest') AS pkg
 WHERE PR.id = pkg.id;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_raw pr  (cost=11144.47..40827.87 rows=30175 width=115) (actual time=11081.844..11081.844 rows=0 loops=1)
   ->  Hash Join  (cost=11144.47..40827.87 rows=30175 width=115) (actual time=2880.504..4680.566 rows=38905 loops=1)
         Hash Cond: (p.package_origin = po.port_origin)
         ->  Nested Loop  (cost=9051.81..36627.53 rows=22050 width=113) (actual time=2659.534..3858.975 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.344..0.351 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Hash Join  (cost=9051.81..36405.99 rows=22050 width=120) (actual time=2659.181..3844.350 rows=30492 loops=1)
                     Hash Cond: (pr.id = p.id)
                     ->  Seq Scan on packages_raw pr  (cost=0.00..12452.32 rows=524132 width=77) (actual time=0.370..2365.531 rows=459733 loops=1)
                     ->  Hash  (cost=8560.18..8560.18 rows=22050 width=51) (actual time=143.257..143.258 rows=30492 loops=1)
                           Buckets: 16384  Batches: 4  Memory Usage: 767kB
                           ->  Bitmap Heap Scan on packages_raw p  (cost=1018.44..8560.18 rows=22050 width=51) (actual time=86.264..117.885 rows=30492 loops=1)
                                 Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                                 Heap Blocks: exact=416
                                 ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..1012.92 rows=22050 width=0) (actual time=85.918..85.918 rows=66670 loops=1)
                                       Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=220.029..220.029 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.458..172.482 rows=57807 loops=1)
 Planning Time: 7.125 ms
 Trigger for constraint packages_raw_abi_id_fk: time=1151.252 calls=30474
 Trigger for constraint packages_raw_port_id_fk: time=6532.501 calls=30474
 Execution Time: 18782.602 ms

See https://explain.depesz.com/s/6d9Z for more detail.

I noticed all the time spent checking foreign keys while updating 30474 rows. I figured I could save perhaps most of that nearly 7 seconds spent on constraint.

I thought we could defer that. I changed the packages_raw_port_id_fk constraint to be DEFERRABLE INITIALLY DEFERRED. I also added ON UPDATE CASCADE and ON DELETE CASCADE which should have been there from the start (but that’s not relevant to the change in performance).

Test number 2

On the second try, we get:

Update on packages_raw pr  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=12221.244..12221.244 rows=0 loops=1)
   ->  Hash Join  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=2772.498..6044.554 rows=38905 loops=1)
         Hash Cond: (p.package_origin = po.port_origin)
         ->  Nested Loop  (cost=10845.18..34762.92 rows=14998 width=113) (actual time=2552.187..5468.500 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.335..0.338 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Hash Join  (cost=10845.18..34611.91 rows=14998 width=120) (actual time=2551.846..5455.815 rows=30492 loops=1)
                     Hash Cond: (pr.id = p.id)
                     ->  Seq Scan on packages_raw pr  (cost=0.00..12835.28 rows=384928 width=77) (actual time=0.337..2901.118 rows=459733 loops=1)
                     ->  Hash  (cost=10510.70..10510.70 rows=14998 width=51) (actual time=171.243..171.244 rows=30492 loops=1)
                           Buckets: 16384 (originally 16384)  Batches: 4 (originally 2)  Memory Usage: 897kB
                           ->  Bitmap Heap Scan on packages_raw p  (cost=978.15..10510.70 rows=14998 width=51) (actual time=100.492..143.901 rows=30492 loops=1)
                                 Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                                 Heap Blocks: exact=416
                                 ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..974.40 rows=14998 width=0) (actual time=100.172..100.173 rows=60966 loops=1)
                                       Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=219.460..219.460 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.459..172.605 rows=57807 loops=1)
 Planning Time: 9.476 ms
 Trigger for constraint packages_raw_abi_id_fk: time=363.506 calls=30474
 Execution Time: 12589.008 ms

Roughly 5 seconds of savings. See https://explain.depesz.com/s/Y9gj.

OK, let’s do the same for abi_id.

3rd time

After setting packages_raw_abi_id_fk to DEFERRABLE INITIALLY DEFERRED, the 3rd test results were:

Update on packages_raw pr  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=8798.389..8798.389 rows=0 loops=1)
   ->  Hash Join  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=2841.848..4960.062 rows=38905 loops=1)
         Hash Cond: (p.package_origin = po.port_origin)
         ->  Nested Loop  (cost=10845.18..34762.92 rows=14998 width=113) (actual time=2620.392..3970.915 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.343..0.350 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Hash Join  (cost=10845.18..34611.91 rows=14998 width=120) (actual time=2620.040..3956.973 rows=30492 loops=1)
                     Hash Cond: (pr.id = p.id)
                     ->  Seq Scan on packages_raw pr  (cost=0.00..12835.28 rows=384928 width=77) (actual time=0.340..2796.802 rows=459733 loops=1)
                     ->  Hash  (cost=10510.70..10510.70 rows=14998 width=51) (actual time=167.195..167.196 rows=30492 loops=1)
                           Buckets: 16384 (originally 16384)  Batches: 4 (originally 2)  Memory Usage: 897kB
                           ->  Bitmap Heap Scan on packages_raw p  (cost=978.15..10510.70 rows=14998 width=51) (actual time=110.897..140.914 rows=30492 loops=1)
                                 Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                                 Heap Blocks: exact=416
                                 ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..974.40 rows=14998 width=0) (actual time=110.419..110.420 rows=91440 loops=1)
                                       Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=220.662..220.662 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.446..173.184 rows=57807 loops=1)
 Planning Time: 4.560 ms
 Execution Time: 8800.805 ms

That’s nearly 10 seconds faster. Sure, there might be some caching involved but the time spent checking foreign keys is greatly reduced.

See https://explain.depesz.com/s/XXnj.

Next, I noticed the sequential scan on ports_origin. Let’s see about that.

Yeah, that self-join

The self-join was noticed by xocolatl as being unnecessary. Let’s remove that to get:

UPDATE packages_raw PR
   SET abi_id  = abi.id,
       port_id = PO.port_id
  FROM abi, ports_origin PO
 WHERE PR.abi         = 'FreeBSD:13:i386'
   AND PR.package_set = 'latest'
   AND PR.abi         = abi.name
   AND PO.port_origin = PR.package_origin;

                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_raw pr  (cost=3081.65..12658.10 rows=21940 width=109) (actual time=6183.437..6183.437 rows=0 loops=1)
   ->  Hash Join  (cost=3081.65..12658.10 rows=21940 width=109) (actual time=418.091..845.851 rows=38905 loops=1)
         Hash Cond: (pr.package_origin = po.port_origin)
         ->  Nested Loop  (cost=989.00..9048.43 rows=16056 width=87) (actual time=174.079..268.378 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.030..0.036 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Bitmap Heap Scan on packages_raw pr  (cost=989.00..8886.84 rows=16056 width=77) (actual time=174.032..256.085 rows=30492 loops=1)
                     Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                     Heap Blocks: exact=416
                     ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..984.98 rows=16056 width=0) (actual time=173.458..173.458 rows=121903 loops=1)
                           Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=243.062..243.062 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.451..195.454 rows=57807 loops=1)
 Planning Time: 2.922 ms
 Execution Time: 6186.449 ms
(17 rows)

Well, isn’t that better?

See https://explain.depesz.com/s/cnyw for details.

That sequential scan on ports_origin takes up 195ms. Nothing to worry about.

Things to follow up on

This query doesn’t find values for all the FreeBSD:13:i386 * latest combination. Some are not matched.

freshports.dev=# SELECT * FROM packages_raw WHERE port_id IS NULL AND abi = 'FreeBSD:13:i386' AND package_set = 'latest' ORDER BY package_origin;
   id    |            package_origin            |          package_name          | package_version |       abi       | abi_id | port_id | package_set 
---------+--------------------------------------+--------------------------------+-----------------+-----------------+--------+---------+-------------
 2745970 | databases/rubygem-peek-mysql2-rails5 | rubygem-peek-mysql2-rails5     | 1.2.0           | FreeBSD:13:i386 |        |         | latest
 2740242 | devel/p5-File-ShareDir-Tarball       | p5-File-ShareDir-Tarball       | 0.2.2           | FreeBSD:13:i386 |        |         | latest
 2741383 | devel/p5-MooX-Singleton              | p5-MooX-Singleton              | 1.20            | FreeBSD:13:i386 |        |         | latest
 2740542 | devel/p5-MooseX-MungeHas             | p5-MooseX-MungeHas             | 0.011           | FreeBSD:13:i386 |        |         | latest
 2733133 | devel/py-aspy.yaml                   | py27-aspy.yaml                 | 1.3.0           | FreeBSD:13:i386 |        |         | latest
 2741040 | devel/py-aspy.yaml                   | py37-aspy.yaml                 | 1.3.0           | FreeBSD:13:i386 |        |         | latest
 2743867 | devel/py-nodeenv                     | py27-nodeenv                   | 1.3.3           | FreeBSD:13:i386 |        |         | latest
 2744182 | devel/py-nodeenv                     | py37-nodeenv                   | 1.3.3           | FreeBSD:13:i386 |        |         | latest
 2746002 | devel/rubygem-peek-gc-rails5         | rubygem-peek-gc-rails5         | 0.0.2           | FreeBSD:13:i386 |        |         | latest
 2745932 | devel/rubygem-peek-rails5            | rubygem-peek-rails5            | 1.0.1           | FreeBSD:13:i386 |        |         | latest
 2745897 | devel/rubygem-peek-rblineprof-rails5 | rubygem-peek-rblineprof-rails5 | 0.2.0           | FreeBSD:13:i386 |        |         | latest
 2745876 | devel/rubygem-peek-redis-rails5      | rubygem-peek-redis-rails5      | 1.2.0           | FreeBSD:13:i386 |        |         | latest
 2749944 | security/picosha2                    | picosha2                       | 1.0.0.23        | FreeBSD:13:i386 |        |         | latest
 2749043 | sysutils/atf-rk3399                  | atf-rk3399                     | v2.1            | FreeBSD:13:i386 |        |         | latest
 2746664 | sysutils/py-leviathan                | py27-leviathan                 | 0.1.1           | FreeBSD:13:i386 |        |         | latest
 2746885 | sysutils/py-leviathan                | py37-leviathan                 | 0.1.1           | FreeBSD:13:i386 |        |         | latest
 2737873 | sysutils/swapusage                   | swapusage                      | 1.0.0           | FreeBSD:13:i386 |        |         | latest
 2744052 | www/rubygem-gon-rails5               | rubygem-gon-rails5             | 6.2.1           | FreeBSD:13:i386 |        |         | latest
(18 rows)

This is a development database and those ports do not exist on this host. It could also be an out-of-date ports_origin table, but I haven’t checked that yet. All of those ports were added to the FreeBSD repo on these dates:

  1. February 5 2019
  2. April 15 2019
  3. May 26-29 2019

They are present in the production database. Nothing to be done here.

However, the system will have to flag such entries for later followup.

Apr 142020
 

This is the latest in a series of posts dealing with displaying what packages are available for a given FreeBSD port.

The Packages – how is this data stored? post may help you follow along. It outlines the data and the tables used in this post.

Last night I managed to get the right data out for ports which have multiple flavors. I used a few procedures each of one calling another, eventually getting the results.

That effort has now been replaced with a single query.

In this post, we will be looking at the www/py-django-storages port, which has three flavors.

The query

This query came to me from RhodiumToad, who has a wide breadth of knowledge and is liberal with his help. What you see below has been adjusted since then, but it closely represents his delivery.

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

This blog post is mainly to document the above and go through it step by step. I’ll need this later because I know I won’t remember it. I also want to learn more about this query and apply it to future development.

I’m going to step through the query, outlining what each part does.

WITH pkg

A WITH query “provides a way to write auxiliary statements for use in a larger query”.

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

I think of it as a temporary table for this query. It is defining the data which will be used by the rest of the query.

I will refer to this later as the pkg table.

This is what we get:

freshports.dev=# SELECT * FROM packages WHERE port_id = 28303 ORDER BY package_name;
   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------
 4373305 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4369003 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4340907 |      7 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4392074 |     10 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4388153 |     11 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4346130 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4382725 |      2 |   28303 | 1.8             | py27-django-storages |           |             | latest
 4349958 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4353741 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4380869 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4376671 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4357905 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4339390 |      7 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4361606 |      9 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4363708 |      9 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4369021 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4373197 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4365567 |      9 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4357806 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4376807 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4380779 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4353810 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4349931 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4383395 |      2 |   28303 | 1.8             | py37-django-storages |           |             | latest
 4388136 |     11 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4346104 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4392033 |     10 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4342889 |      7 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
(30 rows)

freshports.dev=# 

I sorted the data just to make it easier to follow.

The above represents all the packages available for www/py-django-storages from the FreeBSD repos.

How many different builds do we have for each package_name?

  SELECT package_name, 
         count(package_name) 
    FROM packages 
   WHERE port_id = 28303 
GROUP BY package_name 
ORDER BY package_name;

That query produces this result:

     package_name     | count 
----------------------+-------
 py27-django-storages |    15
 py36-django-storages |     2
 py37-django-storages |    13
(3 rows)

Those numbers will become relevant later.

SELECT DISTINCT

SELECT DISTINCT eliminates duplicate rows from the result. Don’t do what I initially did and use brackets around your first item. Consider SELECT DISTINCT to be a single keyword. It is not a function.

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

We need results for all packages on all ABI. To get there, we need a list of the packages first.

Combining that SELECT DISTINCT into the WITH clause, we get:

WITH pkg AS
  (SELECT * FROM packages WHERE port_id = 28303)
SELECT DISTINCT package_name FROM pkg;

     package_name     
----------------------
 py27-django-storages
 py36-django-storages
 py37-django-storages
(3 rows)

This is the pkg table.

All ABI * package combinations

We want to show people that a package is not available on a given ABI. We also want all tables to have the same number of rows so that scanning across the page compares like-to-like. For that, we want all combinations please.

abi CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn

Here is a list of all packages beside all ABI:

WITH pkg AS
  (SELECT * FROM packages WHERE port_id = 28303)
SELECT * FROM abi CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn

 id |        name        | active |     package_name     
----+--------------------+--------+----------------------
  1 | FreeBSD:12:amd64   | t      | py27-django-storages
  2 | FreeBSD:13:aarch64 | t      | py27-django-storages
  4 | FreeBSD:11:i386    | t      | py27-django-storages
  6 | FreeBSD:11:amd64   | t      | py27-django-storages
  7 | FreeBSD:11:aarch64 | t      | py27-django-storages
  8 | FreeBSD:12:i386    | t      | py27-django-storages
  9 | FreeBSD:12:aarch64 | t      | py27-django-storages
 10 | FreeBSD:13:i386    | t      | py27-django-storages
 11 | FreeBSD:13:amd64   | t      | py27-django-storages
  1 | FreeBSD:12:amd64   | t      | py36-django-storages
  2 | FreeBSD:13:aarch64 | t      | py36-django-storages
  4 | FreeBSD:11:i386    | t      | py36-django-storages
  6 | FreeBSD:11:amd64   | t      | py36-django-storages
  7 | FreeBSD:11:aarch64 | t      | py36-django-storages
  8 | FreeBSD:12:i386    | t      | py36-django-storages
  9 | FreeBSD:12:aarch64 | t      | py36-django-storages
 10 | FreeBSD:13:i386    | t      | py36-django-storages
 11 | FreeBSD:13:amd64   | t      | py36-django-storages
  1 | FreeBSD:12:amd64   | t      | py37-django-storages
  2 | FreeBSD:13:aarch64 | t      | py37-django-storages
  4 | FreeBSD:11:i386    | t      | py37-django-storages
  6 | FreeBSD:11:amd64   | t      | py37-django-storages
  7 | FreeBSD:11:aarch64 | t      | py37-django-storages
  8 | FreeBSD:12:i386    | t      | py37-django-storages
  9 | FreeBSD:12:aarch64 | t      | py37-django-storages
 10 | FreeBSD:13:i386    | t      | py37-django-storages
 11 | FreeBSD:13:amd64   | t      | py37-django-storages
(27 rows)

With 9 rows in the ABI table and 3 distinct package names we get 27 rows.

I will refer to this as the abi_x_package_name data.

LEFT JOIN

We have a list of all the

Now we do a LEFT JOIN. That gets us all the abi_x_package_name combinations we need with, or without, the corresponding package information.

LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)

This takes the entries from pkg and joins it with everything from the abi table.

This takes us to the entire query now:

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

        abi         | abi_id |     package_name     | package_set | package_version 
--------------------+--------+----------------------+-------------+-----------------
 FreeBSD:11:aarch64 |      7 | py27-django-storages | latest      | 1.5.1
 FreeBSD:11:aarch64 |      7 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | latest      | 1.5.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | latest      | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:13:aarch64 |      2 | py27-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py27-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:aarch64 |      7 | py36-django-storages | latest      | 1.5.1
 FreeBSD:11:amd64   |      6 | py36-django-storages |             | 
 FreeBSD:11:i386    |      4 | py36-django-storages |             | 
 FreeBSD:12:aarch64 |      9 | py36-django-storages | latest      | 1.5.1
 FreeBSD:12:amd64   |      1 | py36-django-storages |             | 
 FreeBSD:12:i386    |      8 | py36-django-storages |             | 
 FreeBSD:13:aarch64 |      2 | py36-django-storages |             | 
 FreeBSD:13:amd64   |     11 | py36-django-storages |             | 
 FreeBSD:13:i386    |     10 | py36-django-storages |             | 
 FreeBSD:11:aarch64 |      7 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:13:aarch64 |      2 | py37-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py37-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py37-django-storages | latest      | 1.9.1
(37 rows)

freshports.dev=#

From here, we start pivoting the data so we have latest and quarterly on the same line.

Pivot

By now, we have gone through the entire query. Next is the pivot.

WITH
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi,
       pn.package_name,
       max(pkg.package_version) FILTER (WHERE pkg.package_set = 'latest')
         AS package_version_latest,
       max(pkg.package_version) FILTER (WHERE pkg.package_set = 'quarterly')
         AS package_version_quarterly
  FROM abi
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg ON (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 GROUP BY pn.package_name, abi.name
 ORDER BY pn.package_name, abi.name;

The filter controls which input rows are fed to the aggregate function max. I could have use min(), or any other similar function here because there is only one value which will match it. The GROUP BY clause is what controls this. For each package_name and abi.name combination, there will be 1 or two lines.

The output is the expected 27 lines of information:

        abi         |     package_name     | package_version_latest | package_version_quarterly 
--------------------+----------------------+------------------------+---------------------------
 FreeBSD:11:aarch64 | py27-django-storages | 1.5.1                  | 1.9.1
 FreeBSD:11:amd64   | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:11:i386    | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:aarch64 | py27-django-storages | 1.5.1                  | 1.9.1
 FreeBSD:12:amd64   | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:i386    | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:13:aarch64 | py27-django-storages | 1.8                    | 
 FreeBSD:13:amd64   | py27-django-storages | 1.9.1                  | 
 FreeBSD:13:i386    | py27-django-storages | 1.9.1                  | 
 FreeBSD:11:aarch64 | py36-django-storages | 1.5.1                  | 
 FreeBSD:11:amd64   | py36-django-storages |                        | 
 FreeBSD:11:i386    | py36-django-storages |                        | 
 FreeBSD:12:aarch64 | py36-django-storages | 1.5.1                  | 
 FreeBSD:12:amd64   | py36-django-storages |                        | 
 FreeBSD:12:i386    | py36-django-storages |                        | 
 FreeBSD:13:aarch64 | py36-django-storages |                        | 
 FreeBSD:13:amd64   | py36-django-storages |                        | 
 FreeBSD:13:i386    | py36-django-storages |                        | 
 FreeBSD:11:aarch64 | py37-django-storages |                        | 1.9.1
 FreeBSD:11:amd64   | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:11:i386    | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:aarch64 | py37-django-storages |                        | 1.9.1
 FreeBSD:12:amd64   | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:i386    | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:13:aarch64 | py37-django-storages | 1.8                    | 
 FreeBSD:13:amd64   | py37-django-storages | 1.9.1                  | 
 FreeBSD:13:i386    | py37-django-storages | 1.9.1                  | 
(27 rows)

Why 27? 3 package names, 9 ABI.

Much better

This is a much better approach than the procedures I used before. I like it.

Apr 142020
 

This post documents what I recently put into a gist on Github. It should serve as a rough introduction to how FreshPorts handles packages extracted from packagesite.yaml files.

All the platforms

These are the plaforms (also known as ABI) for which we are monitoring packages created by the FreeBSD project.

They relate to URLs found here: https://pkg.freebsd.org

For each ABI two repos are built, each based off a different code repository:

* latest (head)
* quarterly (now 2020Q2)

Quarterly branches are designed to “provide users with a more predictable and stable experience for port and package installation and upgrades. This is done essentially by only allowing non-feature updates.”

This is the abi data:

freshports.dev=# select * from abi;
 id |        name        | active 
----+--------------------+--------
  1 | FreeBSD:12:amd64   | t
  2 | FreeBSD:13:aarch64 | t
  4 | FreeBSD:11:i386    | t
  6 | FreeBSD:11:amd64   | t
  7 | FreeBSD:11:aarch64 | t
  8 | FreeBSD:12:i386    | t
  9 | FreeBSD:12:aarch64 | t
 10 | FreeBSD:13:i386    | t
 11 | FreeBSD:13:amd64   | t
(9 rows)

freshports.dev=# \d abi
                           Table "public.abi"
 Column |  Type   | Collation | Nullable |           Default            
--------+---------+-----------+----------+------------------------------
 id     | integer |           | not null | generated always as identity
 name   | text    |           | not null | 
 active | boolean |           | not null | true
Indexes:
    "abi_pkey" PRIMARY KEY, btree (id)
    "abi_name_idx" UNIQUE, btree (name)
Referenced by:
    TABLE "packages" CONSTRAINT "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    TABLE "packages_last_checked" CONSTRAINT "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)

freshports.dev=# 

What packages exist for a given port?

The packages table contains information taken from packagesite.yaml found at https://pkg.freebsd.org

Both latest and quarterly are shown.

For this example, we look at the www/py-django-storages port:

This port has three different packges avaiable:

* py27-django-storages
* py36-django-storages
* py37-django-storages

This is not unusual. Many ports have flavors.

For each package_name, I need an outer join from the data below, with the table above (ABI). All ABI values need to
be included for a given package.

The absence of a package for a given ABI is information presented to the user.

freshports.dev=# select * from packages where port_id = 28303;
   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4339390 |      7 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4340907 |      7 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4342889 |      7 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4346104 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4346130 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4349931 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4349958 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4353741 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4353810 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4357806 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4357905 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4361606 |      9 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4363708 |      9 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4365567 |      9 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4369003 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4369021 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4373197 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4373305 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4376671 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4376807 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4380779 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4380869 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4382725 |      2 |   28303 | 1.8             | py27-django-storages |           |             | latest
 4383395 |      2 |   28303 | 1.8             | py37-django-storages |           |             | latest
 4388136 |     11 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4388153 |     11 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4392033 |     10 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4392074 |     10 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
(30 rows)

freshports.dev=# \d packages
                               Table "public.packages"
     Column      |     Type     | Collation | Nullable |           Default            
-----------------+--------------+-----------+----------+------------------------------
 id              | bigint       |           | not null | generated always as identity
 abi_id          | integer      |           | not null | 
 port_id         | integer      |           | not null | 
 package_version | text         |           | not null | 
 package_name    | text         |           | not null | 
 branch_id       | integer      |           |          | 
 branch_name     | text         |           |          | 
 package_set     | package_sets |           |          | 
Indexes:
    "packages_pkey" PRIMARY KEY, btree (id)
    "fki_packages_abi_id_fk" btree (abi_id)
    "fki_packages_port_id_fk" btree (port_id)
    "packages_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    "packages_branch_id_fk" FOREIGN KEY (branch_id) REFERENCES system_branch(id) NOT VALID
    "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID

freshports.dev=# 


freshports.dev=# 

What package is on what ABI?

This is a sample OUTER JOIN to get the required data for a single package.

freshports.dev=# SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py36-django-storages') AS P ON abi.id = P.abi_id
;
   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set | id |        name        | active 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------+----+--------------------+--------
         |        |         |                 |                      |           |             |             |  1 | FreeBSD:12:amd64   | t
         |        |         |                 |                      |           |             |             |  2 | FreeBSD:13:aarch64 | t
         |        |         |                 |                      |           |             |             |  4 | FreeBSD:11:i386    | t
         |        |         |                 |                      |           |             |             |  6 | FreeBSD:11:amd64   | t
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  7 | FreeBSD:11:aarch64 | t
         |        |         |                 |                      |           |             |             |  8 | FreeBSD:12:i386    | t
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  9 | FreeBSD:12:aarch64 | t
         |        |         |                 |                      |           |             |             | 10 | FreeBSD:13:i386    | t
         |        |         |                 |                      |           |             |             | 11 | FreeBSD:13:amd64   | t
(9 rows)

freshports.dev=# 

What does a full set for one port look like?

This repeats the above OUTER JOIN but for all three package names

SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py27-django-storages') AS P ON abi.id = P.abi_id
UNION
SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py36-django-storages') AS P ON abi.id = P.abi_id
UNION
SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py37-django-storages') AS P ON abi.id = P.abi_id;

   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set | id |        name        | active 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------+----+--------------------+--------
 4380779 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  8 | FreeBSD:12:i386    | t
 4373197 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  1 | FreeBSD:12:amd64   | t
         |        |         |                 |                      |           |             |             |  8 | FreeBSD:12:i386    | t
 4369003 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  1 | FreeBSD:12:amd64   | t
 4340907 |      7 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  7 | FreeBSD:11:aarch64 | t
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  7 | FreeBSD:11:aarch64 | t
 4357905 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  4 | FreeBSD:11:i386    | t
         |        |         |                 |                      |           |             |             |  1 | FreeBSD:12:amd64   | t
 4339390 |      7 |   28303 | 1.5.1           | py27-django-storages |           |             | latest      |  7 | FreeBSD:11:aarch64 | t
 4376671 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  8 | FreeBSD:12:i386    | t
 4383395 |      2 |   28303 | 1.8             | py37-django-storages |           |             | latest      |  2 | FreeBSD:13:aarch64 | t
 4376807 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  8 | FreeBSD:12:i386    | t
 4357806 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  4 | FreeBSD:11:i386    | t
 4363708 |      9 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  9 | FreeBSD:12:aarch64 | t
         |        |         |                 |                      |           |             |             | 10 | FreeBSD:13:i386    | t
 4365567 |      9 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  9 | FreeBSD:12:aarch64 | t
 4361606 |      9 |   28303 | 1.5.1           | py27-django-storages |           |             | latest      |  9 | FreeBSD:12:aarch64 | t
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  9 | FreeBSD:12:aarch64 | t
         |        |         |                 |                      |           |             |             |  4 | FreeBSD:11:i386    | t
 4388153 |     11 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      | 11 | FreeBSD:13:amd64   | t
 4392074 |     10 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      | 10 | FreeBSD:13:i386    | t
 4349958 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  6 | FreeBSD:11:amd64   | t
 4382725 |      2 |   28303 | 1.8             | py27-django-storages |           |             | latest      |  2 | FreeBSD:13:aarch64 | t
 4346130 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  6 | FreeBSD:11:amd64   | t
         |        |         |                 |                      |           |             |             | 11 | FreeBSD:13:amd64   | t
 4353741 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  4 | FreeBSD:11:i386    | t
 4392033 |     10 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      | 10 | FreeBSD:13:i386    | t
 4353810 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  4 | FreeBSD:11:i386    | t
 4349931 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  6 | FreeBSD:11:amd64   | t
 4342889 |      7 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  7 | FreeBSD:11:aarch64 | t
 4369021 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  1 | FreeBSD:12:amd64   | t
         |        |         |                 |                      |           |             |             |  6 | FreeBSD:11:amd64   | t
 4373305 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  1 | FreeBSD:12:amd64   | t
 4388136 |     11 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      | 11 | FreeBSD:13:amd64   | t
 4346104 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  6 | FreeBSD:11:amd64   | t
         |        |         |                 |                      |           |             |             |  2 | FreeBSD:13:aarch64 | t
 4380869 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  8 | FreeBSD:12:i386    | t
(37 rows)

A procedure to produce the above

I could not find a way to do the OUTER JOIN on any result set which consisted of more than one package_name.

Since this was written, I now have a single query solution.

I created this function which worked on one package_name at a time.

CREATE OR REPLACE FUNCTION PortPackagesNonPivot(a_port_id integer) RETURNS TABLE(abi text, abi_id integer, package_name text, package_set package_sets, package_version text)
    LANGUAGE plpgsql STABLE
    AS $$
  DECLARE
    l_rec  RECORD;
  BEGIN
    FOR l_rec IN SELECT *
                 FROM PackagesGetPackageNamesForPort(a_port_id) ORDER BY 1
    LOOP
      RETURN QUERY
        SELECT abi.name, abi.id, l_rec.package_name, P.package_set, P.package_version
          FROM abi LEFT OUTER JOIN 
             (SELECT * 
                FROM packages P 
               WHERE P.port_id = a_port_id
                 AND P.package_name = l_rec.package_name) AS P 
             ON abi.id = P.abi_id
         ORDER BY l_rec.package_name, abi.name;
    END LOOP;
END;
$$;


freshports.dev=# select * from PortPackagesNonPivot(28303);
        abi         | abi_id |     package_name     | package_set | package_version 
--------------------+--------+----------------------+-------------+-----------------
 FreeBSD:11:aarch64 |      7 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:aarch64 |      7 | py27-django-storages | latest      | 1.5.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | latest      | 1.5.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | latest      | 1.9.1
 FreeBSD:13:aarch64 |      2 | py27-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py27-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:aarch64 |      7 | py36-django-storages | latest      | 1.5.1
 FreeBSD:11:amd64   |      6 | py36-django-storages |             | 
 FreeBSD:11:i386    |      4 | py36-django-storages |             | 
 FreeBSD:12:aarch64 |      9 | py36-django-storages | latest      | 1.5.1
 FreeBSD:12:amd64   |      1 | py36-django-storages |             | 
 FreeBSD:12:i386    |      8 | py36-django-storages |             | 
 FreeBSD:13:aarch64 |      2 | py36-django-storages |             | 
 FreeBSD:13:amd64   |     11 | py36-django-storages |             | 
 FreeBSD:13:i386    |     10 | py36-django-storages |             | 
 FreeBSD:11:aarch64 |      7 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:13:aarch64 |      2 | py37-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py37-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py37-django-storages | latest      | 1.9.1
(37 rows)

Pivot the data

One goal is to put quarterly and latest results onto the same line.

 SELECT package_name, 
        abi_id, 
        max(package_version) filter(where package_set = 'latest')    AS package_version_latest,
        max(package_version) filter(where package_set = 'quarterly') AS package_version_quarterly
  FROM packages 
 WHERE port_id = 28303
 GROUP BY package_name, abi_id;
 
     package_name     | abi_id | package_version_latest | package_version_quarterly 
----------------------+--------+------------------------+---------------------------
 py27-django-storages |      1 | 1.9.1                  | 1.9.1
 py27-django-storages |      2 | 1.8                    | 
 py27-django-storages |      4 | 1.9.1                  | 1.9.1
 py27-django-storages |      6 | 1.9.1                  | 1.9.1
 py27-django-storages |      7 | 1.5.1                  | 1.9.1
 py27-django-storages |      8 | 1.9.1                  | 1.9.1
 py27-django-storages |      9 | 1.5.1                  | 1.9.1
 py27-django-storages |     10 | 1.9.1                  | 
 py27-django-storages |     11 | 1.9.1                  | 
 py36-django-storages |      7 | 1.5.1                  | 
 py36-django-storages |      9 | 1.5.1                  | 
 py37-django-storages |      1 | 1.9.1                  | 1.9.1
 py37-django-storages |      2 | 1.8                    | 
 py37-django-storages |      4 | 1.9.1                  | 1.9.1
 py37-django-storages |      6 | 1.9.1                  | 1.9.1
 py37-django-storages |      7 |                        | 1.9.1
 py37-django-storages |      8 | 1.9.1                  | 1.9.1
 py37-django-storages |      9 |                        | 1.9.1
 py37-django-storages |     10 | 1.9.1                  | 
 py37-django-storages |     11 | 1.9.1                  | 
(20 rows)

freshports.dev=# 

Combining the pivot with the data

Hwere I combine the previous two points into one function.

This incorporates additional data from packages_last_checked.

CREATE OR REPLACE FUNCTION PortPackages(a_port_id integer) 
RETURNS TABLE(package_name text, abi text, package_version_latest text, package_version_quarterly text,
     last_checked_latest    timestamp with time zone, repo_date_latest    timestamp with time zone, import_date_latest     timestamp with time zone,
     last_checked_quarterly timestamp with time zone, repo_date_quarterly timestamp with time zone, import_date_quarterly  timestamp with time zone)
  LANGUAGE SQL STABLE
  AS $$

    SELECT P.package_name,
           P.abi,
           max(package_version)  filter(where P.package_set   = 'latest')    AS package_version_latest,
           max(package_version)  filter(where P.package_set   = 'quarterly') AS package_version_quarterly,
           max(PLC.last_checked) filter(where PLC.package_set = 'latest')    AS last_checked_latest,
           max(PLC.repo_date)    filter(where PLC.package_set = 'latest')    AS repo_date_latest,
           max(PLC.import_date)  filter(where PLC.package_set = 'latest')    AS import_date_latest,
           max(PLC.last_checked) filter(where PLC.package_set = 'quarterly') AS last_checked_quarterly,
           max(PLC.repo_date)    filter(where PLC.package_set = 'quarterly') AS repo_date_quarterly,
           max(PLC.import_date)  filter(where PLC.package_set = 'quarterly') AS import_date_quarterly
      FROM PortPackagesNonPivot(a_port_id) P JOIN packages_last_checked PLC ON P.abi_id = PLC.abi_id
  GROUP BY P.package_name, P.abi
  ORDER BY P.package_name, P.abi;
$$;

freshports.dev=# SELECT * FROM PortPackages(28303);
     package_name     |        abi         | package_version_latest | package_version_quarterly |      last_checked_latest      |    repo_date_latest    |      import_date_latest       |    last_checked_quarterly     |  repo_date_quarterly   |     import_date_quarterly     |                sort_key                 
----------------------+--------------------+------------------------+---------------------------+-------------------------------+------------------------+-------------------------------+-------------------------------+------------------------+-------------------------------+-----------------------------------------
 py27-django-storages | FreeBSD:11:aarch64 | 1.5.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-09-29 04:24:00+00 | 2020-04-13 14:09:37.634897+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-07 12:03:00+00 | 2020-04-13 14:09:41.320966+00 | py27-django-storages-FreeBSD:11:aarch64
 py27-django-storages | FreeBSD:11:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 00:16:00+00 | 2020-04-13 14:09:46.602236+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:20:00+00 | 2020-04-13 14:09:51.877809+00 | py27-django-storages-FreeBSD:11:amd64
 py27-django-storages | FreeBSD:11:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 05:24:00+00 | 2020-04-13 14:09:56.774418+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 02:36:00+00 | 2020-04-13 14:10:01.378743+00 | py27-django-storages-FreeBSD:11:i386
 py27-django-storages | FreeBSD:12:aarch64 | 1.5.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-10-11 05:47:00+00 | 2020-04-13 14:10:06.230199+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 09:21:00+00 | 2020-04-13 14:10:10.340624+00 | py27-django-storages-FreeBSD:12:aarch64
 py27-django-storages | FreeBSD:12:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 01:18:00+00 | 2020-04-13 14:10:14.600613+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:56:00+00 | 2020-04-13 14:10:18.777864+00 | py27-django-storages-FreeBSD:12:amd64
 py27-django-storages | FreeBSD:12:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 06:22:00+00 | 2020-04-13 14:10:22.804925+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:36:00+00 | 2020-04-13 14:10:26.964927+00 | py27-django-storages-FreeBSD:12:i386
 py27-django-storages | FreeBSD:13:aarch64 | 1.8                    |                           | 2020-04-13 14:06:19.662548+00 | 2020-01-04 12:39:00+00 | 2020-04-13 14:10:31.011334+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py27-django-storages-FreeBSD:13:aarch64
 py27-django-storages | FreeBSD:13:amd64   | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 02:22:00+00 | 2020-04-13 14:10:35.255075+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py27-django-storages-FreeBSD:13:amd64
 py27-django-storages | FreeBSD:13:i386    | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-10 23:14:00+00 | 2020-04-13 14:10:39.416529+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py27-django-storages-FreeBSD:13:i386
 py36-django-storages | FreeBSD:11:aarch64 | 1.5.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2018-09-29 04:24:00+00 | 2020-04-13 14:09:37.634897+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-07 12:03:00+00 | 2020-04-13 14:09:41.320966+00 | py36-django-storages-FreeBSD:11:aarch64
 py36-django-storages | FreeBSD:11:amd64   |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 00:16:00+00 | 2020-04-13 14:09:46.602236+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:20:00+00 | 2020-04-13 14:09:51.877809+00 | py36-django-storages-FreeBSD:11:amd64
 py36-django-storages | FreeBSD:11:i386    |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-13 05:24:00+00 | 2020-04-13 14:09:56.774418+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 02:36:00+00 | 2020-04-13 14:10:01.378743+00 | py36-django-storages-FreeBSD:11:i386
 py36-django-storages | FreeBSD:12:aarch64 | 1.5.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2018-10-11 05:47:00+00 | 2020-04-13 14:10:06.230199+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 09:21:00+00 | 2020-04-13 14:10:10.340624+00 | py36-django-storages-FreeBSD:12:aarch64
 py36-django-storages | FreeBSD:12:amd64   |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 01:18:00+00 | 2020-04-13 14:10:14.600613+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:56:00+00 | 2020-04-13 14:10:18.777864+00 | py36-django-storages-FreeBSD:12:amd64
 py36-django-storages | FreeBSD:12:i386    |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-13 06:22:00+00 | 2020-04-13 14:10:22.804925+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:36:00+00 | 2020-04-13 14:10:26.964927+00 | py36-django-storages-FreeBSD:12:i386
 py36-django-storages | FreeBSD:13:aarch64 |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-01-04 12:39:00+00 | 2020-04-13 14:10:31.011334+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py36-django-storages-FreeBSD:13:aarch64
 py36-django-storages | FreeBSD:13:amd64   |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 02:22:00+00 | 2020-04-13 14:10:35.255075+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py36-django-storages-FreeBSD:13:amd64
 py36-django-storages | FreeBSD:13:i386    |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-10 23:14:00+00 | 2020-04-13 14:10:39.416529+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py36-django-storages-FreeBSD:13:i386
 py37-django-storages | FreeBSD:11:aarch64 |                        | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-09-29 04:24:00+00 | 2020-04-13 14:09:37.634897+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-07 12:03:00+00 | 2020-04-13 14:09:41.320966+00 | py37-django-storages-FreeBSD:11:aarch64
 py37-django-storages | FreeBSD:11:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 00:16:00+00 | 2020-04-13 14:09:46.602236+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:20:00+00 | 2020-04-13 14:09:51.877809+00 | py37-django-storages-FreeBSD:11:amd64
 py37-django-storages | FreeBSD:11:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 05:24:00+00 | 2020-04-13 14:09:56.774418+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 02:36:00+00 | 2020-04-13 14:10:01.378743+00 | py37-django-storages-FreeBSD:11:i386
 py37-django-storages | FreeBSD:12:aarch64 |                        | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-10-11 05:47:00+00 | 2020-04-13 14:10:06.230199+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 09:21:00+00 | 2020-04-13 14:10:10.340624+00 | py37-django-storages-FreeBSD:12:aarch64
 py37-django-storages | FreeBSD:12:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 01:18:00+00 | 2020-04-13 14:10:14.600613+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:56:00+00 | 2020-04-13 14:10:18.777864+00 | py37-django-storages-FreeBSD:12:amd64
 py37-django-storages | FreeBSD:12:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 06:22:00+00 | 2020-04-13 14:10:22.804925+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:36:00+00 | 2020-04-13 14:10:26.964927+00 | py37-django-storages-FreeBSD:12:i386
 py37-django-storages | FreeBSD:13:aarch64 | 1.8                    |                           | 2020-04-13 14:06:19.662548+00 | 2020-01-04 12:39:00+00 | 2020-04-13 14:10:31.011334+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py37-django-storages-FreeBSD:13:aarch64
 py37-django-storages | FreeBSD:13:amd64   | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 02:22:00+00 | 2020-04-13 14:10:35.255075+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py37-django-storages-FreeBSD:13:amd64
 py37-django-storages | FreeBSD:13:i386    | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-10 23:14:00+00 | 2020-04-13 14:10:39.416529+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py37-django-storages-FreeBSD:13:i386
(27 rows)

freshports.dev=# 
Apr 122020
 

Since the last past, I’ve added quarterly packages to the system.

I want to outline the changes while I remember the.

I will start by outline the tables we have.

abi

The abi table contains the usual suspects. Entries in this table are added manually, when a new release comes out.

See https://pkg.freebsd.org for examples of where these values are obtained from.

freshports.dev=# SELECT * FROM abi ORDER BY name;
 id |        name        | active |         last_checked          |         last_updated          
----+--------------------+--------+-------------------------------+-------------------------------
  7 | FreeBSD:11:aarch64 | t      | 2020-04-10 21:51:28.219155+00 | 2020-04-10 21:51:28.219155+00
  6 | FreeBSD:11:amd64   | t      | 2020-04-10 21:51:28.107421+00 | 2020-04-10 21:51:28.107421+00
  4 | FreeBSD:11:i386    | t      | 2020-04-10 21:05:01.431882+00 | 2020-04-10 21:05:01.431882+00
  9 | FreeBSD:12:aarch64 | t      | 2020-04-10 21:51:28.361931+00 | 2020-04-10 21:51:28.361931+00
  1 | FreeBSD:12:amd64   | t      | 2020-04-10 18:22:10.545416+00 | 2020-04-10 18:24:56.829453+00
  8 | FreeBSD:12:i386    | t      | 2020-04-10 21:51:28.278548+00 | 2020-04-10 21:51:28.278548+00
  2 | FreeBSD:13:aarch64 | t      | 2020-04-10 18:39:12.554194+00 | 2020-04-10 18:39:12.554194+00
 11 | FreeBSD:13:amd64   | t      | 2020-04-10 21:51:28.479757+00 | 2020-04-10 21:51:28.479757+00
 10 | FreeBSD:13:i386    | t      | 2020-04-10 21:51:28.421228+00 | 2020-04-10 21:51:28.421228+00
(9 rows)

freshports.dev=# \d abi
                                      Table "public.abi"
    Column    |           Type           | Collation | Nullable |           Default            
--------------+--------------------------+-----------+----------+------------------------------
 id           | integer                  |           | not null | generated always as identity
 name         | text                     |           | not null | 
 active       | boolean                  |           | not null | true
 last_checked | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 last_updated | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "abi_pkey" PRIMARY KEY, btree (id)
    "abi_name_idx" UNIQUE, btree (name)
Referenced by:
    TABLE "packages" CONSTRAINT "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    TABLE "packages_last_checked" CONSTRAINT "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)
    TABLE "packages_raw_original" CONSTRAINT "packages_raw_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID

freshports.dev=# 

The last_checked and last_updated columns are now deprecated. You will see their respective replacements in the packages_last_checked table.

packages

The packages contains the latest package version available for a given ABI on a given branch (e.g. latest, quarterly, 2020Q2).

freshports.dev=# SELECT * FROM packages ORDER BY package_name LIMIT 10;
   id   | abi_id | port_id | package_version | package_name | branch_id 
--------+--------+---------+-----------------+--------------+-----------
 186113 |      4 |   31186 | 0.0.23b_9       | 0ad          |         1
 204653 |      6 |   31186 | 0.0.23b_9       | 0ad          |         1
 278713 |      8 |   31186 | 0.0.23b_9       | 0ad          |         1
 371391 |     10 |   31186 | 0.0.23b_9       | 0ad          |         1
 403274 |     11 |   31186 | 0.0.23b_9       | 0ad          |         1
 495736 |      1 |   31186 | 0.0.23b_9       | 0ad          |         1
 590825 |      1 |   31186 | 0.0.23b_9       | 0ad          |       142
 622854 |      4 |   31186 | 0.0.23b_9       | 0ad          |       142
 653983 |      6 |   31186 | 0.0.23b_9       | 0ad          |       142
 712933 |      8 |   31186 | 0.0.23b_9       | 0ad          |       142
(10 rows)

freshports.dev=# \d packages
                             Table "public.packages"
     Column      |  Type   | Collation | Nullable |           Default            
-----------------+---------+-----------+----------+------------------------------
 id              | bigint  |           | not null | generated always as identity
 abi_id          | integer |           | not null | 
 port_id         | integer |           | not null | 
 package_version | text    |           | not null | 
 package_name    | text    |           | not null | 
 branch_id       | integer |           | not null | 
Indexes:
    "packages_pkey" PRIMARY KEY, btree (id)
    "fki_packages_abi_id_fk" btree (abi_id)
    "fki_packages_port_id_fk" btree (port_id)
    "packages_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    "packages_branch_id_fk" FOREIGN KEY (branch_id) REFERENCES system_branch(id) NOT VALID
    "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID

packages_raw

The packages_raw table is a plain text version of the packages table. The import script loads data into this table before it is loaded into the packages table.

The loading from the temporary table (packages_raw) ito the permanent table (packages) is easily done with a single database query. Doing the same work within an external program would be non-trivial, slower, and error-prone.

The conversion process

This query takes the raw data from the temporary table (packages_raw) into the permanent table (packages).

INSERT INTO packages(abi_id, branch_id, port_id, package_name, package_version)
SELECT GetABIId(abi),
       GetBranchId(branch),
       GetPort(package_origin),
       package_name,
       package_version                                                               
  FROM packages_raw
 WHERE GetPort(package_origin) IS NOT NULL
   AND GetBranchId(branch)     IS NOT NULL
   AND GetABIId(abi)           IS NOT NULL ;

At present, this query does all of the data and has no filtering (e.g. by ABI or branch).

packages_last_checked

The packages_last_checked table keeps track of work done or to be done.


freshports.dev=# SELECT * FROM packages_last_checked;
 abi_id | branch_id |         last_checked          |       repo_date        | import_date 
--------+-----------+-------------------------------+------------------------+-------------
      7 |         1 | 2020-04-12 16:56:25.824973+00 | 2018-09-29 04:24:00+00 | 
      7 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-07 12:03:00+00 | 
      6 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 00:16:00+00 | 
      6 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:20:00+00 | 
      4 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 18:43:00+00 | 
      4 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 02:36:00+00 | 
      9 |         1 | 2020-04-12 16:56:25.824973+00 | 2018-10-11 05:47:00+00 | 
      9 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-02 05:09:00+00 | 
      1 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 01:18:00+00 | 
      1 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:56:00+00 | 
      8 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 19:12:00+00 | 
      8 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:36:00+00 | 
      2 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-01-04 12:39:00+00 | 
      2 |       142 | 2020-04-12 16:56:25.824973+00 |                        | 
     11 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 02:22:00+00 | 
     11 |       142 | 2020-04-12 16:56:25.824973+00 |                        | 
     10 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 23:14:00+00 | 
     10 |       142 | 2020-04-12 16:56:25.824973+00 |                        | 
(18 rows)

freshports.dev=# \d packages_last_checked
                   Table "public.packages_last_checked"
    Column    |           Type           | Collation | Nullable | Default 
--------------+--------------------------+-----------+----------+---------
 abi_id       | integer                  |           | not null | 
 branch_id    | integer                  |           | not null | 
 last_checked | timestamp with time zone |           |          | 
 repo_date    | timestamp with time zone |           |          | 
 import_date  | timestamp with time zone |           |          | 
Indexes:
    "packages_last_checked_pkey" PRIMARY KEY, btree (abi_id, branch_id)
    "packages_last_checked_branch_id_idx" btree (branch_id)
Foreign-key constraints:
    "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)

freshports.dev=# 

When the repo_date column is empty, no repo was found for this ABI & branch combination.

Note the import_date column is empty. The script which populates the packages table does not yet update that value.

The check_repos_for_new_stuff.py script updates the last_checked and repo_date columns.

To find repos which need to be updated, use this query:

SELECT *
  FROM packages_last_checked 
 WHERE repo_date > import_date 
   OR (repo_date IS NOT NULL AND import_date IS NULL);

What’s next?

The import-raw-abi.sh script is the basis of future work. The pseudo code is similar to this:

SELECT ABI, branch values which need updating
FOR EACH combination
DO
   fetch https://pkg.freebsd.org/$abi/$branch/packagesite.txz
    unxz packagesite.txz
    tar -xf packagesite.tar

    ls -l ./packagesite.yaml
    jq -rc --arg ABI "$abi" --arg BRANCH "$branch" '[$ABI, $BRANCH, .origin, .name, .version] | @tsv' < ./packagesite.yaml > packagesite.csv
  
    ~/bin/import-via-copy-packagesite-all-raw-fields.py -i packagesite.csv

   Update the packages_last_checked.import_date for this combination
DONE

NOTES on this:

  • The launching of this script can be trigger based. When an update occurs to the packages_last_checked, observe if new data is detected, and set work-to-be-done accordingly.
  • I now think that import and conversion from packages_raw to packages can be done separately.
  • Add another column to the packages_last_checked, call it import_processed.
  • When you find a column with import_date > import_processed, that indicates you need to run the conversion process (see a previous section).