Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full time systems administration and now work for very-well known company who has been a big force in the security industry.

Apr 172020
 

In the previous post I documented how the packages_raw table is updated to populate the abi_id and port_id columns. This post will discuss how that data is then pushed into the packages table.

Thoughts

When updates arrived, they arrive in quantity. The latest update was 30474 records. From that, we will have some new package, some will go away, and some will be updates. Many will have no change. I see two approaches:

  1. UPDATE where found and nothing to change, INSERT where not found, DELETE where no longer present.
  2. DELETE everything related to this import, INSERT everything imported.

Option 1 will have far less database churn. Option 2 will maximize it.

Let’s look at option 1 mostly because I like that approach best and it should be straight forward to implement.

That approach also makes me want to keep statistics on package building changes.

UPDATE

I want to update only records which have changes. This works:

SELECT *
  FROM packages_raw PR, packages P
 WHERE P.abi_id           = PR.abi_id
   AND P.package_set      = PR.package_set
   AND P.package_name     = PR.package_name
   AND P.package_version != PR.package_version
   AND PR.abi             = 'FreeBSD:13:amd64'
   AND PR.package_set     = 'latest';

Therefore I only want to update those rows. Let’s try this query:

UPDATE packages p
   SET package_version = PR.package_version
  FROM packages_raw PR
 WHERE P.abi_id           = PR.abi_id
   AND P.package_set      = PR.package_set
   AND P.package_name     = PR.package_name
   AND P.package_version != PR.package_version;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages p  (cost=168509.43..183856.43 rows=392902 width=70) (actual time=11500.885..11500.885 rows=0 loops=1)
   ->  Merge Join  (cost=168509.43..183856.43 rows=392902 width=70) (actual time=7492.742..10230.503 rows=2562 loops=1)
         Merge Cond: ((p.abi_id = pr.abi_id) AND (p.package_set = pr.package_set) AND (p.package_name = pr.package_name))
         Join Filter: (p.package_version <> pr.package_version)
         Rows Removed by Join Filter: 455631
         ->  Sort  (cost=82431.21..83603.67 rows=468984 width=48) (actual time=2378.262..3650.466 rows=459029 loops=1)
               Sort Key: p.abi_id, p.package_set, p.package_name
               Sort Method: external merge  Disk: 26824kB
               ->  Seq Scan on packages p  (cost=0.00..9400.84 rows=468984 width=48) (actual time=0.277..1327.490 rows=459029 loops=1)
         ->  Materialize  (cost=86078.22..88376.89 rows=459733 width=36) (actual time=5114.403..6312.587 rows=459496 loops=1)
               ->  Sort  (cost=86078.22..87227.55 rows=459733 width=36) (actual time=5114.397..6247.665 rows=459496 loops=1)
                     Sort Key: pr.abi_id, pr.package_set, pr.package_name
                     Sort Method: external merge  Disk: 21408kB
                     ->  Seq Scan on packages_raw pr  (cost=0.00..17695.33 rows=459733 width=36) (actual time=0.283..3710.428 rows=459733 loops=1)
 Planning Time: 4.093 ms
 Execution Time: 11501.204 ms
(16 rows)

freshports.dev=# 

By this time it was late and I finished this for the day. Good thing I did…

Let ZFS ARC cache all the database

The next morning, I started looking at shared_buffers and random_page_cost.

I was worried about this update taking 10 seconds, but it does a join across two 450,000 row tables to then update 2500 rows. See https://explain.depesz.com/s/xlWa. I knew I had set this before, so I went searching for values. I checked my production server, where I found it shared_buffers = 128MB. Eventually I found my blog post about caching in ZFS vs caching in PostgreSQL: Website slowness – the investigation.

After setting zfs set primarycache=all, I did the steps found in that post to pre-cache the database:

[dan@x8dtu:~] $ sudo zfs snapshot zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ sudo zfs send zroot/data/pg01-postgres@for.caching > /dev/null
[dan@x8dtu:~] $ sudo zfs destroy zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ 

This server also runs off SSD for which random_page_cost=1.1 is appropriate.

Don’t process the whole table

For all of these operations we must be cautious regarding the contents of packages_raw.
We can never assume that packages_raw contains entries for all ABI and package_set combination already present in the packages table. Rather, it will contain only those sets which were recently imported. If the check and import implemented with sufficient frequency, it will detect new repos immediately and packages_raw will contain data from just one repo.

For this testing, we are using ABI FreeBSD:13:amd64 and package set latest.

The UPDATE: testing with a smaller set and more ZFS caching

Now the results have gone from about 7 or 11 seconds to about 1.5 seconds:

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) 
UPDATE packages p
   SET package_version = PR.package_version
  FROM packages_raw PR
 WHERE P.abi_id           = PR.abi_id
   AND P.package_set      = PR.package_set
   AND P.package_name     = PR.package_name
   AND P.package_version != PR.package_version
   AND PR.abi             = 'FreeBSD:13:amd64'
   AND PR.package_set     = 'latest';
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages p  (cost=15321.52..26841.38 rows=17989 width=69) (actual time=305.336..305.336 rows=0 loops=1)
   Buffers: shared hit=9313 read=7201 dirtied=1105 written=10
   I/O Timings: read=72.845
   ->  Hash Join  (cost=15321.52..26841.38 rows=17989 width=69) (actual time=234.095..264.130 rows=469 loops=1)
         Hash Cond: ((pr.abi_id = p.abi_id) AND (pr.package_name = p.package_name))
         Join Filter: (p.package_version <> pr.package_version)
         Rows Removed by Join Filter: 30880
         Buffers: shared hit=154 read=6374 dirtied=135
         I/O Timings: read=48.906
         ->  Bitmap Heap Scan on packages_raw pr  (cost=480.97..11900.62 rows=19088 width=36) (actual time=17.957..34.032 rows=31401 loops=1)
               Recheck Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
               Heap Blocks: exact=530
               Buffers: shared read=1668
               I/O Timings: read=17.758
               ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..476.20 rows=19088 width=0) (actual time=17.832..17.832 rows=62784 loops=1)
                     Index Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
                     Buffers: shared read=773
                     I/O Timings: read=10.732
         ->  Hash  (cost=10655.10..10655.10 rows=279030 width=47) (actual time=210.372..210.373 rows=275766 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 26085kB
               Buffers: shared hit=154 read=4706 dirtied=135
               I/O Timings: read=31.148
               ->  Seq Scan on packages p  (cost=0.00..10655.10 rows=279030 width=47) (actual time=0.023..113.982 rows=275766 loops=1)
                     Filter: (package_set = 'latest'::package_sets)
                     Rows Removed by Filter: 183263
                     Buffers: shared hit=154 read=4706 dirtied=135
                     I/O Timings: read=31.148
 Planning Time: 0.583 ms
 Execution Time: 308.639 ms
(29 rows)

freshports.dev=# 

See https://explain.depesz.com/s/N6KT for details.

The INSERT

We have to rows in packages_raw which are not found in packages.

How do we find that data?

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) 
SELECT *
  FROM packages_raw PR
 WHERE NOT EXISTS (
 SELECT *
   FROM packages P
  WHERE P.abi_id           = PR.abi_id
    AND P.package_set      = PR.package_set
    AND P.package_name     = PR.package_name
    AND P.package_version  = PR.package_version
    AND PR.abi             = 'FreeBSD:13:amd64'
    AND PR.package_set     = 'latest')
 AND PR.abi_id  IS NOT NULL
 AND PR.port_id IS NOT NULL
 AND PR.abi             = 'FreeBSD:13:amd64'
 AND PR.package_set     = 'latest';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1480.76..21125.96 rows=15908 width=79) (actual time=21.782..137.166 rows=503 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=127168
   ->  Nested Loop Anti Join  (cost=480.76..18535.16 rows=6628 width=79) (actual time=13.563..122.878 rows=168 loops=3)
         Join Filter: ((pr.abi = 'FreeBSD:13:amd64'::text) AND (pr.package_set = 'latest'::package_sets))
         Buffers: shared hit=127168
         ->  Parallel Bitmap Heap Scan on packages_raw pr  (cost=480.34..11732.96 rows=6892 width=79) (actual time=9.732..21.140 rows=10461 loops=3)
               Recheck Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
               Filter: ((abi_id IS NOT NULL) AND (port_id IS NOT NULL))
               Rows Removed by Filter: 6
               Heap Blocks: exact=201
               Buffers: shared hit=1668
               ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..476.20 rows=19088 width=0) (actual time=16.749..16.749 rows=62784 loops=1)
                     Index Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
                     Buffers: shared hit=773
         ->  Index Only Scan using packages_all_idx on packages p  (cost=0.42..0.97 rows=1 width=29) (actual time=0.008..0.008 rows=1 loops=31383)
               Index Cond: ((abi_id = pr.abi_id) AND (package_name = pr.package_name) AND (package_set = pr.package_set) AND (package_set = 'latest'::package_sets))
               Filter: (package_version = pr.package_version)
               Rows Removed by Filter: 0
               Heap Fetches: 31349
               Buffers: shared hit=125500
 Planning Time: 1.609 ms
 Execution Time: 137.356 ms
(24 rows)

freshports.dev=# 

The two IS NOT NULL clauses avoids inserting package data for which we did not find a matching port. This is not uncommon on the development server, but will be flagged on the production host.

The INSERT then becomes:

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) 
INSERT INTO packages (abi_id, port_id, package_version, package_name, package_set)
SELECT abi_id, port_id, package_version, package_name, package_set
  FROM packages_raw PR
 WHERE NOT EXISTS (
 SELECT *
   FROM packages P
  WHERE P.abi_id           = PR.abi_id
    AND P.package_set      = PR.package_set
    AND P.package_name     = PR.package_name
    AND P.package_version  = PR.package_version
    AND PR.abi             = 'FreeBSD:13:amd64'
    AND PR.package_set     = 'latest')
 AND PR.abi_id  IS NOT NULL
 AND PR.port_id IS NOT NULL
 AND PR.abi             = 'FreeBSD:13:amd64'
 AND PR.package_set     = 'latest';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on packages  (cost=480.76..28278.42 rows=15908 width=58) (actual time=341.049..341.050 rows=0 loops=1)
   Buffers: shared hit=133908 read=1 dirtied=828 written=9
   I/O Timings: read=0.048
   ->  Nested Loop Anti Join  (cost=480.76..28278.42 rows=15908 width=58) (actual time=21.860..313.677 rows=503 loops=1)
         Join Filter: ((pr.abi = 'FreeBSD:13:amd64'::text) AND (pr.package_set = 'latest'::package_sets))
         Buffers: shared hit=127747 dirtied=24
         ->  Bitmap Heap Scan on packages_raw pr  (cost=480.34..11899.98 rows=16541 width=51) (actual time=16.643..48.263 rows=31383 loops=1)
               Recheck Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
               Filter: ((abi_id IS NOT NULL) AND (port_id IS NOT NULL))
               Rows Removed by Filter: 18
               Heap Blocks: exact=530
               Buffers: shared hit=1668
               ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..476.20 rows=19088 width=0) (actual time=16.354..16.354 rows=62784 loops=1)
                     Index Cond: ((abi = 'FreeBSD:13:amd64'::text) AND (package_set = 'latest'::package_sets))
                     Buffers: shared hit=773
         ->  Index Only Scan using packages_all_idx on packages p  (cost=0.42..0.97 rows=1 width=29) (actual time=0.007..0.007 rows=1 loops=31383)
               Index Cond: ((abi_id = pr.abi_id) AND (package_name = pr.package_name) AND (package_set = pr.package_set) AND (package_set = 'latest'::package_sets))
               Filter: (package_version = pr.package_version)
               Rows Removed by Filter: 0
               Heap Fetches: 31450
               Buffers: shared hit=125576 dirtied=24
 Planning Time: 1.607 ms
 Execution Time: 341.237 ms
(23 rows)

freshports.dev=# 

See https://explain.depesz.com/s/ylCc.

The DELETE

We must be cautious with this query. The goal is to remove entries from the packages table which are not available from the repo.

We want to find entries in the packages table which are not found in the package_raw table, the inverse of what we do for INSERT.

Originally, I thought package_version was required in this SELECT, but that will pick up entries which exist, but only the package version has changed. If we always did the DELETE after the UPDATE, the inclusion of package_version in the comparison would still work, but it is better to not include it.

begin;
set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM packages P
 WHERE P.abi_id      = 11
   AND P.package_set = 'latest'
   AND NOT EXISTS (
     SELECT *
       FROM packages_raw PR
      WHERE P.abi_id       = PR.abi_id
        AND P.package_set  = PR.package_set
        AND P.package_name = PR.package_name
        AND PR.abi         = 'FreeBSD:13:amd64'
        AND PR.package_set = 'latest');
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on packages p  (cost=3.08..6108.16 rows=19353 width=12) (actual time=52.662..52.662 rows=0 loops=1)
   Buffers: shared hit=35880
   ->  Merge Anti Join  (cost=3.08..6108.16 rows=19353 width=12) (actual time=23.933..52.586 rows=36 loops=1)
         Merge Cond: (p.package_name = pr.package_name)
         Join Filter: ((p.abi_id = pr.abi_id) AND (p.package_set = pr.package_set))
         Buffers: shared hit=35844
         ->  Index Scan using packages_all_idx on packages p  (cost=0.42..6057.09 rows=19354 width=28) (actual time=0.018..15.736 rows=31384 loops=1)
               Index Cond: ((abi_id = 11) AND (package_set = 'latest'::package_sets))
               Buffers: shared hit=27257
         ->  Sort  (cost=2.66..2.66 rows=1 width=29) (actual time=23.518..25.581 rows=31383 loops=1)
               Sort Key: pr.package_name
               Sort Method: quicksort  Memory: 3278kB
               Buffers: shared hit=8587
               ->  Index Scan using packages_raw_all on packages_raw pr  (cost=0.42..2.65 rows=1 width=29) (actual time=0.021..14.889 rows=31383 loops=1)
                     Index Cond: ((abi_id = 11) AND (package_set = 'latest'::package_sets))
                     Filter: (abi = 'FreeBSD:13:amd64'::text)
                     Buffers: shared hit=8587
 Planning Time: 0.421 ms
 Execution Time: 52.712 ms
(19 rows)

See https://explain.depesz.com/s/ysWG for details.

Putting it all into one function

This is the function, complete with stats updates.

CREATE OR REPLACE FUNCTION UpdatePackagesFromRawPackages( IN a_abi_name text, IN a_package_set package_sets)
    RETURNS TABLE(inserts integer, updates integer, deletes integer)
    LANGUAGE plpgsql VOLATILE
    AS $$
  DECLARE
    l_abi_id  integer;
    l_inserts integer;
    l_deletes integer;
    l_updates integer;
  BEGIN
    SELECT id
      INTO l_abi_id
      FROM abi
     WHERE name = a_abi_name;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'cannot find id for ABI %', a_abi_name;
    END IF;

    UPDATE packages p
       SET package_version = PR.package_version
      FROM packages_raw PR
     WHERE P.abi_id           = PR.abi_id
       AND P.package_set      = PR.package_set
       AND P.package_name     = PR.package_name
       AND P.package_version != PR.package_version
       AND PR.abi             = a_abi_name
       AND PR.package_set     = a_package_set;

    GET DIAGNOSTICS l_updates = ROW_COUNT;

    INSERT INTO packages (abi_id, port_id, package_version, package_name, package_set)
    SELECT abi_id, port_id, package_version, package_name, package_set
      FROM packages_raw PR
     WHERE NOT EXISTS (
     SELECT *
       FROM packages P
      WHERE P.abi_id           = PR.abi_id
        AND P.package_set      = PR.package_set
        AND P.package_name     = PR.package_name
        AND P.package_version  = PR.package_version
        AND PR.abi             = a_abi_name
        AND PR.package_set     = a_package_set)
     AND PR.abi_id  IS NOT NULL
     AND PR.port_id IS NOT NULL
     AND PR.abi             = a_abi_name
     AND PR.package_set     = a_package_set;

    GET DIAGNOSTICS l_inserts = ROW_COUNT;

    DELETE FROM packages P
     WHERE P.abi_id      = l_abi_id
       AND P.package_set = a_package_set
       AND NOT EXISTS (
         SELECT *
           FROM packages_raw PR
          WHERE P.abi_id       = PR.abi_id
            AND P.package_set  = PR.package_set
            AND P.package_name = PR.package_name
            AND PR.abi         = a_abi_name
            AND PR.package_set = a_package_set);

    GET DIAGNOSTICS l_deletes = ROW_COUNT;

    INSERT INTO package_imports (abi_id, package_set, date, inserts, updates, deletes)
                         values (l_abi_id, a_package_set, CURRENT_TIMESTAMP, l_inserts, l_updates, l_deletes);

    RETURN QUERY (SELECT l_inserts, l_updates, l_deletes);

  END;
$$;


And it seems to work as expected. More testing to follow.

freshports.dev=# begin; select * from UpdatePackagesFromRawPackages('FreeBSD:13:amd64', 'latest');
BEGIN
 inserts | updates | deletes 
---------+---------+---------
      35 |     469 |      36
(1 row)

freshports.dev=# select * from package_imports;
 id | abi_id | package_set |        date        | inserts | updates | deletes 
----+--------+-------------+--------------------+---------+---------+---------
  1 |     11 | latest      | 20:43:46.194598+00 |      35 |     469 |      36
(1 row)

freshports.dev=# rollback;

That is a total of 540 records updates, out of 30,474 records imported. That’s far better than deleting and inserting all that data.

More work on this over the weekend.

Apr 162020
 

This post is another in the packages series. It documents how the data is transformed from the raw form (pacakges_raw) into normalized data (packages).

The Packages – how is this data stored? post may be useful reading.

The primary purpose of this blog post is documentation of the above mentioned transformation process. Along the way I found a performance enhancement owing to a deferred constraint. That’s why this post is tagged with PostgreSQL.

Some data background

When data is copied into the package_raw table, two columns are left empty:

  1. abi_id
  2. port_id

Those values can be determined based on columns which are supplied:

  1. abi
  2. package_origin

Here is some sample data:

freshports.dev=# SELECT * FROM packages_raw WHERE abi = 'FreeBSD:13:i386' AND package_set = 'latest' ORDER BY package_origin LIMIT 10;
   id    |         package_origin         |   package_name   | package_version |       abi       | abi_id | port_id | package_set 
---------+--------------------------------+------------------+-----------------+-----------------+--------+---------+-------------
 2749964 | accessibility/accerciser       | accerciser       | 3.22.0          | FreeBSD:13:i386 |        |         | latest
 2749124 | accessibility/at-spi2-atk      | at-spi2-atk      | 2.26.2          | FreeBSD:13:i386 |        |         | latest
 2749119 | accessibility/at-spi2-core     | at-spi2-core     | 2.28.0          | FreeBSD:13:i386 |        |         | latest
 2749003 | accessibility/atk              | atk              | 2.28.1          | FreeBSD:13:i386 |        |         | latest
 2748993 | accessibility/atkmm            | atkmm            | 2.24.2_4        | FreeBSD:13:i386 |        |         | latest
 2747796 | accessibility/caribou          | caribou          | 0.4.21_2        | FreeBSD:13:i386 |        |         | latest
 2760503 | accessibility/eflite           | eflite           | 0.4.1           | FreeBSD:13:i386 |        |         | latest
 2744976 | accessibility/kdeaccessibility | kdeaccessibility | 19.12.3         | FreeBSD:13:i386 |        |         | latest
 2756892 | accessibility/kmag             | kmag             | 19.12.3         | FreeBSD:13:i386 |        |         | latest
 2756793 | accessibility/kmousetool       | kmousetool       | 19.12.3_1       | FreeBSD:13:i386 |        |         | latest
(10 rows)

freshports.dev=# 

This date will eventually be copied into the packages table, but we need to populate the two empty columns first.

What should the data look like?

My first goal was to create a result set resembled the desired outcome.

        SELECT P.id,
               abi.name,
               abi.id AS abi_id,
               P.package_origin,
               PO.port_id,
               P.package_name,
               P.package_version,
               P.package_set
          FROM packages_raw P 
          JOIN abi on P.abi = abi.name
          JOIN ports_origin PO on PO.port_origin = P.package_origin
         WHERE P.abi         = 'FreeBSD:13:i386'
           AND P.package_set = 'latest'
      ORDER BY package_origin 
         LIMIT 10;
   id    |      name       | abi_id |         package_origin         | port_id |   package_name   | package_version | package_set 
---------+-----------------+--------+--------------------------------+---------+------------------+-----------------+-------------
 2749964 | FreeBSD:13:i386 |     10 | accessibility/accerciser       |   21760 | accerciser       | 3.22.0          | latest
 2749124 | FreeBSD:13:i386 |     10 | accessibility/at-spi2-atk      |   26871 | at-spi2-atk      | 2.26.2          | latest
 2749119 | FreeBSD:13:i386 |     10 | accessibility/at-spi2-core     |   26868 | at-spi2-core     | 2.28.0          | latest
 2749003 | FreeBSD:13:i386 |     10 | accessibility/atk              |   11601 | atk              | 2.28.1          | latest
 2748993 | FreeBSD:13:i386 |     10 | accessibility/atkmm            |   27818 | atkmm            | 2.24.2_4        | latest
 2747796 | FreeBSD:13:i386 |     10 | accessibility/caribou          |   35704 | caribou          | 0.4.21_2        | latest
 2760503 | FreeBSD:13:i386 |     10 | accessibility/eflite           |   22197 | eflite           | 0.4.1           | latest
 2744976 | FreeBSD:13:i386 |     10 | accessibility/kdeaccessibility |   11726 | kdeaccessibility | 19.12.3         | latest
 2756892 | FreeBSD:13:i386 |     10 | accessibility/kmag             |   31155 | kmag             | 19.12.3         | latest
 2756793 | FreeBSD:13:i386 |     10 | accessibility/kmousetool       |   31156 | kmousetool       | 19.12.3_1       | latest
(10 rows)

freshports.dev=# 

The ports_orgin is a new table created specifically for this query. Until now, there has been no need to obtain a port_id based upon its origin. This table will be updated via triggers upon the ports table. That bit of work has been deferred for now.

freshports.dev=# \d ports_origin
              Table "public.ports_origin"
   Column    |  Type   | Collation | Nullable | Default 
-------------+---------+-----------+----------+---------
 port_id     | integer |           | not null | 
 port_origin | text    |           | not null | 
Indexes:
    "port_origin_pkey" PRIMARY KEY, btree (port_id)
    "ports_origin_port_id_idx" btree (port_id)
    "ports_origin_port_origin_idx" btree (port_origin)
Foreign-key constraints:
    "ports_origin_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE NOT VALID

The original test

The original test took 18 seconds to populate about 30,000 rows:

explain analyse
UPDATE packages_raw PR
   SET abi_id  = pkg.abi_id,
       port_id = pkg.port_id
  FROM (SELECT P.id,
               abi.name,
               abi.id AS abi_id,
               P.package_origin,
               PO.port_id,
               P.package_name,
               P.package_version,
               P.package_set
          FROM packages_raw P 
          JOIN abi on P.abi = abi.name
          JOIN ports_origin PO on PO.port_origin = P.package_origin
         WHERE P.abi        = 'FreeBSD:13:i386'
           AND P.package_set = 'latest') AS pkg
 WHERE PR.id = pkg.id;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_raw pr  (cost=11144.47..40827.87 rows=30175 width=115) (actual time=11081.844..11081.844 rows=0 loops=1)
   ->  Hash Join  (cost=11144.47..40827.87 rows=30175 width=115) (actual time=2880.504..4680.566 rows=38905 loops=1)
         Hash Cond: (p.package_origin = po.port_origin)
         ->  Nested Loop  (cost=9051.81..36627.53 rows=22050 width=113) (actual time=2659.534..3858.975 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.344..0.351 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Hash Join  (cost=9051.81..36405.99 rows=22050 width=120) (actual time=2659.181..3844.350 rows=30492 loops=1)
                     Hash Cond: (pr.id = p.id)
                     ->  Seq Scan on packages_raw pr  (cost=0.00..12452.32 rows=524132 width=77) (actual time=0.370..2365.531 rows=459733 loops=1)
                     ->  Hash  (cost=8560.18..8560.18 rows=22050 width=51) (actual time=143.257..143.258 rows=30492 loops=1)
                           Buckets: 16384  Batches: 4  Memory Usage: 767kB
                           ->  Bitmap Heap Scan on packages_raw p  (cost=1018.44..8560.18 rows=22050 width=51) (actual time=86.264..117.885 rows=30492 loops=1)
                                 Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                                 Heap Blocks: exact=416
                                 ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..1012.92 rows=22050 width=0) (actual time=85.918..85.918 rows=66670 loops=1)
                                       Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=220.029..220.029 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.458..172.482 rows=57807 loops=1)
 Planning Time: 7.125 ms
 Trigger for constraint packages_raw_abi_id_fk: time=1151.252 calls=30474
 Trigger for constraint packages_raw_port_id_fk: time=6532.501 calls=30474
 Execution Time: 18782.602 ms

See https://explain.depesz.com/s/6d9Z for more detail.

I noticed all the time spent checking foreign keys while updating 30474 rows. I figured I could save perhaps most of that nearly 7 seconds spent on constraint.

I thought we could defer that. I changed the packages_raw_port_id_fk constraint to be DEFERRABLE INITIALLY DEFERRED. I also added ON UPDATE CASCADE and ON DELETE CASCADE which should have been there from the start (but that’s not relevant to the change in performance).

Test number 2

On the second try, we get:

Update on packages_raw pr  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=12221.244..12221.244 rows=0 loops=1)
   ->  Hash Join  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=2772.498..6044.554 rows=38905 loops=1)
         Hash Cond: (p.package_origin = po.port_origin)
         ->  Nested Loop  (cost=10845.18..34762.92 rows=14998 width=113) (actual time=2552.187..5468.500 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.335..0.338 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Hash Join  (cost=10845.18..34611.91 rows=14998 width=120) (actual time=2551.846..5455.815 rows=30492 loops=1)
                     Hash Cond: (pr.id = p.id)
                     ->  Seq Scan on packages_raw pr  (cost=0.00..12835.28 rows=384928 width=77) (actual time=0.337..2901.118 rows=459733 loops=1)
                     ->  Hash  (cost=10510.70..10510.70 rows=14998 width=51) (actual time=171.243..171.244 rows=30492 loops=1)
                           Buckets: 16384 (originally 16384)  Batches: 4 (originally 2)  Memory Usage: 897kB
                           ->  Bitmap Heap Scan on packages_raw p  (cost=978.15..10510.70 rows=14998 width=51) (actual time=100.492..143.901 rows=30492 loops=1)
                                 Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                                 Heap Blocks: exact=416
                                 ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..974.40 rows=14998 width=0) (actual time=100.172..100.173 rows=60966 loops=1)
                                       Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=219.460..219.460 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.459..172.605 rows=57807 loops=1)
 Planning Time: 9.476 ms
 Trigger for constraint packages_raw_abi_id_fk: time=363.506 calls=30474
 Execution Time: 12589.008 ms

Roughly 5 seconds of savings. See https://explain.depesz.com/s/Y9gj.

OK, let’s do the same for abi_id.

3rd time

After setting packages_raw_abi_id_fk to DEFERRABLE INITIALLY DEFERRED, the 3rd test results were:

Update on packages_raw pr  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=8798.389..8798.389 rows=0 loops=1)
   ->  Hash Join  (cost=12937.84..38416.01 rows=20524 width=115) (actual time=2841.848..4960.062 rows=38905 loops=1)
         Hash Cond: (p.package_origin = po.port_origin)
         ->  Nested Loop  (cost=10845.18..34762.92 rows=14998 width=113) (actual time=2620.392..3970.915 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.343..0.350 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Hash Join  (cost=10845.18..34611.91 rows=14998 width=120) (actual time=2620.040..3956.973 rows=30492 loops=1)
                     Hash Cond: (pr.id = p.id)
                     ->  Seq Scan on packages_raw pr  (cost=0.00..12835.28 rows=384928 width=77) (actual time=0.340..2796.802 rows=459733 loops=1)
                     ->  Hash  (cost=10510.70..10510.70 rows=14998 width=51) (actual time=167.195..167.196 rows=30492 loops=1)
                           Buckets: 16384 (originally 16384)  Batches: 4 (originally 2)  Memory Usage: 897kB
                           ->  Bitmap Heap Scan on packages_raw p  (cost=978.15..10510.70 rows=14998 width=51) (actual time=110.897..140.914 rows=30492 loops=1)
                                 Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                                 Heap Blocks: exact=416
                                 ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..974.40 rows=14998 width=0) (actual time=110.419..110.420 rows=91440 loops=1)
                                       Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=220.662..220.662 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.446..173.184 rows=57807 loops=1)
 Planning Time: 4.560 ms
 Execution Time: 8800.805 ms

That’s nearly 10 seconds faster. Sure, there might be some caching involved but the time spent checking foreign keys is greatly reduced.

See https://explain.depesz.com/s/XXnj.

Next, I noticed the sequential scan on ports_origin. Let’s see about that.

Yeah, that self-join

The self-join was noticed by xocolatl as being unnecessary. Let’s remove that to get:

UPDATE packages_raw PR
   SET abi_id  = abi.id,
       port_id = PO.port_id
  FROM abi, ports_origin PO
 WHERE PR.abi         = 'FreeBSD:13:i386'
   AND PR.package_set = 'latest'
   AND PR.abi         = abi.name
   AND PO.port_origin = PR.package_origin;

                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_raw pr  (cost=3081.65..12658.10 rows=21940 width=109) (actual time=6183.437..6183.437 rows=0 loops=1)
   ->  Hash Join  (cost=3081.65..12658.10 rows=21940 width=109) (actual time=418.091..845.851 rows=38905 loops=1)
         Hash Cond: (pr.package_origin = po.port_origin)
         ->  Nested Loop  (cost=989.00..9048.43 rows=16056 width=87) (actual time=174.079..268.378 rows=30492 loops=1)
               ->  Seq Scan on abi  (cost=0.00..1.04 rows=1 width=42) (actual time=0.030..0.036 rows=1 loops=1)
                     Filter: (name = 'FreeBSD:13:i386'::text)
                     Rows Removed by Filter: 8
               ->  Bitmap Heap Scan on packages_raw pr  (cost=989.00..8886.84 rows=16056 width=77) (actual time=174.032..256.085 rows=30492 loops=1)
                     Recheck Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
                     Heap Blocks: exact=416
                     ->  Bitmap Index Scan on packages_raw_abi_package_set_idx  (cost=0.00..984.98 rows=16056 width=0) (actual time=173.458..173.458 rows=121903 loops=1)
                           Index Cond: ((abi = 'FreeBSD:13:i386'::text) AND (package_set = 'latest'::package_sets))
         ->  Hash  (cost=974.07..974.07 rows=57807 width=30) (actual time=243.062..243.062 rows=57807 loops=1)
               Buckets: 16384  Batches: 8  Memory Usage: 576kB
               ->  Seq Scan on ports_origin po  (cost=0.00..974.07 rows=57807 width=30) (actual time=0.451..195.454 rows=57807 loops=1)
 Planning Time: 2.922 ms
 Execution Time: 6186.449 ms
(17 rows)

Well, isn’t that better?

See https://explain.depesz.com/s/cnyw for details.

That sequential scan on ports_origin takes up 195ms. Nothing to worry about.

Things to follow up on

This query doesn’t find values for all the FreeBSD:13:i386 * latest combination. Some are not matched.

freshports.dev=# SELECT * FROM packages_raw WHERE port_id IS NULL AND abi = 'FreeBSD:13:i386' AND package_set = 'latest' ORDER BY package_origin;
   id    |            package_origin            |          package_name          | package_version |       abi       | abi_id | port_id | package_set 
---------+--------------------------------------+--------------------------------+-----------------+-----------------+--------+---------+-------------
 2745970 | databases/rubygem-peek-mysql2-rails5 | rubygem-peek-mysql2-rails5     | 1.2.0           | FreeBSD:13:i386 |        |         | latest
 2740242 | devel/p5-File-ShareDir-Tarball       | p5-File-ShareDir-Tarball       | 0.2.2           | FreeBSD:13:i386 |        |         | latest
 2741383 | devel/p5-MooX-Singleton              | p5-MooX-Singleton              | 1.20            | FreeBSD:13:i386 |        |         | latest
 2740542 | devel/p5-MooseX-MungeHas             | p5-MooseX-MungeHas             | 0.011           | FreeBSD:13:i386 |        |         | latest
 2733133 | devel/py-aspy.yaml                   | py27-aspy.yaml                 | 1.3.0           | FreeBSD:13:i386 |        |         | latest
 2741040 | devel/py-aspy.yaml                   | py37-aspy.yaml                 | 1.3.0           | FreeBSD:13:i386 |        |         | latest
 2743867 | devel/py-nodeenv                     | py27-nodeenv                   | 1.3.3           | FreeBSD:13:i386 |        |         | latest
 2744182 | devel/py-nodeenv                     | py37-nodeenv                   | 1.3.3           | FreeBSD:13:i386 |        |         | latest
 2746002 | devel/rubygem-peek-gc-rails5         | rubygem-peek-gc-rails5         | 0.0.2           | FreeBSD:13:i386 |        |         | latest
 2745932 | devel/rubygem-peek-rails5            | rubygem-peek-rails5            | 1.0.1           | FreeBSD:13:i386 |        |         | latest
 2745897 | devel/rubygem-peek-rblineprof-rails5 | rubygem-peek-rblineprof-rails5 | 0.2.0           | FreeBSD:13:i386 |        |         | latest
 2745876 | devel/rubygem-peek-redis-rails5      | rubygem-peek-redis-rails5      | 1.2.0           | FreeBSD:13:i386 |        |         | latest
 2749944 | security/picosha2                    | picosha2                       | 1.0.0.23        | FreeBSD:13:i386 |        |         | latest
 2749043 | sysutils/atf-rk3399                  | atf-rk3399                     | v2.1            | FreeBSD:13:i386 |        |         | latest
 2746664 | sysutils/py-leviathan                | py27-leviathan                 | 0.1.1           | FreeBSD:13:i386 |        |         | latest
 2746885 | sysutils/py-leviathan                | py37-leviathan                 | 0.1.1           | FreeBSD:13:i386 |        |         | latest
 2737873 | sysutils/swapusage                   | swapusage                      | 1.0.0           | FreeBSD:13:i386 |        |         | latest
 2744052 | www/rubygem-gon-rails5               | rubygem-gon-rails5             | 6.2.1           | FreeBSD:13:i386 |        |         | latest
(18 rows)

This is a development database and those ports do not exist on this host. It could also be an out-of-date ports_origin table, but I haven’t checked that yet. All of those ports were added to the FreeBSD repo on these dates:

  1. February 5 2019
  2. April 15 2019
  3. May 26-29 2019

They are present in the production database. Nothing to be done here.

However, the system will have to flag such entries for later followup.

Apr 142020
 

This is the latest in a series of posts dealing with displaying what packages are available for a given FreeBSD port.

The Packages – how is this data stored? post may help you follow along. It outlines the data and the tables used in this post.

Last night I managed to get the right data out for ports which have multiple flavors. I used a few procedures each of one calling another, eventually getting the results.

That effort has now been replaced with a single query.

In this post, we will be looking at the www/py-django-storages port, which has three flavors.

The query

This query came to me from RhodiumToad, who has a wide breadth of knowledge and is liberal with his help. What you see below has been adjusted since then, but it closely represents his delivery.

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

This blog post is mainly to document the above and go through it step by step. I’ll need this later because I know I won’t remember it. I also want to learn more about this query and apply it to future development.

I’m going to step through the query, outlining what each part does.

WITH pkg

A WITH query “provides a way to write auxiliary statements for use in a larger query”.

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

I think of it as a temporary table for this query. It is defining the data which will be used by the rest of the query.

I will refer to this later as the pkg table.

This is what we get:

freshports.dev=# SELECT * FROM packages WHERE port_id = 28303 ORDER BY package_name;
   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------
 4373305 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4369003 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4340907 |      7 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4392074 |     10 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4388153 |     11 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4346130 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4382725 |      2 |   28303 | 1.8             | py27-django-storages |           |             | latest
 4349958 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4353741 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4380869 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4376671 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4357905 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4339390 |      7 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4361606 |      9 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4363708 |      9 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4369021 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4373197 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4365567 |      9 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4357806 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4376807 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4380779 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4353810 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4349931 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4383395 |      2 |   28303 | 1.8             | py37-django-storages |           |             | latest
 4388136 |     11 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4346104 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4392033 |     10 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4342889 |      7 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
(30 rows)

freshports.dev=# 

I sorted the data just to make it easier to follow.

The above represents all the packages available for www/py-django-storages from the FreeBSD repos.

How many different builds do we have for each package_name?

  SELECT package_name, 
         count(package_name) 
    FROM packages 
   WHERE port_id = 28303 
GROUP BY package_name 
ORDER BY package_name;

That query produces this result:

     package_name     | count 
----------------------+-------
 py27-django-storages |    15
 py36-django-storages |     2
 py37-django-storages |    13
(3 rows)

Those numbers will become relevant later.

SELECT DISTINCT

SELECT DISTINCT eliminates duplicate rows from the result. Don’t do what I initially did and use brackets around your first item. Consider SELECT DISTINCT to be a single keyword. It is not a function.

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

We need results for all packages on all ABI. To get there, we need a list of the packages first.

Combining that SELECT DISTINCT into the WITH clause, we get:

WITH pkg AS
  (SELECT * FROM packages WHERE port_id = 28303)
SELECT DISTINCT package_name FROM pkg;

     package_name     
----------------------
 py27-django-storages
 py36-django-storages
 py37-django-storages
(3 rows)

This is the pkg table.

All ABI * package combinations

We want to show people that a package is not available on a given ABI. We also want all tables to have the same number of rows so that scanning across the page compares like-to-like. For that, we want all combinations please.

abi CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn

Here is a list of all packages beside all ABI:

WITH pkg AS
  (SELECT * FROM packages WHERE port_id = 28303)
SELECT * FROM abi CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn

 id |        name        | active |     package_name     
----+--------------------+--------+----------------------
  1 | FreeBSD:12:amd64   | t      | py27-django-storages
  2 | FreeBSD:13:aarch64 | t      | py27-django-storages
  4 | FreeBSD:11:i386    | t      | py27-django-storages
  6 | FreeBSD:11:amd64   | t      | py27-django-storages
  7 | FreeBSD:11:aarch64 | t      | py27-django-storages
  8 | FreeBSD:12:i386    | t      | py27-django-storages
  9 | FreeBSD:12:aarch64 | t      | py27-django-storages
 10 | FreeBSD:13:i386    | t      | py27-django-storages
 11 | FreeBSD:13:amd64   | t      | py27-django-storages
  1 | FreeBSD:12:amd64   | t      | py36-django-storages
  2 | FreeBSD:13:aarch64 | t      | py36-django-storages
  4 | FreeBSD:11:i386    | t      | py36-django-storages
  6 | FreeBSD:11:amd64   | t      | py36-django-storages
  7 | FreeBSD:11:aarch64 | t      | py36-django-storages
  8 | FreeBSD:12:i386    | t      | py36-django-storages
  9 | FreeBSD:12:aarch64 | t      | py36-django-storages
 10 | FreeBSD:13:i386    | t      | py36-django-storages
 11 | FreeBSD:13:amd64   | t      | py36-django-storages
  1 | FreeBSD:12:amd64   | t      | py37-django-storages
  2 | FreeBSD:13:aarch64 | t      | py37-django-storages
  4 | FreeBSD:11:i386    | t      | py37-django-storages
  6 | FreeBSD:11:amd64   | t      | py37-django-storages
  7 | FreeBSD:11:aarch64 | t      | py37-django-storages
  8 | FreeBSD:12:i386    | t      | py37-django-storages
  9 | FreeBSD:12:aarch64 | t      | py37-django-storages
 10 | FreeBSD:13:i386    | t      | py37-django-storages
 11 | FreeBSD:13:amd64   | t      | py37-django-storages
(27 rows)

With 9 rows in the ABI table and 3 distinct package names we get 27 rows.

I will refer to this as the abi_x_package_name data.

LEFT JOIN

We have a list of all the

Now we do a LEFT JOIN. That gets us all the abi_x_package_name combinations we need with, or without, the corresponding package information.

LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)

This takes the entries from pkg and joins it with everything from the abi table.

This takes us to the entire query now:

WITH 
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi, 
       pn.package_name,
       pkg.package_set,
       pkg.package_version 
  FROM abi 
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg on (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 ORDER BY package_name, abi, package_set;

        abi         | abi_id |     package_name     | package_set | package_version 
--------------------+--------+----------------------+-------------+-----------------
 FreeBSD:11:aarch64 |      7 | py27-django-storages | latest      | 1.5.1
 FreeBSD:11:aarch64 |      7 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | latest      | 1.5.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | latest      | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:13:aarch64 |      2 | py27-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py27-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:aarch64 |      7 | py36-django-storages | latest      | 1.5.1
 FreeBSD:11:amd64   |      6 | py36-django-storages |             | 
 FreeBSD:11:i386    |      4 | py36-django-storages |             | 
 FreeBSD:12:aarch64 |      9 | py36-django-storages | latest      | 1.5.1
 FreeBSD:12:amd64   |      1 | py36-django-storages |             | 
 FreeBSD:12:i386    |      8 | py36-django-storages |             | 
 FreeBSD:13:aarch64 |      2 | py36-django-storages |             | 
 FreeBSD:13:amd64   |     11 | py36-django-storages |             | 
 FreeBSD:13:i386    |     10 | py36-django-storages |             | 
 FreeBSD:11:aarch64 |      7 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:13:aarch64 |      2 | py37-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py37-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py37-django-storages | latest      | 1.9.1
(37 rows)

freshports.dev=#

From here, we start pivoting the data so we have latest and quarterly on the same line.

Pivot

By now, we have gone through the entire query. Next is the pivot.

WITH
  pkg AS (SELECT * FROM packages WHERE port_id = 28303)
SELECT abi.name AS abi,
       pn.package_name,
       max(pkg.package_version) FILTER (WHERE pkg.package_set = 'latest')
         AS package_version_latest,
       max(pkg.package_version) FILTER (WHERE pkg.package_set = 'quarterly')
         AS package_version_quarterly
  FROM abi
       CROSS JOIN (SELECT DISTINCT package_name FROM pkg) pn
       LEFT JOIN pkg ON (pkg.abi_id = abi.id AND pkg.package_name = pn.package_name)
 GROUP BY pn.package_name, abi.name
 ORDER BY pn.package_name, abi.name;

The filter controls which input rows are fed to the aggregate function max. I could have use min(), or any other similar function here because there is only one value which will match it. The GROUP BY clause is what controls this. For each package_name and abi.name combination, there will be 1 or two lines.

The output is the expected 27 lines of information:

        abi         |     package_name     | package_version_latest | package_version_quarterly 
--------------------+----------------------+------------------------+---------------------------
 FreeBSD:11:aarch64 | py27-django-storages | 1.5.1                  | 1.9.1
 FreeBSD:11:amd64   | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:11:i386    | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:aarch64 | py27-django-storages | 1.5.1                  | 1.9.1
 FreeBSD:12:amd64   | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:i386    | py27-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:13:aarch64 | py27-django-storages | 1.8                    | 
 FreeBSD:13:amd64   | py27-django-storages | 1.9.1                  | 
 FreeBSD:13:i386    | py27-django-storages | 1.9.1                  | 
 FreeBSD:11:aarch64 | py36-django-storages | 1.5.1                  | 
 FreeBSD:11:amd64   | py36-django-storages |                        | 
 FreeBSD:11:i386    | py36-django-storages |                        | 
 FreeBSD:12:aarch64 | py36-django-storages | 1.5.1                  | 
 FreeBSD:12:amd64   | py36-django-storages |                        | 
 FreeBSD:12:i386    | py36-django-storages |                        | 
 FreeBSD:13:aarch64 | py36-django-storages |                        | 
 FreeBSD:13:amd64   | py36-django-storages |                        | 
 FreeBSD:13:i386    | py36-django-storages |                        | 
 FreeBSD:11:aarch64 | py37-django-storages |                        | 1.9.1
 FreeBSD:11:amd64   | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:11:i386    | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:aarch64 | py37-django-storages |                        | 1.9.1
 FreeBSD:12:amd64   | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:12:i386    | py37-django-storages | 1.9.1                  | 1.9.1
 FreeBSD:13:aarch64 | py37-django-storages | 1.8                    | 
 FreeBSD:13:amd64   | py37-django-storages | 1.9.1                  | 
 FreeBSD:13:i386    | py37-django-storages | 1.9.1                  | 
(27 rows)

Why 27? 3 package names, 9 ABI.

Much better

This is a much better approach than the procedures I used before. I like it.

Apr 142020
 

This post documents what I recently put into a gist on Github. It should serve as a rough introduction to how FreshPorts handles packages extracted from packagesite.yaml files.

All the platforms

These are the plaforms (also known as ABI) for which we are monitoring packages created by the FreeBSD project.

They relate to URLs found here: https://pkg.freebsd.org

For each ABI two repos are built, each based off a different code repository:

* latest (head)
* quarterly (now 2020Q2)

Quarterly branches are designed to “provide users with a more predictable and stable experience for port and package installation and upgrades. This is done essentially by only allowing non-feature updates.”

This is the abi data:

freshports.dev=# select * from abi;
 id |        name        | active 
----+--------------------+--------
  1 | FreeBSD:12:amd64   | t
  2 | FreeBSD:13:aarch64 | t
  4 | FreeBSD:11:i386    | t
  6 | FreeBSD:11:amd64   | t
  7 | FreeBSD:11:aarch64 | t
  8 | FreeBSD:12:i386    | t
  9 | FreeBSD:12:aarch64 | t
 10 | FreeBSD:13:i386    | t
 11 | FreeBSD:13:amd64   | t
(9 rows)

freshports.dev=# \d abi
                           Table "public.abi"
 Column |  Type   | Collation | Nullable |           Default            
--------+---------+-----------+----------+------------------------------
 id     | integer |           | not null | generated always as identity
 name   | text    |           | not null | 
 active | boolean |           | not null | true
Indexes:
    "abi_pkey" PRIMARY KEY, btree (id)
    "abi_name_idx" UNIQUE, btree (name)
Referenced by:
    TABLE "packages" CONSTRAINT "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    TABLE "packages_last_checked" CONSTRAINT "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)

freshports.dev=# 

What packages exist for a given port?

The packages table contains information taken from packagesite.yaml found at https://pkg.freebsd.org

Both latest and quarterly are shown.

For this example, we look at the www/py-django-storages port:

This port has three different packges avaiable:

* py27-django-storages
* py36-django-storages
* py37-django-storages

This is not unusual. Many ports have flavors.

For each package_name, I need an outer join from the data below, with the table above (ABI). All ABI values need to
be included for a given package.

The absence of a package for a given ABI is information presented to the user.

freshports.dev=# select * from packages where port_id = 28303;
   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4339390 |      7 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4340907 |      7 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4342889 |      7 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4346104 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4346130 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4349931 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4349958 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4353741 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4353810 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4357806 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4357905 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest
 4361606 |      9 |   28303 | 1.5.1           | py27-django-storages |           |             | latest
 4363708 |      9 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4365567 |      9 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4369003 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4369021 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4373197 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4373305 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4376671 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4376807 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4380779 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly
 4380869 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly
 4382725 |      2 |   28303 | 1.8             | py27-django-storages |           |             | latest
 4383395 |      2 |   28303 | 1.8             | py37-django-storages |           |             | latest
 4388136 |     11 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4388153 |     11 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
 4392033 |     10 |   28303 | 1.9.1           | py37-django-storages |           |             | latest
 4392074 |     10 |   28303 | 1.9.1           | py27-django-storages |           |             | latest
(30 rows)

freshports.dev=# \d packages
                               Table "public.packages"
     Column      |     Type     | Collation | Nullable |           Default            
-----------------+--------------+-----------+----------+------------------------------
 id              | bigint       |           | not null | generated always as identity
 abi_id          | integer      |           | not null | 
 port_id         | integer      |           | not null | 
 package_version | text         |           | not null | 
 package_name    | text         |           | not null | 
 branch_id       | integer      |           |          | 
 branch_name     | text         |           |          | 
 package_set     | package_sets |           |          | 
Indexes:
    "packages_pkey" PRIMARY KEY, btree (id)
    "fki_packages_abi_id_fk" btree (abi_id)
    "fki_packages_port_id_fk" btree (port_id)
    "packages_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    "packages_branch_id_fk" FOREIGN KEY (branch_id) REFERENCES system_branch(id) NOT VALID
    "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID

freshports.dev=# 


freshports.dev=# 

What package is on what ABI?

This is a sample OUTER JOIN to get the required data for a single package.

freshports.dev=# SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py36-django-storages') AS P ON abi.id = P.abi_id
;
   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set | id |        name        | active 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------+----+--------------------+--------
         |        |         |                 |                      |           |             |             |  1 | FreeBSD:12:amd64   | t
         |        |         |                 |                      |           |             |             |  2 | FreeBSD:13:aarch64 | t
         |        |         |                 |                      |           |             |             |  4 | FreeBSD:11:i386    | t
         |        |         |                 |                      |           |             |             |  6 | FreeBSD:11:amd64   | t
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  7 | FreeBSD:11:aarch64 | t
         |        |         |                 |                      |           |             |             |  8 | FreeBSD:12:i386    | t
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  9 | FreeBSD:12:aarch64 | t
         |        |         |                 |                      |           |             |             | 10 | FreeBSD:13:i386    | t
         |        |         |                 |                      |           |             |             | 11 | FreeBSD:13:amd64   | t
(9 rows)

freshports.dev=# 

What does a full set for one port look like?

This repeats the above OUTER JOIN but for all three package names

SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py27-django-storages') AS P ON abi.id = P.abi_id
UNION
SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py36-django-storages') AS P ON abi.id = P.abi_id
UNION
SELECT P.*, abi.*
  FROM abi LEFT OUTER JOIN (SELECT * FROM packages P WHERE P.port_id = 28303 and package_name = 'py37-django-storages') AS P ON abi.id = P.abi_id;

   id    | abi_id | port_id | package_version |     package_name     | branch_id | branch_name | package_set | id |        name        | active 
---------+--------+---------+-----------------+----------------------+-----------+-------------+-------------+----+--------------------+--------
 4380779 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  8 | FreeBSD:12:i386    | t
 4373197 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  1 | FreeBSD:12:amd64   | t
         |        |         |                 |                      |           |             |             |  8 | FreeBSD:12:i386    | t
 4369003 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  1 | FreeBSD:12:amd64   | t
 4340907 |      7 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  7 | FreeBSD:11:aarch64 | t
 4339245 |      7 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  7 | FreeBSD:11:aarch64 | t
 4357905 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  4 | FreeBSD:11:i386    | t
         |        |         |                 |                      |           |             |             |  1 | FreeBSD:12:amd64   | t
 4339390 |      7 |   28303 | 1.5.1           | py27-django-storages |           |             | latest      |  7 | FreeBSD:11:aarch64 | t
 4376671 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  8 | FreeBSD:12:i386    | t
 4383395 |      2 |   28303 | 1.8             | py37-django-storages |           |             | latest      |  2 | FreeBSD:13:aarch64 | t
 4376807 |      8 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  8 | FreeBSD:12:i386    | t
 4357806 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  4 | FreeBSD:11:i386    | t
 4363708 |      9 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  9 | FreeBSD:12:aarch64 | t
         |        |         |                 |                      |           |             |             | 10 | FreeBSD:13:i386    | t
 4365567 |      9 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  9 | FreeBSD:12:aarch64 | t
 4361606 |      9 |   28303 | 1.5.1           | py27-django-storages |           |             | latest      |  9 | FreeBSD:12:aarch64 | t
 4361337 |      9 |   28303 | 1.5.1           | py36-django-storages |           |             | latest      |  9 | FreeBSD:12:aarch64 | t
         |        |         |                 |                      |           |             |             |  4 | FreeBSD:11:i386    | t
 4388153 |     11 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      | 11 | FreeBSD:13:amd64   | t
 4392074 |     10 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      | 10 | FreeBSD:13:i386    | t
 4349958 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  6 | FreeBSD:11:amd64   | t
 4382725 |      2 |   28303 | 1.8             | py27-django-storages |           |             | latest      |  2 | FreeBSD:13:aarch64 | t
 4346130 |      6 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  6 | FreeBSD:11:amd64   | t
         |        |         |                 |                      |           |             |             | 11 | FreeBSD:13:amd64   | t
 4353741 |      4 |   28303 | 1.9.1           | py27-django-storages |           |             | latest      |  4 | FreeBSD:11:i386    | t
 4392033 |     10 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      | 10 | FreeBSD:13:i386    | t
 4353810 |      4 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  4 | FreeBSD:11:i386    | t
 4349931 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  6 | FreeBSD:11:amd64   | t
 4342889 |      7 |   28303 | 1.9.1           | py37-django-storages |           |             | quarterly   |  7 | FreeBSD:11:aarch64 | t
 4369021 |      1 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  1 | FreeBSD:12:amd64   | t
         |        |         |                 |                      |           |             |             |  6 | FreeBSD:11:amd64   | t
 4373305 |      1 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  1 | FreeBSD:12:amd64   | t
 4388136 |     11 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      | 11 | FreeBSD:13:amd64   | t
 4346104 |      6 |   28303 | 1.9.1           | py37-django-storages |           |             | latest      |  6 | FreeBSD:11:amd64   | t
         |        |         |                 |                      |           |             |             |  2 | FreeBSD:13:aarch64 | t
 4380869 |      8 |   28303 | 1.9.1           | py27-django-storages |           |             | quarterly   |  8 | FreeBSD:12:i386    | t
(37 rows)

A procedure to produce the above

I could not find a way to do the OUTER JOIN on any result set which consisted of more than one package_name.

Since this was written, I now have a single query solution.

I created this function which worked on one package_name at a time.

CREATE OR REPLACE FUNCTION PortPackagesNonPivot(a_port_id integer) RETURNS TABLE(abi text, abi_id integer, package_name text, package_set package_sets, package_version text)
    LANGUAGE plpgsql STABLE
    AS $$
  DECLARE
    l_rec  RECORD;
  BEGIN
    FOR l_rec IN SELECT *
                 FROM PackagesGetPackageNamesForPort(a_port_id) ORDER BY 1
    LOOP
      RETURN QUERY
        SELECT abi.name, abi.id, l_rec.package_name, P.package_set, P.package_version
          FROM abi LEFT OUTER JOIN 
             (SELECT * 
                FROM packages P 
               WHERE P.port_id = a_port_id
                 AND P.package_name = l_rec.package_name) AS P 
             ON abi.id = P.abi_id
         ORDER BY l_rec.package_name, abi.name;
    END LOOP;
END;
$$;


freshports.dev=# select * from PortPackagesNonPivot(28303);
        abi         | abi_id |     package_name     | package_set | package_version 
--------------------+--------+----------------------+-------------+-----------------
 FreeBSD:11:aarch64 |      7 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:aarch64 |      7 | py27-django-storages | latest      | 1.5.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:amd64   |      6 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py27-django-storages | latest      | 1.5.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py27-django-storages | latest      | 1.9.1
 FreeBSD:13:aarch64 |      2 | py27-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py27-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py27-django-storages | latest      | 1.9.1
 FreeBSD:11:aarch64 |      7 | py36-django-storages | latest      | 1.5.1
 FreeBSD:11:amd64   |      6 | py36-django-storages |             | 
 FreeBSD:11:i386    |      4 | py36-django-storages |             | 
 FreeBSD:12:aarch64 |      9 | py36-django-storages | latest      | 1.5.1
 FreeBSD:12:amd64   |      1 | py36-django-storages |             | 
 FreeBSD:12:i386    |      8 | py36-django-storages |             | 
 FreeBSD:13:aarch64 |      2 | py36-django-storages |             | 
 FreeBSD:13:amd64   |     11 | py36-django-storages |             | 
 FreeBSD:13:i386    |     10 | py36-django-storages |             | 
 FreeBSD:11:aarch64 |      7 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:11:amd64   |      6 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | latest      | 1.9.1
 FreeBSD:11:i386    |      4 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:aarch64 |      9 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:amd64   |      1 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | latest      | 1.9.1
 FreeBSD:12:i386    |      8 | py37-django-storages | quarterly   | 1.9.1
 FreeBSD:13:aarch64 |      2 | py37-django-storages | latest      | 1.8
 FreeBSD:13:amd64   |     11 | py37-django-storages | latest      | 1.9.1
 FreeBSD:13:i386    |     10 | py37-django-storages | latest      | 1.9.1
(37 rows)

Pivot the data

One goal is to put quarterly and latest results onto the same line.

 SELECT package_name, 
        abi_id, 
        max(package_version) filter(where package_set = 'latest')    AS package_version_latest,
        max(package_version) filter(where package_set = 'quarterly') AS package_version_quarterly
  FROM packages 
 WHERE port_id = 28303
 GROUP BY package_name, abi_id;
 
     package_name     | abi_id | package_version_latest | package_version_quarterly 
----------------------+--------+------------------------+---------------------------
 py27-django-storages |      1 | 1.9.1                  | 1.9.1
 py27-django-storages |      2 | 1.8                    | 
 py27-django-storages |      4 | 1.9.1                  | 1.9.1
 py27-django-storages |      6 | 1.9.1                  | 1.9.1
 py27-django-storages |      7 | 1.5.1                  | 1.9.1
 py27-django-storages |      8 | 1.9.1                  | 1.9.1
 py27-django-storages |      9 | 1.5.1                  | 1.9.1
 py27-django-storages |     10 | 1.9.1                  | 
 py27-django-storages |     11 | 1.9.1                  | 
 py36-django-storages |      7 | 1.5.1                  | 
 py36-django-storages |      9 | 1.5.1                  | 
 py37-django-storages |      1 | 1.9.1                  | 1.9.1
 py37-django-storages |      2 | 1.8                    | 
 py37-django-storages |      4 | 1.9.1                  | 1.9.1
 py37-django-storages |      6 | 1.9.1                  | 1.9.1
 py37-django-storages |      7 |                        | 1.9.1
 py37-django-storages |      8 | 1.9.1                  | 1.9.1
 py37-django-storages |      9 |                        | 1.9.1
 py37-django-storages |     10 | 1.9.1                  | 
 py37-django-storages |     11 | 1.9.1                  | 
(20 rows)

freshports.dev=# 

Combining the pivot with the data

Hwere I combine the previous two points into one function.

This incorporates additional data from packages_last_checked.

CREATE OR REPLACE FUNCTION PortPackages(a_port_id integer) 
RETURNS TABLE(package_name text, abi text, package_version_latest text, package_version_quarterly text,
     last_checked_latest    timestamp with time zone, repo_date_latest    timestamp with time zone, import_date_latest     timestamp with time zone,
     last_checked_quarterly timestamp with time zone, repo_date_quarterly timestamp with time zone, import_date_quarterly  timestamp with time zone)
  LANGUAGE SQL STABLE
  AS $$

    SELECT P.package_name,
           P.abi,
           max(package_version)  filter(where P.package_set   = 'latest')    AS package_version_latest,
           max(package_version)  filter(where P.package_set   = 'quarterly') AS package_version_quarterly,
           max(PLC.last_checked) filter(where PLC.package_set = 'latest')    AS last_checked_latest,
           max(PLC.repo_date)    filter(where PLC.package_set = 'latest')    AS repo_date_latest,
           max(PLC.import_date)  filter(where PLC.package_set = 'latest')    AS import_date_latest,
           max(PLC.last_checked) filter(where PLC.package_set = 'quarterly') AS last_checked_quarterly,
           max(PLC.repo_date)    filter(where PLC.package_set = 'quarterly') AS repo_date_quarterly,
           max(PLC.import_date)  filter(where PLC.package_set = 'quarterly') AS import_date_quarterly
      FROM PortPackagesNonPivot(a_port_id) P JOIN packages_last_checked PLC ON P.abi_id = PLC.abi_id
  GROUP BY P.package_name, P.abi
  ORDER BY P.package_name, P.abi;
$$;

freshports.dev=# SELECT * FROM PortPackages(28303);
     package_name     |        abi         | package_version_latest | package_version_quarterly |      last_checked_latest      |    repo_date_latest    |      import_date_latest       |    last_checked_quarterly     |  repo_date_quarterly   |     import_date_quarterly     |                sort_key                 
----------------------+--------------------+------------------------+---------------------------+-------------------------------+------------------------+-------------------------------+-------------------------------+------------------------+-------------------------------+-----------------------------------------
 py27-django-storages | FreeBSD:11:aarch64 | 1.5.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-09-29 04:24:00+00 | 2020-04-13 14:09:37.634897+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-07 12:03:00+00 | 2020-04-13 14:09:41.320966+00 | py27-django-storages-FreeBSD:11:aarch64
 py27-django-storages | FreeBSD:11:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 00:16:00+00 | 2020-04-13 14:09:46.602236+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:20:00+00 | 2020-04-13 14:09:51.877809+00 | py27-django-storages-FreeBSD:11:amd64
 py27-django-storages | FreeBSD:11:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 05:24:00+00 | 2020-04-13 14:09:56.774418+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 02:36:00+00 | 2020-04-13 14:10:01.378743+00 | py27-django-storages-FreeBSD:11:i386
 py27-django-storages | FreeBSD:12:aarch64 | 1.5.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-10-11 05:47:00+00 | 2020-04-13 14:10:06.230199+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 09:21:00+00 | 2020-04-13 14:10:10.340624+00 | py27-django-storages-FreeBSD:12:aarch64
 py27-django-storages | FreeBSD:12:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 01:18:00+00 | 2020-04-13 14:10:14.600613+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:56:00+00 | 2020-04-13 14:10:18.777864+00 | py27-django-storages-FreeBSD:12:amd64
 py27-django-storages | FreeBSD:12:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 06:22:00+00 | 2020-04-13 14:10:22.804925+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:36:00+00 | 2020-04-13 14:10:26.964927+00 | py27-django-storages-FreeBSD:12:i386
 py27-django-storages | FreeBSD:13:aarch64 | 1.8                    |                           | 2020-04-13 14:06:19.662548+00 | 2020-01-04 12:39:00+00 | 2020-04-13 14:10:31.011334+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py27-django-storages-FreeBSD:13:aarch64
 py27-django-storages | FreeBSD:13:amd64   | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 02:22:00+00 | 2020-04-13 14:10:35.255075+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py27-django-storages-FreeBSD:13:amd64
 py27-django-storages | FreeBSD:13:i386    | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-10 23:14:00+00 | 2020-04-13 14:10:39.416529+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py27-django-storages-FreeBSD:13:i386
 py36-django-storages | FreeBSD:11:aarch64 | 1.5.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2018-09-29 04:24:00+00 | 2020-04-13 14:09:37.634897+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-07 12:03:00+00 | 2020-04-13 14:09:41.320966+00 | py36-django-storages-FreeBSD:11:aarch64
 py36-django-storages | FreeBSD:11:amd64   |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 00:16:00+00 | 2020-04-13 14:09:46.602236+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:20:00+00 | 2020-04-13 14:09:51.877809+00 | py36-django-storages-FreeBSD:11:amd64
 py36-django-storages | FreeBSD:11:i386    |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-13 05:24:00+00 | 2020-04-13 14:09:56.774418+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 02:36:00+00 | 2020-04-13 14:10:01.378743+00 | py36-django-storages-FreeBSD:11:i386
 py36-django-storages | FreeBSD:12:aarch64 | 1.5.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2018-10-11 05:47:00+00 | 2020-04-13 14:10:06.230199+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 09:21:00+00 | 2020-04-13 14:10:10.340624+00 | py36-django-storages-FreeBSD:12:aarch64
 py36-django-storages | FreeBSD:12:amd64   |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 01:18:00+00 | 2020-04-13 14:10:14.600613+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:56:00+00 | 2020-04-13 14:10:18.777864+00 | py36-django-storages-FreeBSD:12:amd64
 py36-django-storages | FreeBSD:12:i386    |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-13 06:22:00+00 | 2020-04-13 14:10:22.804925+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:36:00+00 | 2020-04-13 14:10:26.964927+00 | py36-django-storages-FreeBSD:12:i386
 py36-django-storages | FreeBSD:13:aarch64 |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-01-04 12:39:00+00 | 2020-04-13 14:10:31.011334+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py36-django-storages-FreeBSD:13:aarch64
 py36-django-storages | FreeBSD:13:amd64   |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 02:22:00+00 | 2020-04-13 14:10:35.255075+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py36-django-storages-FreeBSD:13:amd64
 py36-django-storages | FreeBSD:13:i386    |                        |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-10 23:14:00+00 | 2020-04-13 14:10:39.416529+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py36-django-storages-FreeBSD:13:i386
 py37-django-storages | FreeBSD:11:aarch64 |                        | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-09-29 04:24:00+00 | 2020-04-13 14:09:37.634897+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-07 12:03:00+00 | 2020-04-13 14:09:41.320966+00 | py37-django-storages-FreeBSD:11:aarch64
 py37-django-storages | FreeBSD:11:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 00:16:00+00 | 2020-04-13 14:09:46.602236+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:20:00+00 | 2020-04-13 14:09:51.877809+00 | py37-django-storages-FreeBSD:11:amd64
 py37-django-storages | FreeBSD:11:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 05:24:00+00 | 2020-04-13 14:09:56.774418+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 02:36:00+00 | 2020-04-13 14:10:01.378743+00 | py37-django-storages-FreeBSD:11:i386
 py37-django-storages | FreeBSD:12:aarch64 |                        | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2018-10-11 05:47:00+00 | 2020-04-13 14:10:06.230199+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 09:21:00+00 | 2020-04-13 14:10:10.340624+00 | py37-django-storages-FreeBSD:12:aarch64
 py37-django-storages | FreeBSD:12:amd64   | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-11 01:18:00+00 | 2020-04-13 14:10:14.600613+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:56:00+00 | 2020-04-13 14:10:18.777864+00 | py37-django-storages-FreeBSD:12:amd64
 py37-django-storages | FreeBSD:12:i386    | 1.9.1                  | 1.9.1                     | 2020-04-13 14:06:19.662548+00 | 2020-04-13 06:22:00+00 | 2020-04-13 14:10:22.804925+00 | 2020-04-13 14:06:19.662548+00 | 2020-04-12 03:36:00+00 | 2020-04-13 14:10:26.964927+00 | py37-django-storages-FreeBSD:12:i386
 py37-django-storages | FreeBSD:13:aarch64 | 1.8                    |                           | 2020-04-13 14:06:19.662548+00 | 2020-01-04 12:39:00+00 | 2020-04-13 14:10:31.011334+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py37-django-storages-FreeBSD:13:aarch64
 py37-django-storages | FreeBSD:13:amd64   | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-11 02:22:00+00 | 2020-04-13 14:10:35.255075+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py37-django-storages-FreeBSD:13:amd64
 py37-django-storages | FreeBSD:13:i386    | 1.9.1                  |                           | 2020-04-13 14:06:19.662548+00 | 2020-04-10 23:14:00+00 | 2020-04-13 14:10:39.416529+00 | 2020-04-13 14:06:19.662548+00 |                        |                               | py37-django-storages-FreeBSD:13:i386
(27 rows)

freshports.dev=# 
Apr 122020
 

Since the last past, I’ve added quarterly packages to the system.

I want to outline the changes while I remember the.

I will start by outline the tables we have.

abi

The abi table contains the usual suspects. Entries in this table are added manually, when a new release comes out.

See https://pkg.freebsd.org for examples of where these values are obtained from.

freshports.dev=# SELECT * FROM abi ORDER BY name;
 id |        name        | active |         last_checked          |         last_updated          
----+--------------------+--------+-------------------------------+-------------------------------
  7 | FreeBSD:11:aarch64 | t      | 2020-04-10 21:51:28.219155+00 | 2020-04-10 21:51:28.219155+00
  6 | FreeBSD:11:amd64   | t      | 2020-04-10 21:51:28.107421+00 | 2020-04-10 21:51:28.107421+00
  4 | FreeBSD:11:i386    | t      | 2020-04-10 21:05:01.431882+00 | 2020-04-10 21:05:01.431882+00
  9 | FreeBSD:12:aarch64 | t      | 2020-04-10 21:51:28.361931+00 | 2020-04-10 21:51:28.361931+00
  1 | FreeBSD:12:amd64   | t      | 2020-04-10 18:22:10.545416+00 | 2020-04-10 18:24:56.829453+00
  8 | FreeBSD:12:i386    | t      | 2020-04-10 21:51:28.278548+00 | 2020-04-10 21:51:28.278548+00
  2 | FreeBSD:13:aarch64 | t      | 2020-04-10 18:39:12.554194+00 | 2020-04-10 18:39:12.554194+00
 11 | FreeBSD:13:amd64   | t      | 2020-04-10 21:51:28.479757+00 | 2020-04-10 21:51:28.479757+00
 10 | FreeBSD:13:i386    | t      | 2020-04-10 21:51:28.421228+00 | 2020-04-10 21:51:28.421228+00
(9 rows)

freshports.dev=# \d abi
                                      Table "public.abi"
    Column    |           Type           | Collation | Nullable |           Default            
--------------+--------------------------+-----------+----------+------------------------------
 id           | integer                  |           | not null | generated always as identity
 name         | text                     |           | not null | 
 active       | boolean                  |           | not null | true
 last_checked | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 last_updated | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "abi_pkey" PRIMARY KEY, btree (id)
    "abi_name_idx" UNIQUE, btree (name)
Referenced by:
    TABLE "packages" CONSTRAINT "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    TABLE "packages_last_checked" CONSTRAINT "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)
    TABLE "packages_raw_original" CONSTRAINT "packages_raw_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID

freshports.dev=# 

The last_checked and last_updated columns are now deprecated. You will see their respective replacements in the packages_last_checked table.

packages

The packages contains the latest package version available for a given ABI on a given branch (e.g. latest, quarterly, 2020Q2).

freshports.dev=# SELECT * FROM packages ORDER BY package_name LIMIT 10;
   id   | abi_id | port_id | package_version | package_name | branch_id 
--------+--------+---------+-----------------+--------------+-----------
 186113 |      4 |   31186 | 0.0.23b_9       | 0ad          |         1
 204653 |      6 |   31186 | 0.0.23b_9       | 0ad          |         1
 278713 |      8 |   31186 | 0.0.23b_9       | 0ad          |         1
 371391 |     10 |   31186 | 0.0.23b_9       | 0ad          |         1
 403274 |     11 |   31186 | 0.0.23b_9       | 0ad          |         1
 495736 |      1 |   31186 | 0.0.23b_9       | 0ad          |         1
 590825 |      1 |   31186 | 0.0.23b_9       | 0ad          |       142
 622854 |      4 |   31186 | 0.0.23b_9       | 0ad          |       142
 653983 |      6 |   31186 | 0.0.23b_9       | 0ad          |       142
 712933 |      8 |   31186 | 0.0.23b_9       | 0ad          |       142
(10 rows)

freshports.dev=# \d packages
                             Table "public.packages"
     Column      |  Type   | Collation | Nullable |           Default            
-----------------+---------+-----------+----------+------------------------------
 id              | bigint  |           | not null | generated always as identity
 abi_id          | integer |           | not null | 
 port_id         | integer |           | not null | 
 package_version | text    |           | not null | 
 package_name    | text    |           | not null | 
 branch_id       | integer |           | not null | 
Indexes:
    "packages_pkey" PRIMARY KEY, btree (id)
    "fki_packages_abi_id_fk" btree (abi_id)
    "fki_packages_port_id_fk" btree (port_id)
    "packages_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    "packages_branch_id_fk" FOREIGN KEY (branch_id) REFERENCES system_branch(id) NOT VALID
    "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID

packages_raw

The packages_raw table is a plain text version of the packages table. The import script loads data into this table before it is loaded into the packages table.

The loading from the temporary table (packages_raw) ito the permanent table (packages) is easily done with a single database query. Doing the same work within an external program would be non-trivial, slower, and error-prone.

The conversion process

This query takes the raw data from the temporary table (packages_raw) into the permanent table (packages).

INSERT INTO packages(abi_id, branch_id, port_id, package_name, package_version)
SELECT GetABIId(abi),
       GetBranchId(branch),
       GetPort(package_origin),
       package_name,
       package_version                                                               
  FROM packages_raw
 WHERE GetPort(package_origin) IS NOT NULL
   AND GetBranchId(branch)     IS NOT NULL
   AND GetABIId(abi)           IS NOT NULL ;

At present, this query does all of the data and has no filtering (e.g. by ABI or branch).

packages_last_checked

The packages_last_checked table keeps track of work done or to be done.


freshports.dev=# SELECT * FROM packages_last_checked;
 abi_id | branch_id |         last_checked          |       repo_date        | import_date 
--------+-----------+-------------------------------+------------------------+-------------
      7 |         1 | 2020-04-12 16:56:25.824973+00 | 2018-09-29 04:24:00+00 | 
      7 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-07 12:03:00+00 | 
      6 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 00:16:00+00 | 
      6 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:20:00+00 | 
      4 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 18:43:00+00 | 
      4 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 02:36:00+00 | 
      9 |         1 | 2020-04-12 16:56:25.824973+00 | 2018-10-11 05:47:00+00 | 
      9 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-02 05:09:00+00 | 
      1 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 01:18:00+00 | 
      1 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:56:00+00 | 
      8 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 19:12:00+00 | 
      8 |       142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:36:00+00 | 
      2 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-01-04 12:39:00+00 | 
      2 |       142 | 2020-04-12 16:56:25.824973+00 |                        | 
     11 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 02:22:00+00 | 
     11 |       142 | 2020-04-12 16:56:25.824973+00 |                        | 
     10 |         1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 23:14:00+00 | 
     10 |       142 | 2020-04-12 16:56:25.824973+00 |                        | 
(18 rows)

freshports.dev=# \d packages_last_checked
                   Table "public.packages_last_checked"
    Column    |           Type           | Collation | Nullable | Default 
--------------+--------------------------+-----------+----------+---------
 abi_id       | integer                  |           | not null | 
 branch_id    | integer                  |           | not null | 
 last_checked | timestamp with time zone |           |          | 
 repo_date    | timestamp with time zone |           |          | 
 import_date  | timestamp with time zone |           |          | 
Indexes:
    "packages_last_checked_pkey" PRIMARY KEY, btree (abi_id, branch_id)
    "packages_last_checked_branch_id_idx" btree (branch_id)
Foreign-key constraints:
    "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)

freshports.dev=# 

When the repo_date column is empty, no repo was found for this ABI & branch combination.

Note the import_date column is empty. The script which populates the packages table does not yet update that value.

The check_repos_for_new_stuff.py script updates the last_checked and repo_date columns.

To find repos which need to be updated, use this query:

SELECT *
  FROM packages_last_checked 
 WHERE repo_date > import_date 
   OR (repo_date IS NOT NULL AND import_date IS NULL);

What’s next?

The import-raw-abi.sh script is the basis of future work. The pseudo code is similar to this:

SELECT ABI, branch values which need updating
FOR EACH combination
DO
   fetch https://pkg.freebsd.org/$abi/$branch/packagesite.txz
    unxz packagesite.txz
    tar -xf packagesite.tar

    ls -l ./packagesite.yaml
    jq -rc --arg ABI "$abi" --arg BRANCH "$branch" '[$ABI, $BRANCH, .origin, .name, .version] | @tsv' < ./packagesite.yaml > packagesite.csv
  
    ~/bin/import-via-copy-packagesite-all-raw-fields.py -i packagesite.csv

   Update the packages_last_checked.import_date for this combination
DONE

NOTES on this:

  • The launching of this script can be trigger based. When an update occurs to the packages_last_checked, observe if new data is detected, and set work-to-be-done accordingly.
  • I now think that import and conversion from packages_raw to packages can be done separately.
  • Add another column to the packages_last_checked, call it import_processed.
  • When you find a column with import_date > import_processed, that indicates you need to run the conversion process (see a previous section).
Apr 102020
 

This post deals with adding a new feature: A table of which FreeBSD version and architecture has a given package

The goal: parse the packagesite.yaml supplied with each repo and load the results into the FreshPorts database.

A bunch of the background in the above URL, and I don’t want to repeat it here.

This post is mostly my notes on who this is being done.

What’s in the file

The file is labelled as .yaml, which is a superset of JSON. I first started parsing the file with Python but ran into encoding errors:

Things to be determined

There are a number of items, not so urgent, which must be resolved.

Starting with Python

I started off parsing packagesite.yaml with this Python script. This produced a tab-delimited file which could then be loaded into the database.

The creation of the CSV file took about 3 minutes.

Posting for help

Last night, I posted on social media looking for help:

The next morning brought many suggestions (thank you).

The most elegant solution was the use of textproc/jq (homepage).

The new working proof-of-concepts

This is the one-liner for creating the tab-delimited file:

$ time jq -rc '[1, .origin, .name, .version] | @tsv' < ~/tmp/FreeBSD\:12\:amd64/latest/packagesite.yaml > packagesite.csv

real0m1.351s
user0m1.295s
sys0m0.055s

That is plenty fast for processing a 31491-line file.

Loading the code into the database is still plenty fast as well:

$ time ./import-via-copy-packagesite.py

real	0m2.077s
user	0m0.126s
sys	0m0.016s

The database tables

The data is imported into this table:

freshports.dev=# \d packages_raw
                           Table "public.packages_raw"
     Column      |  Type   | Collation | Nullable |           Default            
-----------------+---------+-----------+----------+------------------------------
 id              | bigint  |           | not null | generated always as identity
 abi_id          | integer |           | not null | 
 package_origin  | text    |           | not null | 
 package_name    | text    |           | not null | 
 package_version | text    |           | not null | 
Indexes:
    "packages_raw_pkey" PRIMARY KEY, btree (id)
    "fki_packages_raw_abi_id_fk" btree (abi_id)
    "packages_raw_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_raw_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID

freshports.dev=# 

That data looks like this:

freshports.dev=# SELECT * FROM packages_raw ORDER BY package_name LIMIT 5;
   id   | abi_id |      package_origin       |   package_name   | package_version 
--------+--------+---------------------------+------------------+-----------------
 125562 |      1 | games/0ad                 | 0ad              | 0.0.23b_9
 124778 |      1 | security/0d1n             | 0d1n             | 2.3
 124774 |      1 | games/0verkill            | 0verkill         | 0.16_2
 124768 |      1 | security/1password-client | 1password-client | 0.9.3
 124760 |      1 | games/2048                | 2048             | 0.9.1
(5 rows)

freshports.dev=# 

That just allows me to run queries on the data before copying into this table:

freshports.dev=# \d packages
                           Table "public.packages"
    Column    |  Type   | Collation | Nullable |           Default            
--------------+---------+-----------+----------+------------------------------
 id           | bigint  |           | not null | generated always as identity
 abi_id       | integer |           | not null | 
 port_id      | integer |           | not null | 
 version      | text    |           | not null | 
 package_name | text    |           | not null | 
Indexes:
    "packages_pkey" PRIMARY KEY, btree (id)
    "fki_packages_abi_id_fk" btree (abi_id)
    "fki_packages_port_id_fk" btree (port_id)
    "packages_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID

freshports.dev=# 

That data looks like this:

freshports.dev=# SELECT * FROM packages ORDER BY package_name LIMIT 5;
  id   | abi_id | port_id |  version  |   package_name   
-------+--------+---------+-----------+------------------
 63781 |      1 |   31186 | 0.0.23b_9 | 0ad
 63000 |      1 |   41577 | 2.3       | 0d1n
 62996 |      1 |    5241 | 0.16_2    | 0verkill
 62990 |      1 |   45139 | 0.9.3     | 1password-client
 62982 |      1 |   35082 | 0.9.1     | 2048
(5 rows)

freshports.dev=# 

Parsing the raw data into relational data

How does data get from packages_raw into packages?

Via this query:

freshports.dev=# EXPLAIN ANALYSE
freshports.dev-# INSERT INTO packages(abi_id, port_id, package_name, version)
freshports.dev-# SELECT abi_id,
freshports.dev-#        GetPort(package_origin),
freshports.dev-#        package_name,
freshports.dev-#        package_version
freshports.dev-#   FROM packages_raw
freshports.dev-#  WHERE GetPort(package_origin) IS NOT NULL;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Insert on packages  (cost=0.00..17084.83 rows=31334 width=45) (actual time=34310.269..34310.269 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..17084.83 rows=31334 width=45) (actual time=13.996..33835.542 rows=31473 loops=1)
         ->  Seq Scan on packages_raw  (cost=0.00..16693.16 rows=31334 width=29) (actual time=13.957..33788.113 rows=31473 loops=1)
               Filter: (getport(package_origin) IS NOT NULL)
               Rows Removed by Filter: 18
 Planning Time: 0.271 ms
 Trigger for constraint packages_abi_id_fk: time=354.821 calls=31473
 Trigger for constraint packages_port_id_fk: time=2243.644 calls=31473
 Execution Time: 36916.031 ms
(9 rows)

freshports.dev=# 

Some explaining:

  • GetPort(package_origin) converts package_origin (e.g. games/0ad) into a port id – if there is no matching port, that’s something to deal with later. Why no matching port? In my dev database, it can easily occur.
  • There might be a better way to do this, but so far so good. Suggestions welcome.

What did not get converted?

This query helps me find out what data is not being copied from the packages_raw table to the packages table:

freshports.dev=# SELECT PR.* 
  FROM packages_raw PR
 WHERE NOT EXISTS (SELECT * FROM packages P where PR.package_name = P.package_name);
   id   | abi_id |            package_origin            |          package_name          | package_version 
--------+--------+--------------------------------------+--------------------------------+-----------------
  96128 |      1 | sysutils/swapusage                   | swapusage                      | 1.0.0
 100602 |      1 | devel/p5-MooX-Singleton              | p5-MooX-Singleton              | 1.20
 101008 |      1 | devel/py-aspy.yaml                   | py37-aspy.yaml                 | 1.3.0
 101036 |      1 | devel/py-nodeenv                     | py37-nodeenv                   | 1.3.3
 101891 |      1 | devel/rubygem-peek-rblineprof-rails5 | rubygem-peek-rblineprof-rails5 | 0.2.0
 101953 |      1 | devel/rubygem-peek-gc-rails5         | rubygem-peek-gc-rails5         | 0.0.2
 103834 |      1 | devel/py-nodeenv                     | py27-nodeenv                   | 1.3.3
 104220 |      1 | sysutils/py-leviathan                | py37-leviathan                 | 0.1.1
 107173 |      1 | sysutils/atf-rk3399                  | atf-rk3399                     | v2.1
 117527 |      1 | security/picosha2                    | picosha2                       | 1.0.0.23
  99899 |      1 | devel/p5-File-ShareDir-Tarball       | p5-File-ShareDir-Tarball       | 0.2.2
 100063 |      1 | devel/p5-MooseX-MungeHas             | p5-MooseX-MungeHas             | 0.011
 100562 |      1 | devel/py-aspy.yaml                   | py27-aspy.yaml                 | 1.3.0
 101881 |      1 | devel/rubygem-peek-redis-rails5      | rubygem-peek-redis-rails5      | 1.2.0
 101917 |      1 | devel/rubygem-peek-rails5            | rubygem-peek-rails5            | 1.0.1
 101941 |      1 | databases/rubygem-peek-mysql2-rails5 | rubygem-peek-mysql2-rails5     | 1.2.0
 106471 |      1 | sysutils/py-leviathan                | py27-leviathan                 | 0.1.1
 107871 |      1 | www/rubygem-gon-rails5               | rubygem-gon-rails5             | 6.2.1
(18 rows)

freshports.dev=# 

That query runs quickly, very quickly. There are indexes on package_name in both tables.

What’s next?

Next up is the display of this data on each port page. That’s where port_id comes into play.

freshports.dev-#        PKG.package_name,
freshports.dev-#        PKG.version
freshports.dev-#   FROM packages PKG, ABI
freshports.dev-#  WHERE PKG.port_id = 31186
freshports.dev-#    AND PKG.abi_id  = ABI.id;
       name       | package_name |  version  
------------------+--------------+-----------
 FreeBSD:12:amd64 | 0ad          | 0.0.23b_9
(1 row)

freshports.dev=# 

Hmmm, I should rename packages.version to package.package_version for consistency with packages_raw.

packages_raw is designed to hold multiple entries for a given port, for the different ABI (application binary interface). Have a look at https://pkg.freebsd.org to see the various packages sets.

Example screen shot from security/acme.sh

acme.sh versions

acme.sh versions

Mar 292020
 

When I started this, I thought it was be straight forward and take only a few hours at most. It turned into multiple hours over two days.

This is part two of a two-part post. The first part dealt with changing a database column from time to timestamp.

In this post:

  1. PostgreSQL 12.2
  2. Nagios 3.5
  3. FreeBSD 12.1

The queries

The two queries I want to run are:

SELECT count(*) FROM cache_clearing_dates WHERE date_added > now() - interval '1 hour';
SELECT count(*) FROM cache_clearing_ports WHERE date_added > now() - interval '1 hour';

The script

Combining those two into a script, I came up with this, after a bit of trial and error:

#!/bin/sh

# This script checks the cache_clearing_ports and cache_clearing_dates in
# a freshports database

# arguments
# * server   - IP address or hostname of the PostgreSQL database server
# * database - database name
# * user     - database user
#
# port is always the default value: 5432
#
# The password must be in ~.pgpass for the Unix user which runs this script.
#
# For Nagios, that defaults to ~nagios/.pgpass
#
# See https://www.postgresql.org/docs/12/libpq-pgpass.html
#

#
# arguments used to create simple output of a single value
# and to never issue a password prompt
#
psql="/usr/local/bin/psql --no-align --tuples-only --no-password "

server=$1
database=$2
user=$3

# we return zero, all good, by default
result=0

# the two database queries we will run
query_dates="SELECT count(*) FROM cache_clearing_dates WHERE date_added < now() - interval '1 hour';"
query_ports="SELECT count(*) FROM cache_clearing_ports WHERE date_added < now() - interval '1 hour';"

# how is the dates table?

dates=`$psql -c "$query_ports" --host=$server $database $user`
if [ $? == 0 ] ; then
# uncomment for debugging
#  echo all ok with dates query
else
  # when we hit an error, we do not care about the rest of the checks
  echo error on dates query:$? $dates
  exit 2
fi

# how is the ports table?

ports=`$psql -c "$query_dates" --host=$server $database $user`
if [ $? == 0 ] ; then
# uncomment for debugging
#  echo all ok with ports query
else
  # when we hit an error, we don't care about the rest of the checks
  echo error on ports query: $? $ports
  exit 2
fi

# check the results

if [ "$ports" != "0" ] ; then
  echo There are $ports ports entries in the cache
  result=2
fi

if [ "$dates" != "0" ] ; then
  echo There are $dates dates entries in the cache
  result=2
fi

if [ $result == 0 ] ; then
  echo All OK
fi

exit $result

How is this script invoked?

I run this script via net-mgmt/nrpe3. In general, I run that tool on every host and jail. The line within the configuration file looks like this:

$ grep check_freshports_cache_clearing /usr/local/etc/nrpe.cfg
command[check_freshports_cache_clearing]=/usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios

I store my personal Nagios scripts in a non-default location: /usr/local/libexec/nagios-custom

Usually, Nagios scripts are in /usr/local/libexec/nagios

Why? Mostly so I can keep them all in one place, and there is never any chance of file name collision.

Script parameters

Please note the following about these parameters:

  • very simple
  • rather inflexible
  • they all must be provided
  • they must appear in this order

The parameters are:

  1. pg03.int.example.org : server – IP address or hostname of the PostgreSQL database server
  2. freshports.dev : database – database name
  3. nagios : database user

Do not confuse the nagios database user with the nagios unix user. They are two separate things.

The nagios user above is part of the PostgreSQL database system. I will show you how I created that later.

The nagios unix user is created by the Nagios port. Installing Nagios or nrpe is outside the scope of this post.

Password

I am a big fan of never putting sensitive information on the command line, such as passwords. Even if nobody else is ever on your system, don’t do it.

The solution here is the .pgpass as documented by the PostgreSQL project.

The script is run by nrpe which runs as the nagios user.

$ ps auwwx | grep nrpe
nagios     85203  0.0  0.0 15812  6368  -  IsJ  01:54   0:01.48 /usr/local/sbin/nrpe3 -c /usr/local/etc/nrpe.cfg -d
dan        15654  0.0  0.0 11324  2648  3  S+J  18:10   0:00.00 grep nrpe

It is because the script runs as the nagios user that the .pgpass files resides at ~nagios/.pgpass. It must be chown nagios and chmod 0600. My file looks like this, with contents slightly modified for public display:

$ sudo cat ~nagios/.pgpass
# hostname:port:database:username:password
pg03.int.example.org:5432:freshports.dev:nagios:notmypassword

The file looks like this:

$ ls -l  ~nagios/.pgpass
-rw-------  1 nagios  nagios  111 Mar 28 19:30 /var/spool/nagios/.pgpass

Why /var/spool/nagios?

Because of this:

$ grep nagios /etc/passwd 
nagios:*:181:181:Nagios pseudo-user:/var/spool/nagios:/usr/sbin/nologin

That is why. That is the home directory for the nagios user as defined in the UIDs file of the FreeBSD ports tree.

Creating the database user & permissions

These are the commands I used to create the nagios database user and the permissions required to query the tables.

First, I create a new group, or role. The group has SELECT rights on the table.

CREATE ROLE reporting;
GRANT SELECT ON cache_clearing_dates TO GROUP reporting;
GRANT SELECT ON cache_clearing_ports TO GROUP reporting;

Here, I create the nagios database user and add them to the role I just created.

CREATE ROLE nagios WITH LOGIN PASSWORD 'notmypassword';
GRANT reporting TO nagios;

Of note:

  • WITH LOGIN – allows this user to login. A role having the LOGIN attribute can be thought of as a user.
  • PASSWORD ‘notmypassword’ – specifies the password for this user. It also appears in the .pgpassfile for the user which will be running this script.

Testing the script

When testing the script from the command line, I did it this way:

[dan@dev-ingress01:~] $ /usr/local/libexec/nagios-custom/check_freshports_cache_clearing pg03.int.example.org freshports.dev nagios
All OK

I ran this script as my regular user and I also created a ~/.pgpass file for myself, which looks just like the one for the nagios unix user.

Things to check

If you can’t get this to work, the things to check are:

  • pg_hba.conf file for the PostgreSQL database server
  • permissions on the database tables
  • permissions on the .pgpass files in question
  • make sure it runs fine from the command line before trying nrpe

Monitoring is key

Monitoring is key to early detection of problems. When something should always be in a given state, it should be monitored to ensure it is in that state. Having a monitoring system which will provide early detection of anomalies helps me fix problems early, rather than later.

I try to monitor everything. Connections to expected hosts. Services running. Plenty of disk space.

If you’re not monitoring yet, start slowly and build it up. That time will be well spent.

Hope this helps.

Mar 292020
 

The plan for today was not decided until after I got out of bed. I sat down with coffee and looked through the list of tasks. I found one which appealed: monitor the cache_clearing_ports & cache_clearing_dates tables

This is part one of a two-part story. The first part deals with fixing a database table before doing the planned work in part two, a Nagios check.

In this post:

  1. PostgreSQL 12.2
  2. Nagios 3.5
  3. FreeBSD 12.1

FreshPorts stores a lot of data in the database, but caches pages on disk. Specific events clear that cache, and the two tables listed will contain lists of items to remove from the cache.

Disk? Cache?

FreshPorts was built long before modern cache tools existed. We built our own. They work great for what is needed. The server uses ZFS and anything frequently used from disk is also cached in memory (see ZFS ARC). A page takes longer to load the first time it is built, but afterwards, it is fast enough.

The checking query

Sometimes the cache clearing lags. Sometimes there is a bug in the code. For whatever reasons behind the lag, I want to be notified if there are entries in this table which are more than 60 minutes old.

Here is the query I started with:

freshports.dev=# SELECT count(*) FROM cache_clearing_dates WHERE date_added > now() - interval '1 hour';
ERROR:  operator does not exist: time without time zone > timestamp with time zone
LINE 1: ...unt(*) FROM cache_clearing_dates WHERE date_added > now() - ...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

What? Why doesn’t what work? Look at the caret above? The comparison operator is the issue. I can’t compae those two values.

Which two values?

A time without time zone and a timestamp with time zone.

Where do I have a time field? Let’s check the table.

freshports.dev=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.dev=# 

Oh. The date_added column should be a timestamp without time zone. This table was created incorrectly. I checked test, stage, and prod. All the same.

The cache_clearing_ports table is correct:

freshports.org=# \d public.cache_clearing_ports
                                        Table "public.cache_clearing_ports"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

Fixing cache_clearing_dates

This is the table structure I started with:

freshports.dev=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.dev=#

Ummm, shouldn’t date_added be timestamp without time zone?

I’m sure that field does not have a date value in it. It should.

The tests

I wanted verification this was wrong. I was sure, but I wanted to be positively sure.

The better way to do this, and I thought of this after I started this blog post, is:

freshports.dev=# SELECT now()::time without time zone;
       now       
-----------------
 15:07:35.428659
(1 row)

freshports.dev=# 

To compose this test, take the Default value from line 7, and cast it use the Type on the same line.

What I really want is this:

freshports.dev=# SELECT now()::timestamp without time zone;
            now             
----------------------------
 2020-03-28 15:13:55.946128
(1 row)

freshports.dev=# 

Yes, that’s better. I want to know how long this entry has been in the table. A date is required for that.

Don’t try this at home

What did I try first? I don’t recommend this, but it worked for me.

freshports.dev=# INSERT INTO public.cache_clearing_dates (date_to_clear) VALUES ('2020-01-01');
INSERT 0 1
freshports.dev=# select * from cache_clearing_dates;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 162763 | 2020-01-01    | 14:51:33.617565
(1 row)

freshports.dev=# rollback;

Yeah, that column has to change.

The change

PLEASE NOTE that in my specific case, this table has no data. There is no existing data to convert. I will show you one way to do that, but if you have data to convert, you may need to find answers elsewhere.

Here is a first attempt. Notice the transaction. If you have a lot of data in there, please be aware of how this might be blocking your applications. Mind you, if you have such a table, and applications, I would think you would already be aware of those issues.

First failure

This should be easy. ALTER TABLE…

freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone;
ERROR:  column "date_added" cannot be cast automatically to type timestamp without time zone
HINT:  You might need to specify "USING date_added::timestamp without time zone".
freshports.dev=# rollback;
ROLLBACK

That’s new to me. Let’s read up on that a bit.. Oh yes, this is HOW PostgreSQL will convert your existing data.

Overwriting the column

THE FOLLOWING WILL LOSE DATA. What I am doing in the following command will set all rows to now() and existing values will be lost.

freshports.dev=# begin;
BEGIN
freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now();
ALTER TABLE
freshports.dev=# \d cache_clearing_dates
                                          Table "public.cache_clearing_dates"
    Column     |            Type             | Collation | Nullable |                     Default                      
---------------+-----------------------------+-----------+----------+--------------------------------------------------
 id            | integer                     |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                        |           |          | 
 date_added    | timestamp without time zone |           |          | now()

freshports.dev=# rollback;
ROLLBACK
freshports.dev=# 

Prepending a fixed date value

This attempt will prepend the current date to all values in the table.

freshports.dev=# begin;
BEGIN
freshports.dev=# -- put some test data into my table
freshports.dev=# INSERT INTO public.cache_clearing_dates (date_to_clear) VALUES ('2020-01-01');
INSERT 0 1
freshports.dev=# -- test out the prepend code
freshports.dev=# select now()::date + date_added from public.cache_clearing_dates;
          ?column?          
----------------------------
 2020-03-28 15:27:22.119605
(1 row)

freshports.dev=# -- alter the table
freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now()::date + date_added;
ALTER TABLE
freshports.dev=# SELECT * FROM public.cache_clearing_dates;
   id   | date_to_clear |         date_added         
--------+---------------+----------------------------
 162765 | 2020-01-01    | 2020-03-28 15:27:22.119605
(1 row)

freshports.dev=# -- undo my changes
freshports.dev=# rollback;
ROLLBACK

The change I used

This is the command I used to update my table:

freshports.dev=# ALTER TABLE cache_clearing_dates ALTER COLUMN date_added SET DATA TYPE timestamp without time zone USING now()::date + date_added;
ALTER TABLE

This is the test I used to verify I was getting what I wanted:

freshports.dev=# begin; insert into public.cache_clearing_dates (date_to_clear) values ('2020-01-01');
BEGIN
INSERT 0 1
freshports.dev=# SELECT * FROM public.cache_clearing_dates;
   id   | date_to_clear |         date_added         
--------+---------------+----------------------------
 162766 | 2020-01-01    | 2020-03-28 15:30:22.439249
(1 row)

freshports.dev=# rollback;
ROLLBACK
freshports.dev=# 

The queries for the Nagios check

The queries are straight forward. There should be zero entries more than 60 minutes old:

freshports.dev=# SELECT count(*) FROM cache_clearing_dates WHERE date_added > now() - interval '1 hour';
 count 
-------
     0
(1 row)

freshports.dev=# SELECT count(*) FROM cache_clearing_ports WHERE date_added > now() - interval '1 hour';
 count 
-------
     0
(1 row)

freshports.dev=# 

I combined those two into a script, which appears in part two of this post.

Jan 262020
 

FreshPorts had a cache clearing issue recently. It lingered. For a long time. It took me a while to figure it out. It turned out to be a Python 3 compatibility issue.

In this post:

  • PostgreSQL 12.1
  • FreeBSD 12.1

I will outline how database changes invoke external scripts asynchronously which then clear the cache. I will go into the debugging of the issue and how the bug was fixed.

How caching works

FreshPorts caches data on disk. When you go to visit a page, the website checks the cache on disk, and if not found, it runs the queries, creates the HTML then saves it to disk.

Really? You cache to disk? Why?

Q: Why don’t you using memcached or some other caching tool?

A: FreshPorts was created long before such fancy tools.

The database side of caching

FreshPorts uses PostgreSQL, an amazing database which just happens to be open source and freely available. The initial work was done on a lesser database system, but then I was introduced to PostgreSQL in July 2000, which had grown-up features I expected, such as stored procedures and relational integrity. I never looked back.

Cache clearing in FreshPorts relies upon asynchronous notification using the LISTEN and NOTIFY commands.

A small python script (about 220 lines) issues several LISTEN commands, then starts polling.

This is the connection:

import configparser

config = configparser.ConfigParser()
config.read('/usr/local/etc/freshports/fp-listen.ini')

DSN = 'host=' + config['database']['HOST'] + ' dbname=' + config['database']['DBNAME'] + ' user=' + config['database']['DBUSER'] + ' password=' + config['database']['PASSWORD']

conn = psycopg2.connect(DSN)
curs = conn.cursor()

Next, we read from the database a list of LISTEN channels and then issue a LISTEN command for that channel

curs.execute("SELECT name, script_name FROM listen_for ORDER BY id")
listens_for = curs.fetchall()

listens = dict()
print ("These are the (event name, script name) pairs we are ready for:")
for listen in listens_for:
  curs.execute("LISTEN %s" % listen[0])
  listens[listen[0]] = listen[1]
  print ("('%s', '%s')" % (listen[0], listen[1]))

In this case, name is a channel and script_name is what the Python script will run when it encounters a NOTIFY for that channel.

This is the main loop of the code:

while 1:
  if select.select([conn],[],[],5)==([],[],[]):
    syslog.syslog(syslog.LOG_NOTICE, 'timeout! *************')
  else:
    conn.poll()
    syslog.syslog(syslog.LOG_NOTICE, 'Just woke up! *************')
    while conn.notifies:
      notify = conn.notifies.pop(0);
      # in real life, do something with each...
      syslog.syslog(syslog.LOG_NOTICE, "Got NOTIFY: %d, %s, %s" % (notify.pid, notify.channel, notify.payload));
      if notify.channel in listens:
        syslog.syslog(syslog.LOG_NOTICE, "found key %s" % (notify.channel));
        clear_cache = True;
        if listens[notify.channel]   == 'listen_port':
          syslog.syslog(syslog.LOG_NOTICE, "invoking RemoveCacheEntry()");
          RemoveCacheEntry()
        elif listens[notify.channel] == 'listen_date_updated':
          syslog.syslog(syslog.LOG_NOTICE, "invoking ClearDateCacheEntries()");
          ClearDateCacheEntries()
        ...
        ...
        else:
          clear_cache = False;
          syslog.syslog(syslog.LOG_ERR, "Code does not know what to do when '%s' is found." % notify.channel)

        if clear_cache:
          ClearMiscCaches()

      else:
        syslog.syslog(syslog.LOG_NOTICE, 'no such key in listens array for %s!' % (notify.channel))

This isn’t great code, patches & suggestions welcomed.

I remember this worked differently in earlier versions of PostgreSQL, but this approach has been in use for several years now. The poll command times out occasionally, and when it does, we just loop again.

When we receive a NOTIFY, we verify it is in the list of what we LISTEN for, then go through the list and invoke the appropriate script.

The database side

Triggers will invoke a NOTIFY when and where relevant. An update on the commit_log table will add an entry to the cache_clearing_dates table and invoke NOTIFY date_updated.

How cache clearing works

With the database issuing the NOTIFY, the script receives the NOTIFY date_updated. It will then query the cache_clearing_dates table to see what dates need to be cleared. On disk this looks like this:

[dan@dev-nginx01:/var/service/fp-listen/fp-listen] $ sudo ls -l /var/db/freshports/cache/daily/2020/01
total 340
-rw-rw-r--  1 www  freshports  143080 Jan  9 23:26 08.daily.head
-rw-rw-r--  1 www  freshports  107017 Jan 11 04:57 10.daily.head
-rw-rw-r--  1 www  freshports  152478 Jan 21 14:26 17.daily.head
-rw-rw-r--  1 www  freshports  129303 Jan 25 18:12 23.daily.head

Thus, the code just deletes the given file pertaining to the date in question.

Finding the issue

Debugging the caching was challenging in that the first step was to reproduce the issue.

I started the small Python app on my dev FreshPorts jail:

[root@dev-nginx01:/var/service/fp-listen] # echo ./fp-listen/fp-listen.py | su -fm freshports
These are the (event name, script name) pairs we are ready for:
('port_updated', 'listen_port')
('ports_moved', 'listen_ports_moved')
('ports_updating', 'listen_ports_updating')
('vuxml', 'listen_vuxml')
('category_new', 'listen_category_new')
('date_updated', 'listen_date_updated')

The output you see is what the app has read from the listen_for table. The events are data-driven. See lines 14-18 above.

freshports.dev=# SELECT * FROM listen_for;
 id |      name      |      script_name      
----+----------------+-----------------------
  5 | port_updated   | listen_port
  6 | ports_moved    | listen_ports_moved
  7 | ports_updating | listen_ports_updating
  8 | vuxml          | listen_vuxml
 10 | category_new   | listen_category_new
 11 | date_updated   | listen_date_updated
(6 rows)

freshports.dev=# 

Now that the code is running, in a separate ssh session, I connect to the database and manually issue a NOTIFY command:

freshports.dev=# NOTIFY date_updated;
NOTIFY
freshports.dev=# 

BOOM, the Python application dies in a heap:

Traceback (most recent call last):
  File "./fp-listen/fp-listen.py", line 190, in 
    if listens.has_key(notify.channel):
AttributeError: 'dict' object has no attribute 'has_key'
[root@dev-nginx01:/var/service/fp-listen] #

NOTE:

  1. the code you see above has already been patched
  2. line 190 in the code is actually line 29 as shown above

The fix

The fix was:

-      if listens.has_key(notify.channel):
+      if notify.channel in listens:

This fix is seen online 8 above and is a known and documented Python 2 – > Python 3 migration issue

Look at all the backed up queues!

When fp-listen gets a NOTIFY for ports or dates which need clearing, that data can contain a payload, but this must be shorter than 8000 bytes. As mentioned above, we store this data in tables instead of sending it directly to the Python app.

Looking in the tables:

[dan@x8dtu-pg02:~] $ ': psql freshports.org
Password for user dan: 
psql (12.1)
Type "help" for help.

freshports.org=# select count(*) from cache_clearing_ports;
 count 
-------
 20508
(1 row)

freshports.org=# select count(*) from cache_clearing_dates;
 count 
-------
   290
(1 row)

freshports.org=# 

So how long has this been an issue:

freshports.org=# select * from cache_clearing_dates order by date_added asc limit 5;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 172300 | 2020-01-22    | 00:08:21.443837
 172301 | 2020-01-22    | 00:12:37.328254
 172302 | 2020-01-22    | 00:30:32.244191
 172303 | 2020-01-22    | 00:41:27.386203
 172304 | 2020-01-22    | 00:43:57.517707
(5 rows)

freshports.org=# select * from cache_clearing_dates order by date_added desc limit 5;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 172299 | 2020-01-21    | 23:58:53.74748
 172298 | 2020-01-21    | 23:49:24.205331
 172188 | 2020-01-20    | 23:20:03.807726
 172297 | 2020-01-21    | 23:16:04.518664
 172187 | 2020-01-20    | 23:06:55.273125
(5 rows)

freshports.org=# 

Oh, it seems the cache_clearing_dates.date_added be a timestamp. I’ll fix that issue later.

Let’s try the ports cache queue:

freshports.org=# select * from cache_clearing_ports order by date_added asc limit 5;
    id    | port_id | category |     port      |         date_added         
----------+---------+----------+---------------+----------------------------
 16669514 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
 16669515 |   16343 | devel    | py-setuptools | 2020-01-20 17:23:04.747768
 16669516 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
 16669517 |   49106 | lang     | python37      | 2020-01-20 17:23:04.747768
 16669513 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
(5 rows)

freshports.org=# select * from cache_clearing_ports order by date_added desc limit 5;
    id    | port_id | category |       port       |         date_added         
----------+---------+----------+------------------+----------------------------
 16691617 |   37057 | print    | R-cran-knitr     | 2020-01-22 15:37:13.407464
 16691618 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691615 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691616 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691619 |   37051 | textproc | R-cran-yaml      | 2020-01-22 15:37:13.407464
(5 rows)

freshports.org=# 

From this, the problem arose on January 20th.

FYI, the tables look like this:

freshports.org=# \d cache_clearing_ports 
                                        Table "public.cache_clearing_ports"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.org=# 

Releasing the code

After I fixed the code, packaged it, and installed it on test, I encountered this error:

Jan 22 17:28:32 test-nginx01 fp-listen[62850]: ClearMiscCaches() is clearing '/var/db/freshports/cache/new/*'

Eh? That’s the wrong directory. It should be news not new. Here’s the code in question

def ClearMiscCaches():
  syslog.syslog(syslog.LOG_NOTICE, 'invoked: ClearMiscCaches()');
  filenameglob = config['dirs']['NEWS_CACHE_DIR'];
  syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is clearing %s' % (filenameglob));
  for filename in glob.glob(filenameglob):
    syslog.syslog(syslog.LOG_NOTICE, 'removing %s' % (filename))
    try:
      if os.path.isfile(filename):
        os.remove(filename)
      else:
        shutil.rmtree(filename)

    except (OSError, err):
      if err[0] == 2:
        pass  # no file to delete, so no worries

      else:
        syslog.syslog(syslog.LOG_CRIT, 'ERROR: error deleting cache entry.  Error message is %s' % (err))
        continue

Here is the entry from fp-listen.ini (see line 4):

NEWS_CACHE_DIR  = '%(BASEDIR)s/cache/new/*'

That should be:

NEWS_CACHE_DIR  = '%(BASEDIR)s/cache/news/*'

I had to make this fix on each server, other than dev. It was incorrect everywhere.

I restarted the service, which is controlled by daemontools:

sudo svc -t /var/service/fp-listen

I found the code on line 5 was not working as expected. No syslog entries were being recorded. glob.glob(filenameglob) was not enumerating everything. Tests in a small script worked. That same test script when run by the freshports user also yielded expected results. I could not locate a reason why this would fail in fp-listen.

New ClearMiscCaches() code

After reviewing the documentation for glob, I stumbled across Path and used it. I changed the ClearMiscCaches() code as shown below:

def ClearMiscCaches():
  syslog.syslog(syslog.LOG_NOTICE, 'invoked: ClearMiscCaches()');
  
  news_cache_dir = config['dirs']['NEWS_CACHE_DIR'];
  syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is clearing out entries in %s' % (news_cache_dir));

  for filename in Path(news_cache_dir).iterdir():
    syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is removing %s' % (filename))
    try:
      if Path(filename).is_file():
        Path(filename).unlink()
      else:
        shutil.rmtree(filename)

The code on line 5 is the key. We are iterating over a path to get the files we need. dotfiles will be overlooked, which is sufficient for our needs. We do not use dotfiles (e.g. .Caching_Not_Done_Like_This).

It was very strange, because glob.glob(filenameglob) worked in other parts of the fp-listen code, but not in this directory.

New package

With that code change, the new package was created and made it way through test, stage, and beta servers before being moved into production.

The cause of the problem seems to be the migration from Python 2 to Python 3. I would like to more testing on the glob issue, but I have moved onto other issues.

Why LISTEN/NOTIFY?

LISTEN/NOTIFY is a valuable tool when your database needs to invoke external events. Let the transaction[s] complete and then the external code will eventually do your bidding via the LISTEN portion of the event.

Hopefully, this will open up a whole new avenue of solutions for you.

Jan 252020
 

I’ve noticed a log of sanity test failures related to devel/binutils lately. Every single commit.

Issue 133 deals with that. Here, I plan to document what happened.

FreshPorts has, since inception, worked off the commit emails. The incoming email is parsed and XML created. The XML is then used to update the database.

I started looking in the message archives. I confirmed that the original email indicated deletes were occurring.

Yet, the XML had no deletes:

[dan@dev-ingress01:/var/db/freshports/message-queues/archive/2019_11/2019_11_15] $ cat 2019.11.15.10.09.16.85661.txt.xml 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE UPDATES PUBLIC "-//FreshPorts//DTD FreshPorts 2.0//EN" "https://www.freshports.org/docs/fp-updates.dtd">

<UPDATES Version="1.3.2.1">
    <UPDATE>
        <DATE Month="11" Day="15" Year="2019"></DATE>
        <TIME Timezone="UTC" Minute="9" Second="3" Hour="10"></TIME>
        <OS Repo="ports" Branch="head" Id="FreeBSD"></OS>
        <LOG>Update binutils to 2.33.1

While here, convert binutils into flavors, that ease a lot maintenance
and helps cleaning out the code.

This is inspired by the same work on going on the xtoolchained gcc by jhb@

PR:		241756
Exp-run:	antoine (portmgr)
Discussed with:	jhb
Differential Revision:	https://reviews.freebsd.org/D22258</LOG>
        <PEOPLE>
            <UPDATER Handle="bapt"></UPDATER>
        </PEOPLE>
        <MESSAGE Id="201911151009.xAFA9393081239@repo.freebsd.org" EncodingLosses="false" Subject="svn commit: r517642 - in head: . devel devel/aarch64-binutils devel/aarch64-none-elf-binutils devel/aarch64-none-elf-gcc devel/amd64-binutils devel/arduino-tools devel/arm-gnueabi-binutils devel/ar...">
            <DATE Year="2019" Day="15" Month="11"></DATE>
            <TIME Second="3" Minute="9" Timezone="UTC" Hour="10"></TIME>
            <REPOSITORY>ports</REPOSITORY>
            <REVISION>517642</REVISION>
            <TO Email="ports-committers@freebsd.org"></TO>
            <TO Email="svn-ports-all@freebsd.org,"></TO>
        </MESSAGE>
        <FILES>
            <FILE Action="Add" Path="head/devel/binutils/pkg-plist-aarch64" Revision="517642"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-amd64" Revision="517642" Action="Add"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-arm-gnueabi"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-arm-none-eabi"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-avr" Revision="517642" Action="Add"></FILE>
            <FILE Action="Add" Path="head/devel/binutils/pkg-plist-i386" Revision="517642"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-mingw32"></FILE>
            <FILE Revision="517642" Path="head/devel/binutils/pkg-plist-mips" Action="Add"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-mips64"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-powerpc64" Revision="517642" Action="Add"></FILE>
            <FILE Action="Add" Path="head/devel/binutils/pkg-plist-riscv64" Revision="517642"></FILE>
            <FILE Revision="517642" Path="head/devel/binutils/pkg-plist-s390x" Action="Add"></FILE>
            <FILE Revision="517642" Path="head/devel/binutils/pkg-plist-sparc64" Action="Add"></FILE>
        </FILES>
        <PR Id="241756"></PR>
    </UPDATE>
</UPDATES>
[dan@dev-ingress01:/var/db/freshports/message-queues/archive/2019_11/2019_11_15] $ 

Test the code

Testing the code, I didn’t get anything different out:

[freshports@dev-ingress01 /usr/local/libexec/freshports]$ cat /var/db/freshports/message-queues/archive/2019_11/2019_11_15/2019.11.15.10.09.16.85661.txt.raw | ./process_mail.pl 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE UPDATES PUBLIC "-//FreshPorts//DTD FreshPorts 2.0//EN" "https://www.freshports.org/docs/fp-updates.dtd">

<UPDATES Version="1.3.2.1">
    <UPDATE>
        <DATE Year="2019" Month="11" Day="15"></DATE>
        <TIME Second="3" Minute="9" Hour="10" Timezone="UTC"></TIME>
        <OS Id="FreeBSD" Branch="head" Repo="ports"></OS>
        <LOG>Update binutils to 2.33.1

While here, convert binutils into flavors, that ease a lot maintenance
and helps cleaning out the code.

This is inspired by the same work on going on the xtoolchained gcc by jhb@

PR:		241756
Exp-run:	antoine (portmgr)
Discussed with:	jhb
Differential Revision:	https://reviews.freebsd.org/D22258</LOG>
        <PEOPLE>
            <UPDATER Handle="bapt"></UPDATER>
        </PEOPLE>
        <MESSAGE EncodingLosses="false" Id="201911151009.xAFA9393081239@repo.freebsd.org" Subject="svn commit: r517642 - in head: . devel devel/aarch64-binutils devel/aarch64-none-elf-binutils devel/aarch64-none-elf-gcc devel/amd64-binutils devel/arduino-tools devel/arm-gnueabi-binutils devel/ar...">
            <DATE Month="11" Year="2019" Day="15"></DATE>
            <TIME Hour="10" Second="3" Minute="9" Timezone="UTC"></TIME>
            <REPOSITORY>ports</REPOSITORY>
            <REVISION>517642</REVISION>
            <TO Email="ports-committers@freebsd.org"></TO>
            <TO Email="svn-ports-all@freebsd.org,"></TO>
        </MESSAGE>
        <FILES>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-aarch64"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-amd64" Revision="517642" Action="Add"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-arm-gnueabi" Revision="517642" Action="Add"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-arm-none-eabi"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-avr"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-i386"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-mingw32"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-mips"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-mips64"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-powerpc64"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-riscv64" Action="Add" Revision="517642"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-s390x"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-sparc64" Revision="517642" Action="Add"></FILE>
        </FILES>
        <PR Id="241756"></PR>
    </UPDATE>
</UPDATES>
[freshports@dev-ingress01 /usr/local/libexec/freshports]$ 

It must be the code, or a change in email layout.

The above includes only Add actions, which directly relate to the Added: entries in the email.

Finding the issue

I found the issue. This was the fix:

-		last if($line =~ /^Directory Properties:/);
+		next if($line =~ /^Directory Properties:/);

In the existing code, when a line containing only Directory Properties: was encountered, that signaled the end of processing for this commit.

I conclude that the email layout changed.

This meant that the deletes were never being processed, given the location of this text within the original email.

With the above code, I ran the script again and dumped the XML into a file. I originally wanted to show you a diff, but by its nature XML is not always the same each time it is generated. For example, the fields within a group do not necessarily appear in the same order. As I was typing this, I thought there MUST be an online XML diff tool. I found https://www.corefiling.com/opensource/xmldiff/ and created this for your viewing pleasure:

+       <FILE Action='Add' Path='head/devel/binutils/pkg-plist-aarch64-none-elf' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/aarch64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/aarch64-none-elf-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/amd64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/arm-gnueabi-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/arm-none-eabi-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/avr-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/i386-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/mingw32-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/mips-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/mips64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/powerpc64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/riscv64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/s390x-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/sparc64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/MOVED' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/aarch64-none-elf-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/arduino-tools/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/arm-none-eabi-gcc492/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/arm-none-eabi-newlib/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/avr-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/avr-libc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/binutils/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/binutils/distinfo' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/mingw32-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/powerpc64-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/riscv64-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/sope4/Makefile' Revision='517642'></FILE>

That is a lot of missing data.

Running the new XML

After running this:

[freshports@dev-ingress01 /usr/local/libexec/freshports]$ ./load_xml_into_db.pl  /tmp/2019.11.15.10.09.16.85661-new.txt.xml -O > /tmp/2019.11.15.10.09.16.85661-new.txt.loading
Use of uninitialized value $depends in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1363.
Use of uninitialized value $depends in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1364.
Use of uninitialized value $depends in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1363.
Use of uninitialized value $depends in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1364.
Use of uninitialized value $depends in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1363.
Use of uninitialized value $depends in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1364.
Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/ports_generate_plist.pm line 40.
Use of uninitialized value in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/ports_generate_plist.pm line 50.
Use of uninitialized value in split at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1296.
[freshports@dev-ingress01 /usr/local/libexec/freshports]$ 

devel/s390x-binutils was marked as deleted. Score!

I’ll fix up those other errors next, but they are beyond the scope of this post.

Not shown here: I went back and re-processed each subsequent commit on devel/binutils

Thanks for coming to my TED talk.