Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full-time systems administration and now work for a very well-known company which has been a big force in the security industry.

Mar 30, 2021
 

Hello,

The changeover from subversion to git for the FreeBSD project enters its final stage about three hours from now (as I type this). At 3:01 UTC on Mar 31 2021, the repository will become read-only. The full schedule for that process is in the FreeBSD wiki.

The first git commit is not expected until Apr 3rd.

That gives FreshPorts a couple of days to convert from subversion to git. The coding has been completed. The database needs to be modified, the new code installed, and the configuration updated.

This change to FreshPorts represents the final stage of work which started over 13 months ago.

This process has had a few dry runs and it takes a few hours. Given the three-day timeframe, the FreshPorts plan is laid out below.

  1. Drop the DNS TTL because we will be changing to a new server.
  2. Disable user logins & account updates at about 12:30 UTC on 31 March – a backup will be taken, and any subsequent updates to production would not appear in the new database.
  3. Production will continue to serve up content during the updates.
  4. The new git-enabled website should be ready by about 20:00 UTC on 31 March.

The goals include:

  1. Apart from logins being disabled, you should not notice any change in service during the transition.
  2. The plan is to reproduce the website on an AWS instance – this will be our first venture into that venue
  3. Service will be gradually moved to the new website by use of multiple A and AAAA records.

Please watch our Twitter account and status page for updates.

Mar 09, 2021
 

I had noticed a problem with my full text search solution when I was testing. If the input query contained a quote, the query raised an error. At the time, I attributed that to poor input/text handling. Later, I realized I was wrong.

On the website, this PHP code is used:

$result = pg_query_params($this->dbh, $sql, array($this->Query));

Where $this->Query is what the user supplied on the search form. The pg_query_params documentation says:

The primary advantage of pg_query_params() over pg_query() is that parameter values may be separated from the query string, thus avoiding the need for tedious and error-prone quoting and escaping.

So I don’t need to do anything special.
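To make the parameter-binding idea concrete, here is a minimal Python sketch. It uses the stdlib sqlite3 module (so it runs anywhere) rather than PostgreSQL, but pg_query_params() gives the same guarantee: the user's quote travels as data, not as SQL text.

```python
import sqlite3

# Minimal sketch of parameter binding; sqlite3 stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (name TEXT)")
conn.execute("INSERT INTO ports (name) VALUES (?)", ("Joe's stuff",))

# The user-supplied value, quote and all, is bound as a parameter;
# no quoting or escaping is needed in the query string.
user_input = "Joe's stuff"
rows = conn.execute("SELECT name FROM ports WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [("Joe's stuff",)]
```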

Here is some testing I tried while talking this over via IRC:

freshports.dev=# SELECT to_tsquery('english', $$Joe's stuff$$);
ERROR:  syntax error in tsquery: "Joe's stuff"

freshports.dev=# SELECT to_tsquery('english', $$Joes stuff$$);
ERROR:  syntax error in tsquery: "Joes stuff"

freshports.dev=# SELECT to_tsquery('english', $$Joe stuff$$);
ERROR:  syntax error in tsquery: "Joe stuff"

freshports.dev=# SELECT to_tsquery('english', $$Joestuff$$);
 to_tsquery 
------------
 'joestuff'
(1 row)

It turns out the solution is to use websearch_to_tsquery() instead. The documentation mentions:

Moreover, this function should never raise syntax errors, which makes it possible to use raw user-supplied input for search.

That sounds ideal, and it was. It completely fixes the issue.

freshports.dev=# SELECT websearch_to_tsquery('english', $$Joe's stuff$$);
 websearch_to_tsquery 
----------------------
 'joe' & 'stuff'
(1 row)

freshports.dev=# SELECT websearch_to_tsquery('english', $$Joes stuff$$);
 websearch_to_tsquery 
----------------------
 'joe' & 'stuff'
(1 row)

freshports.dev=# SELECT websearch_to_tsquery('english', $$Joe stuff$$);
 websearch_to_tsquery 
----------------------
 'joe' & 'stuff'
(1 row)

Great. Hope this helps.

Mar 09, 2021
 

In yesterday's blog post, Replacing a column search with a full text search solution, I wanted to mention how adding a LIMIT 10 to a fast-running query made it run slower. I didn't remember that issue until this morning. Fortunately, the fix was low cost.

The query

This is the query I ran:

EXPLAIN ANALYSE
WITH short_list AS (
    SELECT
        port_id
    FROM
        generate_plist
    WHERE
        textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh')
)
  select CL.commit_date - SystemTimeAdjust() AS last_commit_date, 
         P.id, 
         E.name as port,
         C.name as category, 
...
         P.uses  
       FROM short_list, ports P
       LEFT OUTER JOIN ports_vulnerable    PV  ON PV.port_id       = P.id
       LEFT OUTER JOIN commit_log          CL  ON P.last_commit_id = CL.id,
       element_pathname EP,
       categories C, element E 
 WHERE P.id = short_list.port_id
   AND P.element_id = EP.element_id and EP.pathname like '/ports/head/%'
   AND P.category_id  = C.id
   AND P.element_id   = E.id  AND E.status = 'A';

The plan (as found at https://explain.depesz.com/s/Bkkg) was fast enough for me.

Adding LIMIT 10

This was used on the FreshPorts search page (not in production, but on the dev website). It needs a limit, so I added LIMIT 10.

The plan changed to a sequential scan of a large table. Boom, 28 seconds. That is unsatisfactory. The plan is at https://explain.depesz.com/s/oJZB.

The Fix

The fix, as pointed out by Thomas Hurst in his reply to my tweet, centered around a CTE (Common Table Expression). That initially confused me, because the query was already using a CTE. That's the WITH short_list AS part of the query.

Then I read the AS MATERIALIZED part of his reply and started reading up on that. I changed the query to:

EXPLAIN ANALYSE
WITH short_list AS MATERIALIZED (
    SELECT
        port_id
    FROM
        generate_plist
    WHERE
        textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh')
)
  select CL.commit_date - SystemTimeAdjust() AS last_commit_date, 
         P.id, 
         E.name as port,
         C.name as category, 
...
         P.uses  
       FROM short_list, ports P
       LEFT OUTER JOIN ports_vulnerable    PV  ON PV.port_id       = P.id
       LEFT OUTER JOIN commit_log          CL  ON P.last_commit_id = CL.id,
       element_pathname EP,
       categories C, element E 
 WHERE P.id = short_list.port_id
   AND P.element_id = EP.element_id and EP.pathname like '/ports/head/%'
   AND P.category_id  = C.id
   AND P.element_id   = E.id  AND E.status = 'A'
LIMIT 10;

You can see the change on line 2: AS MATERIALIZED.

The plan (as found at https://explain.depesz.com/s/UBps) takes 1.4s and is back to using an index scan.


Mar 08, 2021
 

One of the many useful features on FreshPorts is: what port[s] install[s] this file? That's the pkg-plist search option. pkg-plist is a file which “lists all the files installed by the port”. However, not all ports have a pkg-plist file, because the list is compiled automatically. That is why the configure_plist table was created to replace the ports.pkg_plist column. The creation of that table broke the search function, because it was then working on outdated data.

Last week, I blogged about how I created a new stored procedure for pulling back the pkg-plist information.

The new table

By new, I mean new to this search solution. The table looks like this:

freshports.dev=# \d generate_plist
                                Table "public.generate_plist"
     Column     |  Type   | Collation | Nullable |                  Default                   
----------------+---------+-----------+----------+--------------------------------------------
 id             | bigint  |           | not null | nextval('generate_plist_id_seq'::regclass)
 port_id        | integer |           | not null | 
 installed_file | text    |           | not null | 
Indexes:
    "generate_plist_installed_file_gin_idx" gin (to_tsvector('english'::regconfig, installed_file))
    "generate_plist_installed_file_idx" btree (installed_file)
    "generate_plist_pk" btree (id)
    "generate_plist_port_id_idx" btree (port_id)

freshports.dev=# 

For a given port, there will be N entries in this table, one for each item in the pkg-plist. You will notice some indexes on the installed_file column.

A simple search

With the old solution, we could search like this, but it took time:

AND P.pkg_plist ILIKE '%share/examples/acme.sh/dnsapi/dns_1984hosting.sh%'

Over 2.7 seconds in fact.

Let’s try looking in the generate_plist table instead. I was reading the Tables and Indexes section of the PostgreSQL Full Text Search documentation and decided to start with this:

ALTER TABLE public.generate_plist
    ADD COLUMN textsearchable_index_col tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, installed_file)) STORED;

CREATE INDEX generate_plist_textsearch_idx
    ON public.generate_plist USING gin
    (textsearchable_index_col)
    TABLESPACE pg_default;

With that, the following search was pretty damn nice.

freshports.dev=# explain analyse    SELECT
        port_id
    FROM
        generate_plist
    WHERE
        textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh')
;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on generate_plist  (cost=367.55..2143.46 rows=426 width=4) (actual time=2.116..2.123 rows=2 loops=1)
   Recheck Cond: (textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh'::text))
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on generate_plist_textsearch_idx  (cost=0.00..367.44 rows=426 width=0) (actual time=2.095..2.095 rows=2 loops=1)
         Index Cond: (textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh'::text))
 Planning Time: 0.434 ms
 Execution Time: 2.208 ms
(7 rows)

But this was … umm, what?

freshports.dev=# SELECT
        port_id
    FROM
        generate_plist
    WHERE
        textsearchable_index_col @@ to_tsquery('dns_1984hosting.sh')
;
 port_id 
---------
(0 rows)

freshports.dev=# 

Nothing found? What?

Full text search is not like that

Full text search is not like that. An exact-match search on the full path name would have worked just fine with the existing btree index:

freshports.dev=# explain analyse SELECT
        port_id
    FROM
        generate_plist
    WHERE
        installed_file = 'share/examples/acme.sh/dnsapi/dns_1984hosting.sh';
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using generate_plist_installed_file_idx on generate_plist  (cost=0.56..28.66 rows=6 width=4) (actual time=0.046..0.052 rows=2 loops=1)
   Index Cond: (installed_file = 'share/examples/acme.sh/dnsapi/dns_1984hosting.sh'::text)
 Planning Time: 0.222 ms
 Execution Time: 0.088 ms
(4 rows)

freshports.dev=# 

In fact, it’s faster.

But that’s not what is needed. Let’s look at the use case.

Searching for file names

We want to search for the full path or the file name. Or just dns_1984hosting.sh or dns_1984hosting.

With help from RhodiumToad on IRC, I learned that tsearch is all about matching words (technically, lexemes), not arbitrary substrings. We need to think in terms of lexemes. For example:

freshports.dev=# select ts_debug('english', 'foo/bar');
                           ts_debug                           
--------------------------------------------------------------
 (file,"File or path name",foo/bar,{simple},simple,{foo/bar})
(1 row)

freshports.dev=# 

So let’s split the file name into distinct words. Let’s convert the / to space and add a new column for that:

ALTER TABLE public.generate_plist
    ADD COLUMN textsearchable_index_col2 tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, translate(installed_file, '/'::text, ' '::text))) STORED;

CREATE INDEX generate_plist_textsearch_idx2
    ON public.generate_plist USING gin
    (textsearchable_index_col2)
    TABLESPACE pg_default;
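A quick Python sketch of what that translate() buys us: the path is broken into distinct words before to_tsvector() turns them into lexemes. (split_path_words is my name, purely for illustration; the also_dot flag anticipates the third column added later in this post.)

```python
def split_path_words(path, also_dot=False):
    """Mimic translate(installed_file, '/', ' ') -- and, with also_dot=True,
    translate(installed_file, '/.', '  ') -- then split on whitespace."""
    chars = '/.' if also_dot else '/'
    for c in chars:
        path = path.replace(c, ' ')
    return path.split()

print(split_path_words('share/examples/acme.sh/dnsapi/dns_1984hosting.sh'))
# ['share', 'examples', 'acme.sh', 'dnsapi', 'dns_1984hosting.sh']
```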

Let’s try searching now:

freshports.dev=# EXPLAIN ANALYSE
WITH short_list AS (
    SELECT
        port_id, installed_file
    FROM
        generate_plist
    WHERE
     textsearchable_index_col2 @@ to_tsquery('dns_1984hosting.sh')
)
  select P.id, element_pathname(P.element_id), SL.installed_file
  FROM ports P, short_list SL
 WHERE P.id = SL.port_id;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=171.64..25352.59 rows=2981 width=88) (actual time=1.337..1.506 rows=2 loops=1)
   ->  Bitmap Heap Scan on generate_plist  (cost=171.35..12102.68 rows=2981 width=56) (actual time=1.031..1.040 rows=2 loops=1)
         Recheck Cond: (textsearchable_index_col2 @@ to_tsquery('dns_1984hosting.sh'::text))
         Heap Blocks: exact=2
         ->  Bitmap Index Scan on generate_plist_textsearch_idx2  (cost=0.00..170.61 rows=2981 width=0) (actual time=1.015..1.015 rows=2 loops=1)
               Index Cond: (textsearchable_index_col2 @@ to_tsquery('dns_1984hosting.sh'::text))
   ->  Index Scan using ports_pkey on ports p  (cost=0.29..4.19 rows=1 width=8) (actual time=0.014..0.014 rows=1 loops=2)
         Index Cond: (id = generate_plist.port_id)
 Planning Time: 1.064 ms
 Execution Time: 1.596 ms
(10 rows)

freshports.dev=# 

Umm, that’s fast.

But if we try the file without the .sh we get nothing:

freshports.dev=# WITH short_list AS (
    SELECT
        port_id, installed_file
    FROM
        generate_plist
    WHERE
     textsearchable_index_col2 @@ to_tsquery('dns_1984hosting')
)
  select P.id, element_pathname(P.element_id), SL.installed_file
  FROM ports P, short_list SL
 WHERE P.id = SL.port_id;
 id | element_pathname | installed_file 
----+------------------+----------------
(0 rows)

freshports.dev=# 

Another column for that

Humor me here while I add another column:

ALTER TABLE public.generate_plist
    ADD COLUMN textsearchable_index_col3 tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, translate(installed_file, '/.'::text, '  '::text))) STORED;

CREATE INDEX generate_plist_textsearch_idx3
    ON public.generate_plist USING gin
    (textsearchable_index_col3)
    TABLESPACE pg_default;

This translates both / and . to a space. Now we have:

freshports.dev=# WITH short_list AS (
    SELECT
        port_id, installed_file
    FROM
        generate_plist
    WHERE
     textsearchable_index_col3 @@ to_tsquery('dns_1984hosting')
)
  select P.id, element_pathname(P.element_id), SL.installed_file
  FROM ports P, short_list SL
 WHERE P.id = SL.port_id;
  id   |            element_pathname             |                  installed_file                  
-------+-----------------------------------------+--------------------------------------------------
 59654 | /ports/branches/2020Q3/security/acme.sh | share/examples/acme.sh/dnsapi/dns_1984hosting.sh
 43508 | /ports/head/security/acme.sh            | share/examples/acme.sh/dnsapi/dns_1984hosting.sh
(2 rows)

freshports.dev=# 

Which is still lightning fast.

Doing them all at once

Creating a query which uses all three columns is still performant:

freshports.dev=# explain analyse WITH short_list AS (
    SELECT
        DISTINCT port_id, installed_file
    FROM
        generate_plist
    WHERE
        textsearchable_index_col  @@ to_tsquery('bacula')
     OR textsearchable_index_col2 @@ to_tsquery('bacula')
     OR textsearchable_index_col3 @@ to_tsquery('bacula')
)
  select P.id, element_pathname(P.element_id), SL.installed_file
  FROM ports P, short_list SL
 WHERE P.id = SL.port_id;
                                                                                                       QUERY PLAN                                                                                                       
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=50810.64..53335.38 rows=7790 width=88) (actual time=273.782..520.103 rows=1872 loops=1)
   Hash Cond: (sl.port_id = p.id)
   ->  Subquery Scan on sl  (cost=35602.05..35757.85 rows=7790 width=56) (actual time=14.561..16.078 rows=1872 loops=1)
         ->  HashAggregate  (cost=35602.05..35679.95 rows=7790 width=56) (actual time=14.558..15.495 rows=1872 loops=1)
               Group Key: generate_plist.port_id, generate_plist.installed_file
               ->  Bitmap Heap Scan on generate_plist  (cost=1705.16..35563.02 rows=7806 width=56) (actual time=11.540..12.771 rows=1872 loops=1)
                     Recheck Cond: ((textsearchable_index_col @@ to_tsquery('bacula'::text)) OR (textsearchable_index_col2 @@ to_tsquery('bacula'::text)) OR (textsearchable_index_col3 @@ to_tsquery('bacula'::text)))
                     Heap Blocks: exact=116
                     ->  BitmapOr  (cost=1705.16..1705.16 rows=7808 width=0) (actual time=11.489..11.490 rows=0 loops=1)
                           ->  Bitmap Index Scan on generate_plist_textsearch_idx  (cost=0.00..415.44 rows=426 width=0) (actual time=2.994..2.995 rows=36 loops=1)
                                 Index Cond: (textsearchable_index_col @@ to_tsquery('bacula'::text))
                           ->  Bitmap Index Scan on generate_plist_textsearch_idx2  (cost=0.00..170.61 rows=2981 width=0) (actual time=1.373..1.374 rows=1857 loops=1)
                                 Index Cond: (textsearchable_index_col2 @@ to_tsquery('bacula'::text))
                           ->  Bitmap Index Scan on generate_plist_textsearch_idx3  (cost=0.00..1113.26 rows=4401 width=0) (actual time=7.117..7.117 rows=1872 loops=1)
                                 Index Cond: (textsearchable_index_col3 @@ to_tsquery('bacula'::text))
   ->  Hash  (cost=14174.04..14174.04 rows=63004 width=8) (actual time=258.282..258.283 rows=63065 loops=1)
         Buckets: 32768  Batches: 4  Memory Usage: 877kB
         ->  Seq Scan on ports p  (cost=0.00..14174.04 rows=63004 width=8) (actual time=0.089..219.485 rows=63065 loops=1)
 Planning Time: 1.227 ms
 Execution Time: 521.244 ms
(20 rows)

freshports.dev=# 

At what cost?

What does the table look like now:

freshports.dev=# \d generate_plist
                                                                          Table "public.generate_plist"
          Column           |   Type   | Collation | Nullable |                                                      Default                                                      
---------------------------+----------+-----------+----------+-------------------------------------------------------------------------------------------------------------------
 id                        | bigint   |           | not null | nextval('generate_plist_id_seq'::regclass)
 port_id                   | integer  |           | not null | 
 installed_file            | text     |           | not null | 
 textsearchable_index_col  | tsvector |           |          | generated always as (to_tsvector('english'::regconfig, installed_file)) stored
 textsearchable_index_col2 | tsvector |           |          | generated always as (to_tsvector('english'::regconfig, translate(installed_file, '/'::text, ' '::text))) stored
 textsearchable_index_col3 | tsvector |           |          | generated always as (to_tsvector('english'::regconfig, translate(installed_file, '/.'::text, '  '::text))) stored
Indexes:
    "generate_plist_installed_file_gin_idx" gin (to_tsvector('english'::regconfig, installed_file))
    "generate_plist_installed_file_idx" btree (installed_file)
    "generate_plist_pk" btree (id)
    "generate_plist_port_id_idx" btree (port_id)
    "generate_plist_textsearch_idx" gin (textsearchable_index_col)
    "generate_plist_textsearch_idx2" gin (textsearchable_index_col2)
    "generate_plist_textsearch_idx3" gin (textsearchable_index_col3)

How much space does this take:



freshports.dev=# select count(*) from generate_plist;
  count  
---------
 8478513
(1 row)


freshports.dev=# SELECT pg_table_size('generate_plist');
 pg_table_size 
---------------
    3578167296
(1 row)

freshports.dev=# select pg_total_relation_size('generate_plist');
 pg_total_relation_size 
------------------------
             6730162176
(1 row)

That’s 8.5 million rows taking up 3.3 GB of table data, or 6.3 GB including the indexes.

Disk is cheap. Time is expensive. I'd say the cost is worth the performance.
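As a sanity check on those numbers, converting the reported byte counts to GiB:

```python
# Convert the sizes reported by pg_table_size() and
# pg_total_relation_size() above from bytes to GiB.
GiB = 1024 ** 3
table_bytes = 3578167296   # pg_table_size('generate_plist')
total_bytes = 6730162176   # pg_total_relation_size('generate_plist')

print(round(table_bytes / GiB, 1))                  # 3.3
print(round(total_bytes / GiB, 1))                  # 6.3
print(round((total_bytes - table_bytes) / GiB, 1))  # 2.9 -- the indexes
```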

Mar 02, 2021
 

In this post I will show you how I discovered that the backend was putting data into a new table (generate_plist) but the website was still using the ports.pkg_plist column. The result was stale data being shown on the website (for older ports) or no data at all (for newer ports).

How long? Since Oct 2017, because that's when configure_plist showed up on the website. Nobody noticed until recently. I can only guess that I made the backend changes with the intent to change the frontend later. I should have created myself a ticket.

Why was the change made? Primarily because not all ports, especially Python-based ports, have a pkg-plist file. Those ports generate the list automatically at package-build time. Secondly, it also simplifies the search of pkg-plist.

Ironically, this bug also affects search, which is still using the old column.

Background

One set of data which FreshPorts shows is the list of files installed by a given package. This information can be provided via a file, called pkg-plist.

Here is one example, for the comms/libconcord port:

freshports.dev=# select pkg_plist from ports where id = 34724;
             pkg_plist              
------------------------------------
 include/libconcord.h              +
 lib/libconcord.a                  +
 lib/libconcord.so                 +
 lib/libconcord.so.4               +
 lib/libconcord.so.4.0.0           +
 share/mime/packages/libconcord.xml
(1 row)

freshports.dev=# 

NOTE: the above is one row of output. It is a collection of strings, separated by newlines.

If you visit the website, you'll see that the list is wrong, because it's outdated.

The generate-plist command is part of the ports infrastructure. FreshPorts gathers this data by running the command and reading the resulting file:

$ make generate-plist
$ cat work/.PLIST.mktmp

A recent blog post goes into more detail. All that goes into the configure_plist table:

freshports.dev=# \d configure_plist
Did not find any relation named "configure_plist".
freshports.dev=# \d generate_plist
                                Table "public.generate_plist"
     Column     |  Type   | Collation | Nullable |                  Default                   
----------------+---------+-----------+----------+--------------------------------------------
 id             | bigint  |           | not null | nextval('generate_plist_id_seq'::regclass)
 port_id        | integer |           | not null | 
 installed_file | text    |           | not null | 
Indexes:
    "generate_plist_installed_file_gin_idx" gin (to_tsvector('english'::regconfig, installed_file))
    "generate_plist_installed_file_idx" btree (installed_file)
    "generate_plist_pk" btree (id)
    "generate_plist_port_id_idx" btree (port_id)

freshports.dev=# 

The original data

This is what the data looked like, as demonstrated by this query:

freshports.dev=# select pkg_plist, array_to_json(regexp_match(pkg_plist, 'lib/[[:alpha:]]*?\.so')) from ports where id = 34724;
             pkg_plist              |     array_to_json     
------------------------------------+-----------------------
 include/libconcord.h              +| ["lib/libconcord.so"]
 lib/libconcord.a                  +| 
 lib/libconcord.so                 +| 
 lib/libconcord.so.4               +| 
 lib/libconcord.so.4.0.0           +| 
 share/mime/packages/libconcord.xml | 
(1 row)

NOTE: This example uses [[:alpha:]] but I later changed this to [[:alnum:]] in my code.

NOTE: The above is one row.

The goal is to grab all the .so libraries.

The new layout

The new table, with refreshed data, looks like:

freshports.dev=# select * from generate_plist where port_id = 34724;
    id    | port_id |                       installed_file                       
----------+---------+------------------------------------------------------------
 27791797 |   34724 | @shared-mime-info share/mime
 27791798 |   34724 | /usr/local/share/licenses/libconcord-1.2_2/catalog.mk
 27791799 |   34724 | /usr/local/share/licenses/libconcord-1.2_2/LICENSE
 27791800 |   34724 | /usr/local/share/licenses/libconcord-1.2_2/GPLv2
 27791801 |   34724 | include/libconcord.h
 27791802 |   34724 | lib/libconcord.a
 27791803 |   34724 | lib/libconcord.so
 27791804 |   34724 | lib/libconcord.so.4
 27791805 |   34724 | lib/libconcord.so.4.0.0
 27791806 |   34724 | share/mime/packages/libconcord.xml
 27791807 |   34724 | @postexec /usr/sbin/service ldconfig restart > /dev/null
 27791808 |   34724 | @postunexec /usr/sbin/service ldconfig restart > /dev/null
(12 rows)

freshports.dev=# 

First go

NOTE: I have removed the array_to_json() call for now:

freshports.dev=# select installed_file, regexp_match(installed_file, 'lib/[[:alpha:]]*?\.so') 
from generate_plist where port_id = 34724;
                       installed_file                       |    regexp_match     
------------------------------------------------------------+---------------------
 @shared-mime-info share/mime                               | 
 /usr/local/share/licenses/libconcord-1.2_2/catalog.mk      | 
 /usr/local/share/licenses/libconcord-1.2_2/LICENSE         | 
 /usr/local/share/licenses/libconcord-1.2_2/GPLv2           | 
 include/libconcord.h                                       | 
 lib/libconcord.a                                           | 
 lib/libconcord.so                                          | {lib/libconcord.so}
 lib/libconcord.so.4                                        | {lib/libconcord.so}
 lib/libconcord.so.4.0.0                                    | {lib/libconcord.so}
 share/mime/packages/libconcord.xml                         | 
 @postexec /usr/sbin/service ldconfig restart > /dev/null   | 
 @postunexec /usr/sbin/service ldconfig restart > /dev/null | 
(12 rows)

That looks good so far. What I did not realize until later: each value of the regexp_match output is an array.

Let’s try this on a port with more libraries:

freshports.dev=# select distinct regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
   from generate_plist
  WHERE generate_plist.port_id = 9589 and regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so')  is not null;
        lib         
--------------------
 {lib/libhamlib.so}
 {lib/hamlibtcl.so}
 {lib/Hamlib.so}
(3 rows)

That still looks good.
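The extraction can be sketched in Python too. This is my illustration, not FreshPorts code: the POSIX class [[:alnum:]] is written [0-9A-Za-z] here, and shared_libs is a name I made up.

```python
import re

# Python analogue of regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so').
LIB_RE = re.compile(r'lib/[0-9A-Za-z]*?\.so')

def shared_libs(installed_files):
    """Return the distinct lib/*.so names found in a port's plist entries."""
    seen = []
    for f in installed_files:
        m = LIB_RE.search(f)
        if m and m.group(0) not in seen:
            seen.append(m.group(0))
    return seen

plist = [
    'include/libconcord.h',
    'lib/libconcord.a',
    'lib/libconcord.so',
    'lib/libconcord.so.4',
    'lib/libconcord.so.4.0.0',
    'share/mime/packages/libconcord.xml',
]
print(shared_libs(plist))  # ['lib/libconcord.so']
```

As in the SQL, the non-greedy match means lib/libconcord.so.4.0.0 still yields only lib/libconcord.so.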

Getting rid of duplicate values and ignoring null

I thought this was a simple way to get rid of the uninteresting values. I like using WITH. It allows you to take a very complex set of data and adjust it before performing additional operations upon it. What I’m doing here is not very complicated and I am sure it can be done without WITH.

freshports.dev=# WITH tmp AS (
    select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
      from generate_plist
     where generate_plist.port_id = 9589
)
select distinct lib from tmp where lib is not null;
        lib         
--------------------
 {lib/libhamlib.so}
 {lib/hamlibtcl.so}
 {lib/Hamlib.so}
(3 rows)

JSON time

In this section, there are three main changes.

First, convert the results from an array to the first element of that array. We are dealing with a single file per row, so we should get just one match.

freshports.dev=# WITH tmp AS (
    select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
      from generate_plist
     where generate_plist.port_id = 9589)
select distinct lib[1] from tmp where lib is not null;
       lib        
------------------
 lib/hamlibtcl.so
 lib/libhamlib.so
 lib/Hamlib.so
(3 rows)

Next, combine all the rows into one row, using array.

freshports.dev=# WITH tmp AS (
    select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
      from generate_plist
     where generate_plist.port_id = 9589)
select array(select distinct lib[1] from tmp where lib is not null);
                       array                       
---------------------------------------------------
 {lib/hamlibtcl.so,lib/libhamlib.so,lib/Hamlib.so}
(1 row)

The last step, change that to JSON output.

freshports.dev=# WITH tmp AS (
    select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
      from generate_plist
     where generate_plist.port_id = 9589)
select array_to_json(array(select distinct lib[1] from tmp where lib is not null));
                      array_to_json                      
---------------------------------------------------------
 ["lib/hamlibtcl.so","lib/libhamlib.so","lib/Hamlib.so"]
(1 row)

But…

It seems wasteful, or backward, to be converting to an array and then to JSON, but I think that's the way it needs to be done for this data. I'm happy to hear additional ideas if you have them.

EDIT: Magnus Hagander gave me the solution: use json_agg() instead.

freshports.dev=# WITH tmp AS (
    select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
      from generate_plist
     where generate_plist.port_id = 14152)
select json_agg(distinct lib[1]) from tmp where lib is not null;
                                              json_agg                                               
-----------------------------------------------------------------------------------------------------
 ["lib/libfreeipmi.so", "lib/libipmiconsole.so", "lib/libipmidetect.so", "lib/libipmimonitoring.so"]
(1 row)
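A rough Python analogue of that json_agg(distinct lib[1]) aggregation, for illustration only. The function name is mine, and the file list below is invented to resemble a freeipmi plist, not copied from the real port.

```python
import json
import re

LIB_RE = re.compile(r'lib/[0-9A-Za-z]*?\.so')  # [[:alnum:]] -> [0-9A-Za-z]

def json_agg_libs(installed_files):
    """Collect the distinct lib/*.so matches and emit them as a JSON array,
    mimicking json_agg(distinct lib[1]) ... where lib is not null."""
    libs = sorted({m.group(0) for f in installed_files
                   if (m := LIB_RE.search(f)) is not None})
    return json.dumps(libs)

files = [
    'lib/libfreeipmi.so.17',
    'lib/libipmiconsole.so.2',
    'lib/libipmidetect.so.0',
    'lib/libipmimonitoring.so.17',
    'include/freeipmi/freeipmi.h',
]
print(json_agg_libs(files))
# ["lib/libfreeipmi.so", "lib/libipmiconsole.so", "lib/libipmidetect.so", "lib/libipmimonitoring.so"]
```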

A more complex example

Let’s try sysutils/freeipmi, which has more libraries:

freshports.dev=# WITH tmp AS (
    select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
      from generate_plist
     where generate_plist.port_id = 14152)
select array_to_json(array(select distinct lib[1] from tmp where lib is not null));
                                          array_to_json                                           
--------------------------------------------------------------------------------------------------
 ["lib/libfreeipmi.so","lib/libipmidetect.so","lib/libipmiconsole.so","lib/libipmimonitoring.so"]
(1 row)

freshports.dev=# 

That seems to work.

Put it into a function

This is the function I created:

-- to convert the generate_plist table, multiple rows per port, into a
-- single row of JSON. Mostly used for Dependency line:
--
-- re https://github.com/FreshPorts/freshports/issues/216
--
CREATE OR REPLACE FUNCTION pkg_plist(bigint) returns json as $$
   WITH tmp AS (
       select regexp_match(installed_file, 'lib/[[:alnum:]]*?\.so') as lib 
         from generate_plist
        where generate_plist.port_id = $1
   )
   select array_to_json(array(select distinct lib[1] from tmp where lib is not null));
$$ LANGUAGE SQL STABLE;

The changed code

The original code was:

freshports.dev=# explain analyse
select pkg_plist, array_to_json(regexp_match(pkg_plist, 'lib/[[:alpha:]]*?\.so')) from ports where id = 34724;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using ports_pkey on ports  (cost=0.29..8.31 rows=1 width=190) (actual time=0.092..0.096 rows=1 loops=1)
   Index Cond: (id = 34724)
 Planning Time: 0.305 ms
 Execution Time: 0.138 ms
(4 rows)
freshports.dev=# 

The new code is:

freshports.dev=# explain analyse
freshports.dev-# select pkg_plist, pkg_plist(id) from ports where id = 34724;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using ports_pkey on ports  (cost=0.29..8.56 rows=1 width=190) (actual time=0.323..0.324 rows=1 loops=1)
   Index Cond: (id = 34724)
 Planning Time: 0.229 ms
 Execution Time: 0.340 ms
(4 rows)

That is an acceptable difference.

Thanks.

Feb 24 2021

This is a cheat sheet for FreshPorts behind-the-scenes administration. If you want one for users, please start with suggested content.

Ingress Actions

  1. git-single-commit.sh – Take a commit from the repo and insert it into the database. Will fail if the commit is already there. REPO is one of src, ports, doc. HASH is the git commit hash.
    /usr/local/libexec/freshports/git-single-commit.sh REPO HASH
    
  2. process vuxml – This will invoke vuxml processing of whatever is checked out into the ~freshports/ports-jail/var/db/repos/PORTS-head/ working copy of the ports tree. It might also work on the ~freshports/ports-jail/var/db/repos/ports (git) repo, but I don’t know how it tells which one to work on.
    echo touch /var/db/freshports/signals/vuxml /var/db/freshports/signals/job_waiting | sudo su -fm freshports
    

    Ahh, it knows based on VULNFILE=”${FRESHPORTS_JAIL_BASE_DIR}${PORTSDIR}/security/vuxml/vuln.xml”.

  3. vuln_latest.pl – refreshes the Latest Vulnerabilities box of the website.
    echo /usr/local/bin/perl /usr/local/libexec/freshports/vuln_latest.pl | sudo su -fm freshports
    
  4. Archive messages from 11 days ago
    echo /usr/local/libexec/freshports/archive-messages.sh 11 | sudo su -fm freshports
    

Ingress Services

  1. ingress – processes git commits into XML
  2. ingress_svn – processes subversion commits into XML
  3. freshports – processes the XML files and loads them into the database

Caching

  • NOTIFY port_updated; – issue this command within the database. The fp_listen daemon will wake up, read the table cache_clearing_ports, and clear the cache for the listed port.
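
For example, this is how you might queue a cache clear for one port by hand and wake fp_listen (a sketch only; the port_id, category, and port values here are illustrative, and in normal operation the insert is done by a trigger, not manually):

```sql
-- queue a cache-clear entry, then wake the fp_listen daemon
INSERT INTO cache_clearing_ports (port_id, category, port)
     VALUES (14152, 'sysutils', 'freeipmi');
NOTIFY port_updated;
```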

Feb 10 2021

By request, FreeBSD 14 is being added to the list of package builds. Looking at the main FreeBSD pkg mirror, we can see FreeBSD:14 is listed.

These are the items in question:

  • FreeBSD:14:i386
  • FreeBSD:14:amd64
  • FreeBSD:14:aarch64
  • FreeBSD:14:armv6
  • FreeBSD:14:armv7
  • FreeBSD:14:mips
  • FreeBSD:14:mips64
  • FreeBSD:14:powerpc64

The SQL is:

insert into abi (name) values (.... 

But why type all this by hand? Code it.

#!/bin/sh

echo 'insert into abi (name) values '
ARCHS="aarch64 amd64 armv6 armv7 i386 mips mips64 powerpc64"
for ARCH in $ARCHS
do
  echo -n "('FreeBSD:14:$ARCH'), "
done
echo ';'

Which produces:

[dan@pg02:~/bin] $ sh ./generate_abi_insert_sql
insert into abi (name) values 
('FreeBSD:14:aarch64'), ('FreeBSD:14:amd64'), ('FreeBSD:14:armv6'), ('FreeBSD:14:armv7'), ('FreeBSD:14:i386'), ('FreeBSD:14:mips'), ('FreeBSD:14:mips64'), ('FreeBSD:14:powerpc64'), ;
[dan@pg02:~/bin] $ 

Note the extra trailing comma in the SQL. It is easier to remove that manually than to update the code.
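
If you'd rather not hand-edit, a small variant of the script avoids the trailing comma by emitting the separator before each value instead of after it (a sketch; the two-second manual fix above works fine too):

```shell
#!/bin/sh
# Build the INSERT without a trailing comma: emit the separator
# *before* every value except the first.
generate_abi_insert() {
    ARCHS="aarch64 amd64 armv6 armv7 i386 mips mips64 powerpc64"
    sql='insert into abi (name) values '
    sep=''
    for ARCH in $ARCHS
    do
        sql="${sql}${sep}('FreeBSD:14:$ARCH')"
        sep=', '
    done
    echo "${sql};"
}

generate_abi_insert
```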

Running that SQL:

freshports.dev=# begin;
BEGIN
freshports.dev=# insert into abi (name) values 
('FreeBSD:14:aarch64'), ('FreeBSD:14:amd64'), ('FreeBSD:14:armv6'), ('FreeBSD:14:armv7'), ('FreeBSD:14:i386'), ('FreeBSD:14:mips'), ('FreeBSD:14:mips64'), ('FreeBSD:14:powerpc64') ;
INSERT 0 8
freshports.dev=# select * from abi;
 id |         name         | active 
----+----------------------+--------
  1 | FreeBSD:12:amd64     | t
  2 | FreeBSD:13:aarch64   | t
  4 | FreeBSD:11:i386      | t
  6 | FreeBSD:11:amd64     | t
  7 | FreeBSD:11:aarch64   | t
  8 | FreeBSD:12:i386      | t
  9 | FreeBSD:12:aarch64   | t
 10 | FreeBSD:13:i386      | t
 11 | FreeBSD:13:amd64     | t
 12 | FreeBSD:11:armv6     | t
 13 | FreeBSD:11:mips      | t
 14 | FreeBSD:11:mips64    | t
 15 | FreeBSD:12:armv6     | t
 16 | FreeBSD:12:armv7     | t
 17 | FreeBSD:12:mips      | t
 18 | FreeBSD:12:mips64    | t
 19 | FreeBSD:12:powerpc64 | t
 20 | FreeBSD:13:armv6     | t
 21 | FreeBSD:13:armv7     | t
 22 | FreeBSD:13:mips      | t
 23 | FreeBSD:13:mips64    | t
 24 | FreeBSD:13:powerpc64 | t
 25 | FreeBSD:14:aarch64   | t
 26 | FreeBSD:14:amd64     | t
 27 | FreeBSD:14:armv6     | t
 28 | FreeBSD:14:armv7     | t
 29 | FreeBSD:14:i386      | t
 30 | FreeBSD:14:mips      | t
 31 | FreeBSD:14:mips64    | t
 32 | FreeBSD:14:powerpc64 | t
(30 rows)

We can see the new rows, and thanks to a trigger, we also have new rows in the packages_last_checked table:

freshports.dev=# select * from packages_last_checked;
 abi_id |         last_checked         |       repo_date        |          import_date          | package_set |        processed_date         
--------+------------------------------+------------------------+-------------------------------+-------------+-------------------------------
      8 | 2021-02-10 17:00:13.20563+00 | 2021-02-10 01:12:07+00 | 2021-02-10 11:21:34.044251+00 | quarterly   | 2021-02-10 11:22:23.725106+00
     17 | 2021-02-10 17:00:13.20563+00 | 2018-10-16 11:04:52+00 | 2020-07-17 14:00:45.505818+00 | latest      | 2020-07-17 14:04:11.704782+00
     17 | 2021-02-10 17:00:13.20563+00 | 2020-10-01 04:01:46+00 | 2020-10-01 05:00:20.235932+00 | quarterly   | 2020-10-01 05:00:24.858447+00
     18 | 2021-02-10 17:00:13.20563+00 | 2018-10-16 09:49:08+00 | 2020-07-17 14:00:52.969214+00 | latest      | 2020-07-17 14:04:20.85262+00
     18 | 2021-02-10 17:00:13.20563+00 | 2020-08-04 15:33:36+00 | 2020-08-04 16:00:29.240376+00 | quarterly   | 2020-08-04 16:00:33.420489+00
     19 | 2021-02-10 17:00:13.20563+00 |                        |                               | latest      | 
     19 | 2021-02-10 17:00:13.20563+00 | 2021-01-26 04:27:25+00 | 2021-01-26 12:00:29.770876+00 | quarterly   | 2021-01-26 12:00:45.577918+00
      2 | 2021-02-10 17:00:13.20563+00 | 2021-01-27 02:49:44+00 | 2021-01-28 08:00:18.188754+00 | latest      | 2021-01-28 08:00:24.905832+00
      2 | 2021-02-10 17:00:13.20563+00 | 2021-02-09 11:53:51+00 | 2021-02-10 11:21:40.79661+00  | quarterly   | 2021-02-10 11:22:35.063951+00
     11 | 2021-02-10 17:00:13.20563+00 | 2021-02-10 02:50:43+00 | 2021-02-10 11:21:46.811564+00 | latest      | 2021-02-10 11:22:46.756966+00
     11 | 2021-02-10 17:00:13.20563+00 | 2021-02-09 14:50:17+00 | 2021-02-10 11:21:53.824806+00 | quarterly   | 2021-02-10 11:23:01.600253+00
     20 | 2021-02-10 17:00:13.20563+00 | 2021-01-25 12:43:34+00 | 2021-01-25 15:00:21.52586+00  | latest      | 2021-01-25 15:00:40.013898+00
     20 | 2021-02-10 17:00:13.20563+00 | 2021-01-30 10:31:09+00 | 2021-01-30 22:00:20.164502+00 | quarterly   | 2021-01-30 22:00:24.297362+00
     21 | 2021-02-10 17:00:13.20563+00 | 2021-01-25 13:03:16+00 | 2021-01-25 15:00:35.365614+00 | latest      | 2021-01-25 15:00:52.742863+00
     21 | 2021-02-10 17:00:13.20563+00 | 2021-01-31 00:34:34+00 | 2021-01-31 12:00:21.788735+00 | quarterly   | 2021-01-31 12:00:30.494576+00
     10 | 2021-02-10 17:00:13.20563+00 | 2021-02-10 02:22:42+00 | 2021-02-10 11:22:01.40289+00  | latest      | 2021-02-10 11:23:09.115036+00
     10 | 2021-02-10 17:00:13.20563+00 | 2021-02-09 01:06:47+00 | 2021-02-09 21:00:29.817299+00 | quarterly   | 2021-02-09 21:00:53.748818+00
     22 | 2021-02-10 17:00:13.20563+00 | 2020-01-25 12:29:08+00 | 2020-07-17 14:01:34.059862+00 | latest      | 2020-07-17 14:05:11.163848+00
     22 | 2021-02-10 17:00:13.20563+00 |                        |                               | quarterly   | 
     23 | 2021-02-10 17:00:13.20563+00 | 2021-01-24 02:59:02+00 | 2021-01-24 04:00:19.382673+00 | latest      | 2021-01-24 04:00:26.718195+00
     23 | 2021-02-10 17:00:13.20563+00 | 2021-02-09 10:21:23+00 | 2021-02-09 11:00:17.626645+00 | quarterly   | 2021-02-09 11:00:20.605528+00
     24 | 2021-02-10 17:00:13.20563+00 | 2021-01-22 06:18:08+00 | 2021-01-22 21:00:29.352318+00 | latest      | 2021-01-22 21:00:42.08494+00
     24 | 2021-02-10 17:00:13.20563+00 |                        |                               | quarterly   | 
     25 |                              |                        |                               | latest      | 
     25 |                              |                        |                               | quarterly   | 
     26 |                              |                        |                               | latest      | 
     26 |                              |                        |                               | quarterly   | 
     27 |                              |                        |                               | latest      | 
     27 |                              |                        |                               | quarterly   | 
     28 |                              |                        |                               | latest      | 
     28 |                              |                        |                               | quarterly   | 
     29 |                              |                        |                               | latest      | 
     29 |                              |                        |                               | quarterly   | 
     30 |                              |                        |                               | latest      | 
     30 |                              |                        |                               | quarterly   | 
     31 |                              |                        |                               | latest      | 
     31 |                              |                        |                               | quarterly   | 
     32 |                              |                        |                               | latest      | 
     32 |                              |                        |                               | quarterly   | 
      7 | 2021-02-10 17:00:13.20563+00 | 2018-09-29 04:24:41+00 | 2020-10-04 15:00:12.762803+00 | latest      | 2020-10-04 15:02:18.094703+00
      7 | 2021-02-10 17:00:13.20563+00 | 2020-08-22 06:40:07+00 | 2020-10-04 15:00:27.026864+00 | quarterly   | 2020-10-04 15:02:39.132243+00
      6 | 2021-02-10 17:00:13.20563+00 | 2021-02-08 21:21:07+00 | 2021-02-09 05:00:13.733719+00 | latest      | 2021-02-09 05:00:27.510551+00
      6 | 2021-02-10 17:00:13.20563+00 | 2021-02-09 05:32:54+00 | 2021-02-09 07:00:12.877082+00 | quarterly   | 2021-02-09 07:00:26.142752+00
     12 | 2021-02-10 17:00:13.20563+00 | 2016-06-30 00:51:31+00 | 2020-10-04 15:00:47.024474+00 | latest      | 2020-10-04 15:03:19.615638+00
     12 | 2021-02-10 17:00:13.20563+00 | 2020-10-02 04:35:39+00 | 2020-10-04 15:00:52.178755+00 | quarterly   | 2020-10-04 15:03:29.339387+00
      4 | 2021-02-10 17:00:13.20563+00 | 2021-02-08 20:27:27+00 | 2021-02-09 03:00:16.235524+00 | latest      | 2021-02-09 03:00:22.4292+00
      4 | 2021-02-10 17:00:13.20563+00 | 2021-02-09 05:09:37+00 | 2021-02-09 07:00:19.26035+00  | quarterly   | 2021-02-09 07:00:34.037177+00
     13 | 2021-02-10 17:00:13.20563+00 | 2016-07-18 00:33:40+00 | 2020-10-04 15:01:15.285386+00 | latest      | 2020-10-04 15:04:03.027721+00
     13 | 2021-02-10 17:00:13.20563+00 | 2020-10-01 02:36:15+00 | 2020-10-04 15:01:19.911912+00 | quarterly   | 2020-10-04 15:04:11.059194+00
     14 | 2021-02-10 17:00:13.20563+00 | 2016-07-25 04:20:42+00 | 2020-10-04 15:01:24.426796+00 | latest      | 2020-10-04 15:04:17.827801+00
     14 | 2021-02-10 17:00:13.20563+00 | 2020-10-01 08:20:18+00 | 2021-02-04 11:00:15.371228+00 | quarterly   | 2021-02-04 11:00:22.551944+00
      9 | 2021-02-10 17:00:13.20563+00 | 2018-10-11 05:47:38+00 | 2020-10-04 15:01:35.379828+00 | latest      | 2020-10-04 15:04:32.751276+00
      9 | 2021-02-10 17:00:13.20563+00 | 2021-02-07 11:06:07+00 | 2021-02-07 13:00:14.959275+00 | quarterly   | 2021-02-07 13:00:23.122086+00
      1 | 2021-02-10 17:00:13.20563+00 | 2021-02-08 01:42:33+00 | 2021-02-09 21:00:17.873407+00 | latest      | 2021-02-09 21:00:35.348509+00
      1 | 2021-02-10 17:00:13.20563+00 | 2021-02-10 01:03:25+00 | 2021-02-10 11:21:25.104526+00 | quarterly   | 2021-02-10 11:22:10.609837+00
     15 | 2021-02-10 17:00:13.20563+00 | 2018-10-10 14:02:29+00 | 2020-10-04 15:02:04.197145+00 | latest      | 2020-10-04 15:05:20.600131+00
     15 | 2021-02-10 17:00:13.20563+00 | 2020-12-01 08:59:46+00 | 2020-12-01 10:00:28.46244+00  | quarterly   | 2020-12-01 10:01:00.107119+00
     16 | 2021-02-10 17:00:13.20563+00 | 2018-10-15 02:25:58+00 | 2020-07-17 14:00:24.261982+00 | latest      | 2020-07-17 14:03:44.976904+00
     16 | 2021-02-10 17:00:13.20563+00 | 2020-12-01 08:43:59+00 | 2020-12-01 10:00:34.626725+00 | quarterly   | 2020-12-01 10:01:13.981269+00
      8 | 2021-02-10 17:00:13.20563+00 | 2021-02-07 08:09:21+00 | 2021-02-08 22:00:15.707564+00 | latest      | 2021-02-08 22:00:29.591642+00
(60 rows)

This is a good example of how a result set is not inherently ordered. The rows without any timestamps are our new rows. The abi_id column contains duplicate values because there is one row for quarterly (e.g. 2021Q1) and one for latest (i.e. head).

Now I wait and watch the logs at the top of the hour.

Later

I checked the logs, reproduced here with formatting to ease reading:

Feb 10 18:00:16 dev-ingress01 check_repos_for_new_stuff.py[69136]: There are 5 new repos ready for import: 
[
'FreeBSD:14:amd64/latest : Sun, 07 Feb 2021 06:15:49 GMT', 
'FreeBSD:14:armv6/latest : Thu, 04 Feb 2021 01:25:36 GMT', 
'FreeBSD:14:armv7/latest : Fri, 05 Feb 2021 11:48:09 GMT', 
'FreeBSD:14:i386/latest : Sun, 07 Feb 2021 06:44:35 GMT', 
'FreeBSD:14:mips64/latest : Sun, 07 Feb 2021 10:37:05 GMT'
]

Looking at the tables, with slightly different SQL, I found:

freshports.dev=# select abi.name, packages_last_checked.* from abi, packages_last_checked where abi.id = packages_last_checked.abi_id and abi.name like 'FreeBSD:14:%' order by name;
         name         | abi_id |         last_checked         |       repo_date        |          import_date          | package_set |        processed_date         
----------------------+--------+------------------------------+------------------------+-------------------------------+-------------+-------------------------------
 FreeBSD:14:aarch64   |     25 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:aarch64   |     25 | 2021-02-10 18:00:10.09146+00 |                        |                               | latest      | 
 FreeBSD:14:amd64     |     26 | 2021-02-10 18:00:10.09146+00 | 2021-02-07 06:15:49+00 | 2021-02-10 18:00:17.632469+00 | latest      | 2021-02-10 18:00:39.25916+00
 FreeBSD:14:amd64     |     26 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:armv6     |     27 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:armv6     |     27 | 2021-02-10 18:00:10.09146+00 | 2021-02-04 01:25:36+00 | 2021-02-10 18:00:23.779767+00 | latest      | 2021-02-10 18:00:53.136252+00
 FreeBSD:14:armv7     |     28 | 2021-02-10 18:00:10.09146+00 | 2021-02-05 11:48:09+00 | 2021-02-10 18:00:27.640483+00 | latest      | 2021-02-10 18:00:59.62607+00
 FreeBSD:14:armv7     |     28 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:i386      |     29 | 2021-02-10 18:00:10.09146+00 | 2021-02-07 06:44:35+00 | 2021-02-10 18:00:31.501863+00 | latest      | 2021-02-10 18:01:06.671366+00
 FreeBSD:14:i386      |     29 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:mips      |     30 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:mips      |     30 | 2021-02-10 18:00:10.09146+00 |                        |                               | latest      | 
 FreeBSD:14:mips64    |     31 | 2021-02-10 18:00:10.09146+00 | 2021-02-07 10:37:05+00 | 2021-02-10 18:00:36.896226+00 | latest      | 2021-02-10 18:01:15.138216+00
 FreeBSD:14:mips64    |     31 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
 FreeBSD:14:powerpc64 |     32 | 2021-02-10 18:00:10.09146+00 |                        |                               | latest      | 
 FreeBSD:14:powerpc64 |     32 | 2021-02-10 18:00:10.09146+00 |                        |                               | quarterly   | 
(16 rows)

freshports.dev=# 

There you go. That’s how difficult it is to add a new ABI.

Feb 06 2021

EDIT: update, I think I have solved this via help on IRC. See solution.

I need help configuring Postfix. I am overwhelmed by the options available and I need to stay on track by working on the rest of the git changes.

I have changed references to the FreshPorts domain. I’m using example.org instead.

Background

FreshPorts processes incoming subversion commit emails to create XML which is then loaded into the database.

I need help configuring a new host: mx-ingress04

Mail goes from the FreeBSD mailing list to mx-ingress02 which then distributes it to other nodes.

mx-ingress04 is on AWS and this is what I’m having trouble with. This host is restricted by AWS on outgoing port 25 connections. To solve that, I can use relayhost to another mail server which is configured to relay.

The problem

When I enable the relay on mx-ingress04, it breaks the delivery that currently works: all incoming mail goes out via the relay instead of going to the ingress host for processing.

Logs with relayhost=10.0.17.21:

Feb  6 20:39:43 mx-ingress04 postfix/smtpd[77346]: connect from unknown[198.51.100.203]
Feb  6 20:39:43 mx-ingress04 postfix/smtpd[77346]: CA1BABCFB: client=unknown[198.51.100.203]
Feb  6 20:39:43 mx-ingress04 postfix/cleanup[77352]: CA1BABCFB: message-id=<20210206203943.99D71EC81@mx-ingress02.example.org>
Feb  6 20:39:43 mx-ingress04 postfix/qmgr[77341]: CA1BABCFB: from=<dan@mx-ingress02.example.org>, size=572, nrcpt=1 (queue active)
Feb  6 20:39:43 mx-ingress04 postfix/smtpd[77346]: disconnect from unknown[198.51.100.203] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Feb  6 20:39:43 mx-ingress04 postfix/cleanup[77352]: CED1DBD8F: message-id=<20210206203943.99D71EC81@mx-ingress02.example.org>
Feb  6 20:39:43 mx-ingress04 postfix/local[77353]: CA1BABCFB: to=<production@mx-ingress04.example.org>, relay=local, delay=0.03, delays=0.02/0.01/0/0, dsn=2.0.0, status=sent (forwarded as CED1DBD8F)
Feb  6 20:39:43 mx-ingress04 postfix/qmgr[77341]: CED1DBD8F: from=<dan@mx-ingress02.example.org>, size=732, nrcpt=1 (queue active)
Feb  6 20:39:43 mx-ingress04 postfix/qmgr[77341]: CA1BABCFB: removed
Feb  6 20:39:43 mx-ingress04 postfix/smtp[77354]: CED1DBD8F: to=<ingress_svn@[127.163.0.10]>, orig_to=<production@mx-ingress04.example.org>, relay=10.0.17.21[10.0.17.21]:25, delay=0.03, delays=0/0.01/0.01/0, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as D4EDA2E2EB)
Feb  6 20:39:43 mx-ingress04 postfix/qmgr[77341]: CED1DBD8F: removed

Logs without relayhost:

Feb  6 20:39:26 mx-ingress04 postfix/smtpd[77211]: connect from unknown[198.51.100.203]
Feb  6 20:39:26 mx-ingress04 postfix/smtpd[77211]: 5A7A9BD76: client=unknown[198.51.100.203]
Feb  6 20:39:26 mx-ingress04 postfix/cleanup[77217]: 5A7A9BD76: message-id=<20210206203926.1938AEB7D@mx-ingress02.example.org>
Feb  6 20:39:26 mx-ingress04 postfix/smtpd[77211]: disconnect from unknown[198.51.100.203] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Feb  6 20:39:26 mx-ingress04 postfix/qmgr[77205]: 5A7A9BD76: from=<dan@mx-ingress02.example.org>, size=572, nrcpt=1 (queue active)
Feb  6 20:39:26 mx-ingress04 postfix/cleanup[77217]: 5D2F9BCF9: message-id=<20210206203926.1938AEB7D@mx-ingress02.example.org>
Feb  6 20:39:26 mx-ingress04 postfix/local[77218]: 5A7A9BD76: to=<production@mx-ingress04.example.org>, relay=local, delay=0.01, delays=0.01/0/0/0, dsn=2.0.0, status=sent (forwarded as 5D2F9BCF9)
Feb  6 20:39:26 mx-ingress04 postfix/qmgr[77205]: 5D2F9BCF9: from=<dan@mx-ingress02.example.org>, size=732, nrcpt=1 (queue active)
Feb  6 20:39:26 mx-ingress04 postfix/qmgr[77205]: 5A7A9BD76: removed
Feb  6 20:39:26 mx-ingress04 postfix/smtp[77219]: 5D2F9BCF9: to=<ingress_svn@[127.163.0.10]>, orig_to=<production@mx-ingress04.example.org>, relay=127.163.0.10[127.163.0.10]:25, delay=0.05, delays=0/0/0.04/0, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 683A35F665)
Feb  6 20:39:26 mx-ingress04 postfix/qmgr[77205]: 5D2F9BCF9: removed

The goal

At mx-ingress04, incoming mail for production@mx-ingress04.example.org goes to ingress_svn@[127.163.0.10].

Any mail originating on mx-ingress04 is relayed out to 10.0.17.21.

Existing configuration

This is the existing configuration on mx-ingress04:

# postconf -n
alias_maps = hash:/etc/mail/aliases
command_directory = /usr/local/sbin
compatibility_level = 2
daemon_directory = /usr/local/libexec/postfix
data_directory = /var/db/postfix
debug_peer_level = 2
debugger_command = PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin ddd $daemon_directory/$process_name $process_id & sleep 5
html_directory = no
inet_interfaces = 127.163.0.25, [redacted]:1a17:f712:6854:4175:9eaa
inet_protocols = ipv4, ipv6
mail_owner = postfix
mailbox_command = /usr/local/bin/maildrop -d ${USER}
mailbox_size_limit = 102400000
mailq_path = /usr/local/bin/mailq
manpage_directory = /usr/local/man
message_size_limit = 102400000
mynetworks_style = host
newaliases_path = /usr/local/bin/newaliases
queue_directory = /var/spool/postfix
readme_directory = no
recipient_delimiter = +
sample_directory = /usr/local/etc/postfix
sendmail_path = /usr/local/sbin/sendmail
setgid_group = maildrop
smtp_tls_CAfile = /usr/local/etc/ssl/ca.cer
smtp_tls_cert_file = /usr/local/etc/ssl/mx-ingress04.example.org.fullchain.cer
smtp_tls_fingerprint_digest = sha1
smtp_tls_key_file = /usr/local/etc/ssl/mx-ingress04.example.org.key
smtp_tls_policy_maps = hash:/usr/local/etc/postfix/tls_policy
smtp_tls_security_level = may
smtp_tls_session_cache_database = btree:/var/db/postfix/smtp_scache
smtpd_tls_CAfile = /usr/local/etc/ssl/ca.cer
smtpd_tls_cert_file = /usr/local/etc/ssl/mx-ingress04.example.org.fullchain.cer
smtpd_tls_fingerprint_digest = sha1
smtpd_tls_key_file = /usr/local/etc/ssl/mx-ingress04.example.org.key
smtpd_tls_security_level = encrypt
smtpd_tls_session_cache_database = btree:/var/db/postfix/smtpd_scache
soft_bounce = yes
unknown_local_recipient_reject_code = 550
virtual_maps = hash:/usr/local/etc/postfix/mx-ingress04.example.org.virtual-virtual_alias_maps
# cat mx-ingress04.example.org.virtual-virtual_alias_maps
postmaster@mx-ingress04.example.org   dan@langille.org
production@mx-ingress04.example.org   ingress_svn@[127.163.0.10]

Since writing this, I’ve found some help via IRC.

Solution

This is what I am using now. In short I have:

  1. Removed virtual_maps
  2. Removed relayhost
  3. Added a new alias_maps
  4. Added a new transport_maps

The new alias_maps is:

[root@mx-ingress04 /usr/local/etc/postfix]# cat mx-ingress04-alias-maps
production:	ingress_svn@[127.163.0.10]

This says:

  • any mail for local user production gets sent to ingress_svn@[127.163.0.10]

The new transport_maps is:

[root@mx-ingress04 /usr/local/etc/postfix]# cat mx-ingress04-transport
[127.163.0.10] :
* :[10.0.17.21]

This says:

  • [127.163.0.10] will be delivered locally
  • all other mail will be relayed via 10.0.17.21
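
A reminder for anyone reproducing this: Postfix reads the hash: versions of these files, so they must be rebuilt after editing, and the daemon reloaded. Something along these lines, using standard Postfix tooling and the paths from the configuration above (a sketch; adjust to taste):

```
# postalias /usr/local/etc/postfix/mx-ingress04-alias-maps
# postmap /usr/local/etc/postfix/mx-ingress04-transport
# postmap -q "[127.163.0.10]" hash:/usr/local/etc/postfix/mx-ingress04-transport
# postfix reload
```

The postmap -q line is a spot-check that a given key resolves from the compiled map.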

The current configuration is:

# postconf -n
alias_maps = hash:/etc/mail/aliases, hash:/usr/local/etc/postfix/mx-ingress04-alias-maps
command_directory = /usr/local/sbin
compatibility_level = 2
daemon_directory = /usr/local/libexec/postfix
data_directory = /var/db/postfix
debug_peer_level = 2
debugger_command = PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin ddd $daemon_directory/$process_name $process_id & sleep 5
html_directory = no
inet_interfaces = 127.163.0.25, [redacted]:1a17:f712:6854:4175:9eaa
inet_protocols = ipv4, ipv6
mail_owner = postfix
mailbox_command = /usr/local/bin/maildrop -d ${USER}
mailbox_size_limit = 102400000
mailq_path = /usr/local/bin/mailq
manpage_directory = /usr/local/man
message_size_limit = 102400000
mynetworks_style = host
newaliases_path = /usr/local/bin/newaliases
queue_directory = /var/spool/postfix
readme_directory = no
recipient_delimiter = +
sample_directory = /usr/local/etc/postfix
sendmail_path = /usr/local/sbin/sendmail
setgid_group = maildrop
smtp_tls_CAfile = /usr/local/etc/ssl/ca.cer
smtp_tls_cert_file = /usr/local/etc/ssl/mx-ingress04.example.org.fullchain.cer
smtp_tls_fingerprint_digest = sha1
smtp_tls_key_file = /usr/local/etc/ssl/mx-ingress04.example.org.key
smtp_tls_policy_maps = hash:/usr/local/etc/postfix/tls_policy
smtp_tls_security_level = may
smtp_tls_session_cache_database = btree:/var/db/postfix/smtp_scache
smtpd_tls_CAfile = /usr/local/etc/ssl/ca.cer
smtpd_tls_cert_file = /usr/local/etc/ssl/mx-ingress04.example.org.fullchain.cer
smtpd_tls_fingerprint_digest = sha1
smtpd_tls_key_file = /usr/local/etc/ssl/mx-ingress04.example.org.key
smtpd_tls_security_level = encrypt
smtpd_tls_session_cache_database = btree:/var/db/postfix/smtpd_scache
soft_bounce = yes
transport_maps = hash:/usr/local/etc/postfix/mx-ingress04-transport
unknown_local_recipient_reject_code = 550
[root@mx-ingress04 /usr/local/etc/postfix]# 

Feb 05 2021

This is what runs in an ingress jail:

[root@ingress01 ~]# ps auwwx
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
root        43108  0.0  0.2 11428 2016  -  SsJ  03:12   0:00.00 /usr/sbin/syslogd -s
root        43226  0.0  0.6 51332 6200  -  SsJ  03:12   0:00.01 /usr/local/libexec/postfix/master -w
postfix     43227  0.0  0.6 51384 6292  -  SJ   03:12   0:00.01 pickup -l -t unix -u
postfix     43228  0.0  0.6 51384 6252  -  IJ   03:12   0:00.01 qmgr -l -t unix -u
nagios      43232  0.0  0.4 15956 3792  -  IsJ  03:12   0:00.00 /usr/local/sbin/nrpe3 -c /usr/local/etc/nrpe.cfg -d
ingress_svn 43240  0.0  0.2 11004 1652  -  IsJ  03:12   0:00.00 daemon: ingress_svn[43241] (daemon)
ingress_svn 43241  0.0  0.2 11612 1856  -  SJ   03:12   0:00.01 /bin/sh /usr/local/libexec/freshports-service/ingress_svn.sh
ingress     43250  0.0  0.2 11004 1652  -  IsJ  03:12   0:00.00 daemon: ingress[43251] (daemon)
ingress     43251  0.0  0.2 11576 1844  -  SJ   03:12   0:00.01 /bin/sh /usr/local/libexec/freshports-service/ingress.sh
freshports  43263  0.0  0.2 11004 1652  -  IsJ  03:12   0:00.00 daemon: freshports[43264] (daemon)
freshports  43264  0.0  0.2 11612 1852  -  SJ   03:12   0:00.02 /bin/sh /usr/local/libexec/freshports-service/freshports.sh
root        43283  0.0  0.4 19912 4328  -  IsJ  03:12   0:00.00 /usr/sbin/sshd
root        43291  0.0  0.2 11308 1828  -  SsJ  03:12   0:00.00 /usr/sbin/cron -s
postfix     44431  0.0  0.8 51948 7392  -  IJ   03:12   0:00.01 tlsmgr -l -t unix -u
ingress     46156  0.0  0.2 10676 1600  -  SCJ  03:14   0:00.00 sleep 3
ingress_svn 46157  0.0  0.2 10676 1600  -  SCJ  03:14   0:00.00 sleep 3
freshports  46158  0.0  0.2 10676 1600  -  SCJ  03:14   0:00.00 sleep 3
root        46153  0.0  0.3 13140 3232  6  SJ   03:14   0:00.01 /bin/csh -i
root        46155  0.0  0.4 13264 3480  6  SJ   03:14   0:00.00 bash
root        46159  0.0  0.2 11600 2156  6  R+J  03:14   0:00.00 ps auwwx
[root@ingress01 ~]# 

This runs in an nginx jail:

root@nginx01:/ # ps auwwx
USER         PID %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
root       44800  0.0  0.2  11376  2060  -  SsJ  03:12   0:00.00 /usr/sbin/syslogd -s
root       44828  0.0  0.9  28232  9280  -  SJ   03:12   0:00.01 /usr/local/sbin/snmpd -p /var/run/net_snmpd.pid -c /usr/local/etc/snmpd.conf -a -r
root       44923  0.0  0.7  51356  6588  -  IsJ  03:12   0:00.01 /usr/local/libexec/postfix/master -w
postfix    44924  0.0  0.7  51304  6540  -  IJ   03:12   0:00.01 pickup -l -t unix -u
postfix    44925  0.0  0.7  51360  6600  -  IJ   03:12   0:00.01 qmgr -l -t unix -u
root       44929  0.0  1.3 173480 12820  -  SsJ  03:12   0:00.02 php-fpm: master process (/usr/local/etc/php-fpm.conf) (php-fpm)
www        44930  0.0  1.3 173560 12816  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44931  0.0  2.0 174252 19912  -  IJ   03:12   0:00.03 php-fpm: pool www (php-fpm)
www        44933  0.0  1.3 173560 12820  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44934  0.0  1.3 173560 12820  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44935  0.0  1.3 173560 12820  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44936  0.0  1.3 173560 12820  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44937  0.0  1.3 173560 12824  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44938  0.0  1.3 173560 12824  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44939  0.0  1.3 173560 12824  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44940  0.0  1.3 173564 12828  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44941  0.0  1.3 173564 12828  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44942  0.0  1.3 173564 12828  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44943  0.0  1.3 173564 12828  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44944  0.0  1.3 173564 12836  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44945  0.0  1.3 173564 12836  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44946  0.0  1.3 173564 12836  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44947  0.0  1.3 173564 12836  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44948  0.0  1.3 173564 12840  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44949  0.0  1.3 173564 12840  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
www        44950  0.0  1.3 173564 12840  -  IJ   03:12   0:00.00 php-fpm: pool www (php-fpm)
nagios     44953  0.0  0.4  15956  3884  -  IsJ  03:12   0:00.00 /usr/local/sbin/nrpe3 -c /usr/local/etc/nrpe.cfg -d
root       44957  0.0  0.7  21160  6400  -  IsJ  03:12   0:00.00 nginx: master process /usr/local/sbin/nginx
www        44958  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
www        44959  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
www        44960  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
www        44961  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
www        44962  0.0  0.9  21400  8488  -  IJ   03:12   0:00.01 nginx: worker process (nginx)
www        44963  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
www        44964  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
www        44967  0.0  0.7  21232  6876  -  IJ   03:12   0:00.00 nginx: worker process (nginx)
freshports 44975  0.0  0.2  10844  1700  -  IsJ  03:12   0:00.00 daemon: fp_listen[44976] (daemon)
freshports 44976  0.0  1.8  29080 17704  -  SJ   03:12   0:00.14 /usr/local/bin/python /usr/local/lib/python3.7/site-packages/fp-listen/fp-listen.py (python3.7)
root       44994  0.0  0.6  19912  5424  -  IsJ  03:12   0:00.00 /usr/sbin/sshd
root       44998  0.0  0.2  11308  2032  -  IsJ  03:12   0:00.00 /usr/sbin/cron -s
root       46118  0.0  0.4  13128  3492  6  SJ   03:13   0:00.01 /bin/csh -i
root       46122  0.0  0.2  11624  2432  6  R+J  03:13   0:00.00 ps auwwx

Jan 31 2021

In this post I’ll talk about how the code initially took the easy way out and how recent changes have tightened that up. In short, the code would raise N messages when a single notification is sufficient.

Today the full vuxml database was reloaded into FreshPorts. This affects 3,454 ports and marks 60,436 commits (or revisions) as vulnerable. When you look at a port page, you can see the skulls which indicate a known vulnerability beside the commits in question (look under Commit History). That amounts to 565,576 unique combinations of commits, ports, and vuxml entries.

After all that is processed (which takes about 21.5 hours on the current server), we need to flush the cache for each item that changed (ports, daily summaries, etc).

How cache clearing works

When a port is updated, a trigger on the ports table adds an entry to the cache_clearing_ports table and issues a NOTIFY. A long-running daemon LISTENs for that notification; when it arrives, the daemon reads the table and starts processing the entries, clearing the cache.

This is a simplified description. Updates occur in many places, and each invokes its own trigger. For this post, however, we will deal with only one.
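In SQL terms, the daemon’s cycle looks roughly like this. The table and channel names are real; the id bookkeeping and its value are illustrative, not the actual fp-listen code:

```sql
-- executed once at daemon startup
LISTEN port_updated;

-- then, on each notification received, roughly:
BEGIN;
SELECT id, port_id, category, port
  FROM cache_clearing_ports
 ORDER BY id;
-- ... remove the cached page(s) for each category/port pair ...
DELETE FROM cache_clearing_ports
 WHERE id <= 12345;  -- the highest id just processed (illustrative value)
COMMIT;
```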

The current cache_clearing_ports table

This is the current cache_clearing_ports table:

freshports.org=# \d cache_clearing_ports
                                        Table "public.cache_clearing_ports"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

As you can see, nothing prevents duplicate entries. However, over a single commit, which is how the vuxml data is processed, this can result in a large number of duplicates.

How many duplicates?

Checking here, we have just over 1 million entries.

freshports.org=# select count(*) from cache_clearing_ports;
  count  
---------
 1005407
(1 row)

How unique are they?

Only about 3,500 distinct ports:

freshports.org=# select count(distinct port_id) from commit_log_ports_vuxml;
 count 
-------
  3454
(1 row)

I recall when I was writing this: duplicates are no big deal. The cache gets cleared twice. So what?

It takes time to read through the cache_clearing_ports table. A long time, when you have about 1 million entries.
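A simple GROUP BY makes the duplication visible. Something like this (a sketch; I have not captured its output here) shows which ports account for the most redundant rows:

```sql
SELECT port_id, count(*) AS copies
  FROM cache_clearing_ports
 GROUP BY port_id
 ORDER BY copies DESC
 LIMIT 5;
```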

The trigger

This is the existing trigger:

CREATE OR REPLACE FUNCTION ports_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_port      text;
      l_category  text;
   BEGIN
      IF TG_OP = 'UPDATE' THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.id;

            INSERT INTO cache_clearing_ports (port_id, category, port)
                 VALUES (NEW.id, l_category, l_port);

            NOTIFY port_updated;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS ports_clear_cache ON ports;
CREATE TRIGGER ports_clear_cache
    AFTER UPDATE ON ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_clear_cache();

This is how I propose to change the table. First, I’m creating a new identical table to play with:

freshports.dev=# create table cache_clearing_ports_test (like cache_clearing_ports INCLUDING ALL);
CREATE TABLE

Just what I need. This creates a new table, including all the constraints etc., so I can play with it without affecting ongoing database operations on my dev server.

This is the DDL I would run:

ALTER TABLE public.cache_clearing_ports_test
    ADD CONSTRAINT cache_clearing_ports_test_port_id_idx UNIQUE (port_id);

COMMENT ON CONSTRAINT cache_clearing_ports_test_port_id_idx ON public.cache_clearing_ports_test
    IS 'Let''s store just one instance per port_id';

The DDL for the real table would be:

ALTER TABLE public.cache_clearing_ports
    ADD CONSTRAINT cache_clearing_ports_port_id_idx UNIQUE (port_id);

COMMENT ON CONSTRAINT cache_clearing_ports_port_id_idx ON public.cache_clearing_ports
    IS 'Let''s store just one instance per port_id';

This is the amended test table:

freshports.dev=# \d cache_clearing_ports_test
                                      Table "public.cache_clearing_ports_test"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_test_pkey" PRIMARY KEY, btree (id)
    "cache_clearing_ports_test_port_id_idx" UNIQUE CONSTRAINT, btree (port_id)

freshports.dev=# 
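A quick sanity check on the test table confirms the constraint does its job; the category and port values below are made up for illustration. The second insert for the same port_id now fails with something like:

```sql
INSERT INTO cache_clearing_ports_test (port_id, category, port)
     VALUES (1, 'www', 'apache24');
-- INSERT 0 1

INSERT INTO cache_clearing_ports_test (port_id, category, port)
     VALUES (1, 'www', 'apache24');
-- ERROR:  duplicate key value violates unique constraint "cache_clearing_ports_test_port_id_idx"
```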

Next step: fix the trigger, which will produce duplicate-key errors if left as it is.

The new trigger

The INSERT command has an ON CONFLICT clause, whose conflict_target can name a constraint, which is very useful for this situation.

CREATE OR REPLACE FUNCTION ports_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_port      text;
      l_category  text;
   BEGIN
      IF TG_OP = 'UPDATE' THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.id;

            INSERT INTO cache_clearing_ports (port_id, category, port)
                 VALUES (NEW.id, l_category, l_port)
            ON CONFLICT ON CONSTRAINT cache_clearing_ports_port_id_idx DO NOTHING;

            NOTIFY port_updated;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

What this says:

  • On the INSERT
  • IF there is a conflict on the unique port_id index
  • THEN do nothing

There are many other things you can do ON CONFLICT, but in my case, doing nothing is what I want.
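For example, if you wanted the table to remember the most recent duplicate rather than ignore it, a DO UPDATE form could refresh the timestamp. This is a sketch of the alternative, not something FreshPorts needs here:

```sql
INSERT INTO cache_clearing_ports (port_id, category, port)
     VALUES (NEW.id, l_category, l_port)
ON CONFLICT ON CONSTRAINT cache_clearing_ports_port_id_idx
DO UPDATE SET date_added = now();
```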

But wait! There’s more!

I went one step further. Why NOTIFY if we don’t have to?

CREATE OR REPLACE FUNCTION ports_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_port      text;
      l_category  text;
      l_row_count bigint;
   BEGIN
      IF TG_OP = 'UPDATE' THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.id;

            INSERT INTO cache_clearing_ports (port_id, category, port)
                 VALUES (NEW.id, l_category, l_port)
            ON CONFLICT ON CONSTRAINT cache_clearing_ports_port_id_idx DO NOTHING;

            GET DIAGNOSTICS l_row_count = ROW_COUNT;
            IF l_row_count > 0 THEN
                  NOTIFY port_updated;
            END IF;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

In this solution, the code checks to see if anything was inserted. If so, it issues the NOTIFY.
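A quick way to convince yourself: in one psql session LISTEN, then in another update the same port twice. The column and id below are made up purely to force the trigger to fire:

```sql
-- session 2, beforehand:
LISTEN port_updated;

-- session 1 (hypothetical column and id, just to trigger the UPDATE path):
UPDATE ports SET some_column = some_column WHERE id = 1;
-- row lands in cache_clearing_ports; session 2 receives a NOTIFY

UPDATE ports SET some_column = some_column WHERE id = 1;
-- ON CONFLICT DO NOTHING fires, ROW_COUNT is 0, no second NOTIFY
```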