Sep 262018
 

This post serves two goals:

  1. document the CONFLICTS implementation within FreshPorts
  2. show how you can use triggers to update a secondary table

The FreeBSD ports tree / package collection has a method for resolving conflicts. This feature allows a port maintainer to specify what, if any, packages conflict with a given port. The reasons why two ports might conflict is outside the scope of this article. For background reading:

Obtaining the values from the Makefile

To get started, let’s look at what I already have in the development instance of the FreshPorts database.

freshports.dev=# SELECT id, packagename(id), conflicts, conflicts_build, conflicts_install FROM ports WHERE coalesce(conflicts, conflicts_build, conflicts_install) IS NOT NULL and packagename(id) like 'bacula%' order by last_commit_id desc limit 10
;
  id   |      packagename      |            conflicts             | conflicts_build | conflicts_install 
-------+-----------------------+----------------------------------+-----------------+-------------------
 45084 | bacula9-docs          | bacula-* bacula5-*               |                 | 
 45086 | bacula9-server        | bacula5-server-* bacula-server-* |                 | 
 45087 | bacula9-bat           | bacula5-server-* bacula-server-* |                 | 
 45083 | bacula9-client        | bacula5-server-* bacula-server-* |                 | 
 45085 | bacula9-client-static | bacula5-server-* bacula-server-* |                 | 
(5 rows)

freshports.dev=# 

The packagename corresponds to the following FreeBSD ports:

  1. bacula9-docs
  2. bacula9-server
  3. bacula9-bat
  4. bacula9-client
  5. bacula9-client-static

In this case, bacula9 ports conflict with older versions of Bacula. How can we get those values from the Makefile? Let’s get the values for bacula9-server. Assume these commands are done in the /var/db/freshports/ports-jail/var/db/repos/PORTS-head/sysutils/bacula9-server directory, which is a non-standard location for ports files, but it suites the needs of the website.

$ grep CONFLICTS *
Makefile:CONFLICTS?=	bacula5-server-* bacula-server-*

NOTE: grep is not an appropriate method for obtaining the values because the values may be contingent upon other configuration settings.

This is the proper method:

$ make -V CONFLICTS -V CONFLICTS_BUILD -V CONFLICTS_INSTALL
bacula5-server-* bacula-server-*

NOTE: those two blank lines correspond to the empty values contained in those files.

globs not regex

CONFLICTS can involve wildcards, such as the * shown above. It is critical to realize that these are globs, meant for matching disk filenames, as opposed to regular expressions.

From the documentation

# CONFLICTS             - A list of package name patterns that the port conflicts
#                                 with, separated by blanks.  The names may include shell
#                                 pattern meta-characters "*", "?", "[", "]", and "!".
#                                 Example: apache*-1.2* apache*-1.3.[012345] apache-*+ssl_*

PostgreSQL has many regex functions. Let’s see about converting globs to regex.

glob to regex

Perl to the rescue. I received some help, as many people do, from RhodiumToad, who supplied this function:

CREATE TRUSTED PROCEDURAL LANGUAGE plperl
    HANDLER plperl_call_handler
    INLINE plperl_inline_handler
    VALIDATOR plperl_validator;
    
COMMENT ON LANGUAGE plperl
    IS 'PL/Perl procedural language';

CREATE FUNCTION public.glob_to_regex(text) RETURNS text
    LANGUAGE plperl
    AS $_$
    my $w = quotemeta shift;
    $w =~ s{\\([*?]|\\(\\?.)|\[(?:\\\!)?(?:\\\])?[^]]*\\\])}
                   {($1 eq '*' ? '.*' :
                         $1 eq '?' ? '.'  :
                         defined($2) ? $2 :
                         do { my $p = $1; $p =~ s/\\([][^-])/$1/g; $p =~ s/^\[\\!/[^/; $p; })}eg;
    return $w;
$_$;

On FreeBSD, I needed to add the databases/postgresql10-server package to provide support for Perl language extensions.

This will take a glob and translate it into a regular expression.

Here are a few examples:

freshports.dev=# select glob_to_regex('bacula-*');
 glob_to_regex 
---------------
 bacula\-.*
(1 row)

freshports.dev=# 


freshports.dev=# select glob_to_regex('postgresql9[^4]*');
   glob_to_regex   
-------------------
 postgresql9[^4].*
(1 row)

freshports.dev=# select glob_to_regex('clang-devel-3.[1234567]*');
        glob_to_regex         
------------------------------
 clang\-devel\-3\.[1234567].*
(1 row)

freshports.dev=# 

The secondary table

As part of earlier work, the ports table was upgraded to add three new fields:

ALTER TABLE ports ADD COLUMN conflicts text;
ALTER TABLE ports ADD COLUMN conflicts_build text;
ALTER TABLE ports ADD COLUMN conflicts_install text;

These columns will store the raw values as obtained directly from the Makefile. The corresponding regex values will be store, one per row, in another table. That table looks like this:

CREATE TABLE ports_conflicts
(
    id BIGSERIAL,
    port_id integer          NOT NULL,
    conflicts_type conflicts NOT NULL,
    item_num integer         NOT NULL,
    item_value text          NOT NULL,
    CONSTRAINT ports_conflicts_pkey PRIMARY KEY (id),
    CONSTRAINT ports_conflicts_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

What’s that column type on line 5 you might ask.

CREATE TYPE public.conflicts AS ENUM
    ('conflicts', 'conflicts_build', 'conflicts_install');

The item_num column will help to order the values so they correspond to what we find in the the source value. Feedback regarding the accuracy of that claim is most welcome. The ordering is not vital, but it does help to make sure it is consistent when we pull back the values later.

The Trigger

This is the trigger which will populate the ports_conflicts table.

CREATE OR REPLACE FUNCTION ports_conflicts_set() RETURNS TRIGGER AS $$
  BEGIN
    DELETE FROM ports_conflicts WHERE port_id = NEW.id;

    INSERT INTO ports_conflicts(port_id, conflicts_type, item_num, item_value)
      SELECT NEW.id                         AS port_id, 
             'conflicts'::conflicts         AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_build'::conflicts   AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_build, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_install'::conflicts AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_install, E'\\s+') AS R(item_value);

    RETURN NEW;
  END
$$ LANGUAGE 'plpgsql';


  DROP TRIGGER IF EXISTS ports_conflicts_insert ON ports;
CREATE TRIGGER ports_conflicts_insert
    AFTER INSERT on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

  DROP TRIGGER IF EXISTS ports_conflicts_update ON ports;
CREATE TRIGGER ports_conflicts_update
    AFTER UPDATE on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

Line 3 : deletes existing entries for this port

Lines 6-10 : compiles a list of conflicts values

Lines 14-18 : does the same thing for conflicts_build values

Lines 22-26 : same again, for conflicts_install

The UNIONs tie them all together and line 5 puts it into the ports_conflicts table.

Details on the trigger

This section explains the trigger in a bit more detail.

The regexp_split_to_table function is a wonderful solution to break each CONFLICTS value into separate rows, like this:

freshports.dev=# select regexp_split_to_table('bacula5-server-* bacula-server-*',  E'\\s+');
 regexp_split_to_table 
-----------------------
 bacula5-server-*
 bacula-server-*
(2 rows)

freshports.dev=# 

The glob_to_regex(R.item_value) call changes each of the above items from a blog to a regex.

row_number() OVER (ORDER BY item_value) will sort individual value within a conflict and give us a value for the ports_conflicts.item_num field.

Matching against the regex

The list of conflicts for a given port will not change until the next commit for a port.

However, what that conflict matches against might change when another port is updated.

Thus, the ports_conflicts table only updates based upon a trigger on the ports table.

We will store the actual matches in the ports_conflicts_matches table.

CREATE TABLE ports_conflicts_matches
(
    ports_conflicts_id bigint NOT NULL,
    port_id integer NOT NULL,
    CONSTRAINT ports_conflicts_matches_pkey PRIMARY KEY (ports_conflicts_id, port_id),
    CONSTRAINT ports_conflicts_matches_conflicts_id FOREIGN KEY (ports_conflicts_id)
        REFERENCES public.ports_conflicts (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT ports_conflicts_matches_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

How do we populate that table? How can it be quickly done?

That is the part which took the longest time to figure out.

Simple population

Here are some sample CONFLICTS values from the early testing (as recorded in an older gist):

freshports.dev=# select * from ports_conflicts;
 id | port_id |  conflicts_type   | item_num |          item_value          
----+---------+-------------------+----------+------------------------------
 21 |   43574 | conflicts_install |        4 | mysqlwsrep.*
 22 |   43574 | conflicts_install |        3 | mysql[0-9].*\-server\-.*
 23 |   43574 | conflicts_install |        1 | mariadb10[013-9]\-server\-.*
 24 |   43574 | conflicts_install |        5 | percona[0-9].*\-server\-.*
 25 |   43574 | conflicts_install |        2 | mariadb5.*\-server\-.*
(5 rows)

freshports.dev=# 

Given the regex in ports_conflicts.item_value, let’s find the matching values from the commit_log_ports table. I can concatenate all the regex values into one search term. This approach works rather well:

freshports.dev=# explain analyse
freshports.dev-# SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
                                                                                     QUERY PLAN                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=19482.04..19482.68 rows=85 width=22) (actual time=497.597..497.856 rows=364 loops=1)
   ->  Sort  (cost=19482.04..19482.25 rows=85 width=22) (actual time=497.596..497.658 rows=852 loops=1)
         Sort Key: port_id, port_name_revision
         Sort Method: quicksort  Memory: 91kB
         ->  Gather  (cost=1000.00..19479.32 rows=85 width=22) (actual time=107.422..497.400 rows=852 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on commit_log_ports clp  (cost=0.00..18470.82 rows=35 width=22) (actual time=108.076..493.249 rows=284 loops=3)
                     Filter: (port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)'::text)
                     Rows Removed by Filter: 307469
 Planning time: 0.417 ms
 Execution time: 498.383 ms
(12 rows)

freshports.dev=# 

The results are something like this:

SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
 
  port_id |      port_name_revision       
---------+-------------------------------
   16965 | mysql51-server-5.1.11
   16965 | mysql51-server-5.1.12
   16965 | mysql51-server-5.1.14
   16965 | mysql51-server-5.1.15
   16965 | mysql51-server-5.1.16
   16965 | mysql51-server-5.1.17
   16965 | mysql51-server-5.1.18
   16965 | mysql51-server-5.1.19
   16965 | mysql51-server-5.1.20
   16965 | mysql51-server-5.1.21
   16965 | mysql51-server-5.1.22
...

Only a few rows of the results are shown above. The full list of matches is in that gist.

Yes, this works, however, that solution takes nearly half a second and caters for just set of one match. There must be a better way. If you keep reading in that gist, you will see the other methods I tried. I want to thank those on IRC who helped me find different ways to do this.

Next, I tried an SRF (Set Returning Function) and documented that in this gist. This one is fast; the test I ran taking 7.137 ms, but it did not scale, and exploded to 4825.074 ms when I tried multiple rows.

SRF within a function

My next attempt involved a plpgsql function which took only 13.180 ms, a huge speed increase. This turned out to be the solution I implemented.

Let’s review slightly using the final gist. My original function was:

CREATE OR REPLACE FUNCTION port_conflicts4(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  SELECT DISTINCT PC.id, CLP.port_id
    FROM commit_log_ports CLP JOIN ports_conflicts PC 
                              ON (PC.id = $1 AND CLP.port_name_revision ~ $2);
$$ LANGUAGE SQL STRICT;

Then I created a SRF using plpgsql, which is incredibly fast:

CREATE OR REPLACE FUNCTION port_conflicts(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  DECLARE
    a_ports_conflicts_id ALIAS FOR $1;
    a_regex              ALIAS FOR $2;

    l_regex text;
    r       ports_conflicts_matches%rowtype;
  BEGIN
    l_regex := '^(' || a_regex || ')';
    
    FOR r in SELECT distinct a_ports_conflicts_id, port_id FROM commit_log_ports WHERE port_name_revision ~ l_regex
    LOOP
      RETURN NEXT r;
    END LOOP;
  END;

$$ LANGUAGE 'plpgsql' STABLE;

Now let’s try that function out:

freshports.dev=# explain analyse SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts5(PC.id, '^' || PC.item_value || '$')) PC4;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.26..35052.76 rows=1751000 width=12) (actual time=0.594..2208.947 rows=10700 loops=1)
   ->  Seq Scan on ports_conflicts pc  (cost=0.00..32.51 rows=1751 width=26) (actual time=0.015..0.838 rows=1751 loops=1)
   ->  Function Scan on port_conflicts5  (cost=0.26..10.26 rows=1000 width=12) (actual time=1.258..1.259 rows=6 loops=1751)
 Planning time: 0.112 ms
 Execution time: 2209.598 ms
(5 rows)

freshports.dev=# 

2.2 seconds, to calculate conflicts for 1751 rows, resulting in 10700 rows. That’s fast enough. We could do that on a periodic basis if required.

From my personal development ethics, I don’t like the idea of having a function and a table with the same / very similar name (port_conflicts function vs ports_conflicts table). I might rename the function to find_matches_for_conflict. I think that better describes the purpose of the function.

Populate the destination table

freshports.dev=# insert into ports_conflicts_matches SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts(PC.id, '^' || PC.item_value || '$')) PC4;
INSERT 0 10700

Now let’s have a look at some of the results:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
freshports.dev-#   FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
freshports.dev-#  WHERE PC.port_id = 45086;
  packagename   |    ?column?    |     packagename     
----------------+----------------+---------------------
 bacula9-server | conflicts with | bacula5-server
 bacula9-server | conflicts with | bacula-server
 bacula9-server | conflicts with | bacula-server-devel
(3 rows)

freshports.dev=# 

You can find that magical 45086 value near the top of this post.

Let’s try openssl:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 954;
 packagename |    ?column?    |  packagename   
-------------+----------------+----------------
 openssl     | conflicts with | openssl111
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
(14 rows)

Or heimdal:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 934;
 packagename |    ?column?    | packagename  
-------------+----------------+--------------
 heimdal     | conflicts with | wu-ftpd
 heimdal     | conflicts with | wu-ftpd+ipv6
 heimdal     | conflicts with | srp
 heimdal     | conflicts with | krb5-maint
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-114
 heimdal     | conflicts with | krb5-111
 heimdal     | conflicts with | krb5-113
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-112
 heimdal     | conflicts with | krb5-116
 heimdal     | conflicts with | krb5-115
 heimdal     | conflicts with | krb4
(14 rows)

freshports.dev=# 

Now partially in production

The work is partially implemented in production. The idea came from this issue suggested by zi0r.

If you have a look at the Bacula port, you will find the following under Conflicts:

CONFLICTS:

  • bacula5-server-*
  • bacula-server-*

Future work

The next job is to automatically populate the ports_conflicts_matches table. I think that needs to be done on a periodic basis, perhaps every 10 minutes if there has been a recent commit.

Just thinking about that now:

  1. If port A conflicts with port B, then clearly port B conflicts with port A. I am not sure if there is a requirement to specify CONFLICTS in both ports, but that is something easily confirmed by the ports_conlicts table.
  2. I tried to think of commits which would not change ports_conflicts_matches. Since the matches can include PORTVERSION, any given commit which bumps that value can affect matches.
  3. Perhaps I could exclude commits to UPDATING, MOVED, or Mk, but they are relatively few compared to the rest.
  4. I think it’s just safer to recompute all matches periodically. There does not seem to be a safe and straight forward way to do it based upon a smaller unit of work.
Mar 112018
 

Databases use relational integrity to enforce expected situations. A common scenario is duplicates. Case in point, I present the port_dependencies table:

freshports.org=# \d port_dependencies
                    Table "public.port_dependencies"
         Column         |     Type     | Collation | Nullable | Default 
------------------------+--------------+-----------+----------+---------
 port_id                | integer      |           | not null | 
 port_id_dependent_upon | integer      |           | not null | 
 dependency_type        | character(1) |           | not null | 
Indexes:
    "port_dependencies_pkey" PRIMARY KEY, btree (port_id, port_id_dependent_upon, dependency_type)
Foreign-key constraints:
    "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
    port_dependencies_delete_clear_cache AFTER DELETE ON port_dependencies FOR EACH ROW EXECUTE PROCEDURE port_dependencies_delete_clear_cache()
    port_dependencies_insert_clear_cache AFTER INSERT ON port_dependencies FOR EACH ROW EXECUTE PROCEDURE port_dependencies_insert_clear_cache()

freshports.org=# 

For those not familiar with FreeBSD ports, each port (you could also refer to them as a package or application) can have zero or more dependencies. The FreshPorts database extracts and lists these dependencies for convenient viewing.

As you can see in the above table definition, for a specific (port_id), a given dependency type (port_id_dependent_upon, dependency_type) can only occur once. There is no reason for a port to be dependent upon something twice.

Or so you, or I, might think.

Then this happened:

Could not execute SQL SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result ... 
maybe invalid? ERROR:  duplicate key value violates unique constraint "port_dependencies_pkey"

It was part of this commit on net/ceph. Let’s find out where FreshPorts went wrong.

Clearly, something tried to record a duplicate dependency. The code doesn’t look for it, and did not detect the violation.

The log file

FreshPorts logs a lot of stuff, because it can. It has always been useful for situations such as this.

Here is what I found:

depends with this: 'libboost_python.so:devel/boost-python-libs@py27 libboost_python.so:devel/boost-python-libs
libboost_thread.so:devel/boost-libs libleveldb.so:databases/leveldb libnss3.so:security/nss
libcryptopp.so:security/cryptopp libsnappy.so:archivers/snappy libcurl.so:ftp/curl libxml2.so:textproc/libxml2
libexpat.so:textproc/expat2 liblz4.so:archivers/liblz4 libplds4.so:devel/nspr libtcmalloc.so:devel/google-perftools
libfuse.so:sysutils/fusefs-libs libintl.so:devel/gettext-runtime libldap-2.4.so.2:net/openldap24-client'

The next entry referred to:

The 'L' depends are: devel/boost-python-libs@py27 - devel/boost-python-libs - devel/boost-libs - 
databases/leveldb - security/nss - security/cryptopp - archivers/snappy - ftp/curl - textproc/libxml2 - 
textproc/expat2 - archivers/liblz4 - devel/nspr - devel/google-perftools - sysutils/fusefs-libs -
devel/gettext-runtime - net/openldap24-client

Then the code starts processing them, one by one:

adding in devel/boost-python-libs@py27 which converts to 'devel/boost-python-libs'
sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
result is 1

As you can see, it strips the flavor (@py27) from the dependency. I think I know where this is going.

And then the next one:

adding in devel/boost-python-libs which converts to 'devel/boost-python-libs'
sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
Could not execute SQL SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result ...
maybe invalid? ERROR:  duplicate key value violates unique constraint "port_dependencies_pkey"
DETAIL:  Key (port_id, port_id_dependent_upon, dependency_type)=(45100, 25346, L) already exists.
CONTEXT:  SQL statement "INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
            VALUES (l_PortID, l_PortIDDependency, p_DependencyType)"
PL/pgSQL function portsdependenciesadd(text,text,text) line 19 at SQL statement
that failed

Ahh, OK, that’s why. I see now.

Let’s go to the source.

The source

This isn’t the source, it’s the Makefile from that commit. Looking at lines 25-28 we see:

25	LIB_DEPENDS=    \
26	        ${PY_BOOST} \
27	        libboost_python.so:devel/boost-python-libs \
28	        libboost_thread.so:devel/boost-libs \

I bet it’s coming from here:

$ make -V PY_BOOST
libboost_python.so:devel/boost-python-libs@py27

Yes, that’s it. Now we know where and why the duplicates are arising.

How to fix this

I could take one of two approaches:

  1. Detect and eliminate duplicates within the code.
  2. Within the database, ignore attempts to add duplicates.

The database is wonderful at doing this. I could write code, and it might be buggy. I trust the database for this, much more than I trust my code. Plus, I’ll have less code to maintain.

I’ll take option #2.

The code

Here is the statement which inserts the tuple into the database:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
      VALUES (l_PortID, l_PortIDDependency, p_DependencyType);

There is no detection of duplicate rows.

In different situations, I have used this approach to detect duplicates during insertion:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
  SELECT l_PortID, l_PortIDDependency, p_DependencyType
   WHERE NOT EXISTS (SELECT port_id, port_id_dependent_upon, dependency_type
                       FROM port_dependencies
                      WHERE port_id                = l_PortID
                        AND port_id_dependent_upon = l_PortIDDependency
                        AND dependency_type        = p_DependencyType);

The above may be easier to follow if I do the variable substitution:

 INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
  SELECT 45100, 25346, 'L'
   WHERE NOT EXISTS (SELECT port_id, port_id_dependent_upon, dependency_type
                       FROM port_dependencies
                      WHERE port_id                = 45100
                        AND port_id_dependent_upon = 25346
                        AND dependency_type        = 'L');

That works well, exactly as expected. It does what I need.

But there is a better way

Then I posted on Twitter about this, with the above example. My cousin, Scott Walsh, mentioned ON CONFLICT. This feature arrived with PostgreSQL 9.5 (see docs), which was released in early 2016. I haven’t kept up. FreshPorts added dependency support in 2011.

I immediately tried this approach:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
freshports.org-#   SELECT 45100, 25346, 'L'
freshports.org-# ON CONFLICT DO NOTHING;
INSERT 0 0

That’s great.

Then I got to thinking… there are other conflicts which could occur, errors which should be brought to my attention. Foreign key violations come to mind.

Reading more of the documentation came up with this solution:

freshports.org=# INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
freshports.org-#   SELECT 45100, 25346, 'L'
freshports.org-# ON CONFLICT ON CONSTRAINT port_dependencies_pkey DO NOTHING;
INSERT 0 0

EDIT 2017.03.17

I see I’ve left my SELECT in that INSERT. The following is what I will use in production:

INSERT INTO port_dependencies (port_id, port_id_dependent_upon, dependency_type)
    VALUES (l_PortID, l_PortIDDependency, p_DependencyType)
    ON CONFLICT ON CONSTRAINT port_dependencies_pkey DO NOTHING;

When running my first test of that code, this is what was found in the logs (two duplicate inserts, no failures):

sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/net/ceph
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/devel/boost-python-libs
result is 1
adding in devel/boost-python-libs which converts to 'devel/boost-python-libs'
sql is SELECT PortsDependenciesAdd( 'net/ceph', 'devel/boost-python-libs', 'L' ) as result
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/net/ceph
NOTICE:   branch is head
NOTICE:   GetPort pathname is=/ports/head/devel/boost-python-libs
result is 1

That’s exactly what I need. Phew.


This code deals specifically with one constraint, port_dependencies_pkey. If that constraint is violated, we have a duplicate key: do nothing, ignore it, as if nothing happened. Please move along, nothing to see here, thank you.

This, for my situation, is exactly what I want. Instead of removing duplicates in the front end code, I can let the database silently handle it. Score.

Thanks to those who helped me find this solution. I hope it helps you too. This is not a one-size-fits-all solution, but it might be applicable to your needs.

Jul 252016
 

FreshPorts uses a cache for every port page (e.g. https://www.freshports.org/sysutils/bacula-server/). When an update to a port occurs, we must clear the cache for that port. This sounds simple, and it is. The various ways in which updates occur complicates the situation.

This post is all about fixing one edge case: the addition or removal of a port dependency (e.g. RUN_DEPENDS, LIB_DEPENDS).

When port A depends on port B, this fact is listed on both pages, but adding/removing a dependency is not properly handled (see this bug). Tonight, I have coded a fix. I’d like it to be less complex, but it has been fixed.

The table

This is the table in question:

freshports.org=# \d port_dependencies
         Table "public.port_dependencies"
         Column         |     Type     | Modifiers 
------------------------+--------------+-----------
 port_id                | integer      | not null
 port_id_dependent_upon | integer      | not null
 dependency_type        | character(1) | not null
Indexes:
    "port_dependencies_pkey" PRIMARY KEY, btree (port_id, port_id_dependent_upon, dependency_type)
Foreign-key constraints:
    "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

Using the terminology from the first sentence, port A and port B are represented by port_id and port_id_dependent_upon respectively. Any new row or any deleted row must invalidate the cache for both port A and port B.

Keeping track of cache to clear

This is the table we use to clear the ports cache:

freshports.org=# \d cache_clearing_ports
                                     Table "public.cache_clearing_ports"
   Column   |            Type             |                             Modifiers                             
------------+-----------------------------+-------------------------------------------------------------------
 id         | integer                     | not null default nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     | not null
 category   | text                        | not null
 port       | text                        | not null
 date_added | timestamp without time zone | not null default now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

We add rows to this table to indicate that a given port must be cleared from the cache.

On INSERT

On an insert to the port_dependencies table, let’s do this:

CREATE OR REPLACE FUNCTION port_dependencies_insert_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_cache_clearing_ports_id   int8;
      l_port      text;
      l_category  text;
   BEGIN
        --
        -- This function handles the addition of a new dependency.
        -- yes, we need to clear the cache for both port_id and port_id_dependent_upon
        -- from the cache_clearing_ports.  I figure there is a shorter way to this but
        -- I cannot think it through right now.
        --

        IF TG_OP = 'INSERT' THEN
         -- handle port A (port_id)
         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = NEW.port_id;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.port_id;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (NEW.port_id, l_category, l_port);
          END IF;

         -- handle port B (port_id_dependent_upon)
         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = NEW.port_id_dependent_upon;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = NEW.port_id_dependent_upon;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (NEW.port_id_dependent_upon, l_category, l_port);
          END IF;

          NOTIFY port_updated;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS port_dependencies_insert_clear_cache ON port_dependencies;
CREATE TRIGGER port_dependencies_insert_clear_cache
    AFTER INSERT on port_dependencies
    FOR EACH ROW
    EXECUTE PROCEDURE port_dependencies_insert_clear_cache();

That function is rather long, and I know it can be simplified, but that’s not on tonight’s agenda. If you are so inclined, I am happy to hear your suggestions. Specifically, I think I can do an INSERT INTO … WHERE NOT EXISTS and do the SELECT & INSERT in one step, without the need for IF FOUND.

On DELETE

Here is the code for the DELETE.

CREATE OR REPLACE FUNCTION port_dependencies_delete_clear_cache() RETURNS TRIGGER AS $$
   DECLARE
      l_cache_clearing_ports_id   int8;
      l_port      text;
      l_category  text;
   BEGIN
        --
        -- This function handles the deletion of a existing dependency.
        -- yes, we need to clear the cache for both port_id and port_id_dependent_upon
        -- from the cache_clearing_ports.  I figure there is a shorter way to this but
        -- I cannot think it through right now.
        --

        IF TG_OP = 'DELETE' THEN
         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = OLD.port_id;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = OLD.port_id;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (OLD.port_id, l_category, l_port);
          END IF;

         SELECT port_id
           INTO l_cache_clearing_ports_id
           FROM cache_clearing_ports
          WHERE port_id = OLD.port_id_dependent_upon;

          IF NOT FOUND THEN
            SELECT category, name
              INTO l_category, l_port
              FROM ports_all
             WHERE id = OLD.port_id_dependent_upon;
            INSERT INTO cache_clearing_ports (port_id, category, port)
            VALUES (OLD.port_id_dependent_upon, l_category, l_port);
          END IF;

          NOTIFY port_updated;
      END IF;

      -- when a port changes, add an entry to the cache clearing table
      RETURN OLD;
   END
$$ LANGUAGE 'plpgsql';

  DROP TRIGGER IF EXISTS port_dependencies_delete_clear_cache ON port_dependencies;
CREATE TRIGGER port_dependencies_delete_clear_cache
    AFTER DELETE on port_dependencies
    FOR EACH ROW
    EXECUTE PROCEDURE port_dependencies_delete_clear_cache();

It is nearly identical to the first function. Again, those of you who want to suggest improvements there, feel free.

I will run this in dev for a while and see how it goes. Initial testing has been promising.

NOTE: there are no updates to this table, by design. Instead when we see a new commit to a port, we do a DELETE from port_dependencies WHERE port_id = :a.

Jan 292014
 

Yesterday, I wrote about some great progress on getting FreshPorts to work with multiple repositories to cater for the branches of the FreeBSD ports tree. At the end of that post, I talked briefly about wanting global settings or variables for my stored procedures. Shortly after posting that, I found a solution in the form of a post by depesz.

A bit about the problem

The problem I encountered was one of context. Sometimes, you want to use trunk (or head) and sometimes you want to use a branch (e.g. RELENG_9_1_0). A function such as the one shown below, has no content. It has ‘head’ hardcoded (see line 8) because the FreeBSD ports tree had no branches when the function was first modified to cater for SVN (FreeBSD previously used CVS).

CREATE OR REPLACE FUNCTION GetPort(text) RETURNS int4 AS $$
   DECLARE
      category_port ALIAS for $1;
      pathname         text;
      port_element_id  int4;
      port_id          int4;
   BEGIN
      pathname        := '/ports/head/' || category_port;
      port_element_id := Pathname_ID(pathname);
      if port_element_id IS NOT NULL THEN
         select id
           into port_id
           from ports
          where element_id = port_element_id;
      END IF;
      return port_id;
   END;
$$ LANGUAGE 'plpgsql';

If you look at yesterday’s post, you’ll see this output:

freshports.org=# select id, name, category, element_id, element_pathname(element_id) from ports_active where name = 'spellathon';
  id   |    name    | category | element_id |               element_pathname                
-------+------------+----------+------------+-----------------------------------------------
 24964 | spellathon | games    |     324899 | /ports/head/games/spellathon
 34159 | spellathon | games    |     559872 | /ports/branches/RELENG_9_1_0/games/spellathon
(2 rows)
 
freshports.org=# 

We need some way to tell this function which branch we are using.

The solution

This is the amended function, which is working just fine, thank you depesz.

CREATE OR REPLACE FUNCTION GetPort(text) RETURNS int4 AS $$
   DECLARE
      category_port ALIAS for $1;
      pathname         text;
      port_element_id  int4;
      port_id          int4;
      l_branch         text;
   BEGIN
      l_branch := freshports_branch_get();

      IF l_branch = 'head' THEN
          pathname := '/ports/'          || l_branch || '/' || category_port;
      ELSE
          pathname := '/ports/branches/' || l_branch || '/' || category_port;
      END IF;

      port_element_id := Pathname_ID(pathname);
      IF port_element_id IS NOT NULL THEN
         SELECT id
           INTO port_id
           FROM ports  
          WHERE element_id = port_element_id;
      END IF;

      RETURN port_id;
   END;
$$ LANGUAGE 'plpgsql';

The new magic code appears on lines 9-15.

  • Line 9-10 calls a new function (shown later) which pulls back the branch.
  • The IF statement constructs the correct path, based on the differences between trunk and branches in the FreeBSD ports repository.

Show me it working!

I have coded the function to assume head if no branch is specified. That is the default in the current code. Here is what you get when you invoke GetPort without specifying a branch:

freshports.org=# select GetPort('games/spellathon');
 getport 
---------
   24964
(1 row)

freshports.org=# 

Next, I set the branch, and call the same function again:

freshports.org=# select freshports_branch_set('RELENG_9_1_0');
 freshports_branch_set 
-----------------------
 
(1 row)

freshports.org=# select GetPort('games/spellathon');
 getport 
---------
   34159
(1 row)

freshports.org=# 

And here we swap back again:

freshports.org=# select freshports_branch_set('head');
 freshports_branch_set 
-----------------------
 
(1 row)

freshports.org=# select GetPort('games/spellathon');
 getport 
---------
   24964
(1 row)

freshports.org=# 

I was pretty chuffed when I was able to get this working with a minimum of fuss. This bodes well for the rest of this project.

The rest of the code

In addition to the code provided in depesz’s post, I added the following helper functions, specific to FreshPorts.

CREATE OR REPLACE FUNCTION freshports_branch_set( TEXT ) RETURNS void as $$

   SELECT session_variables.set_value('branch', $1);

$$ language sql;


CREATE OR REPLACE FUNCTION freshports_branch_get() RETURNS TEXT as $$

DECLARE
    reply TEXT;

BEGIN

   reply := session_variables.get_value( 'branch' );

   IF reply IS NULL THEN
      reply := 'head';
   END IF;

   RETURN reply;

END;

$$ language plpgsql;

The first function lets my code set the branch. You saw me use that in the examples above. The second function is used by the GetPort() function to retrieve the branch it should use. By default, it will return ‘head’.

Let’s go!

This session variable stuff is pretty much what I was looking for last night. I’m very pleased that it has been so easy to add to my particular application. Hope it helps you. I’m looking forward to the rest of this upgrade.

Jan 252014
 

Earlier today, I wrote about a fix which broke search. At the end of that post, I mentioned a few things which needed to be done to fix up the broken relationships.

  1. clean up the database
  2. add a foreign key, on delete set null
  3. trigger to find the right value when set null

Let’s get started.

clean up the database

How many rows are we talking about?

SELECT count(*)
  FROM ports P
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
 13023

Ouch. That’s about half the ports tree.

Let’s set them all NULL.

BEGIN;
UPDATE ports P
   set last_commit_id = NULL
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
UPDATE 13023

-- that number matches up
-- now let's run that first query again:
SELECT count(*)
  FROM ports P
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
 13023
(1 row)

-- oh, of course.  Let's exclude what we just fixed

SELECT count(*)
  FROM ports P
 WHERE P.last_commit_id IS NOT NULL
   AND NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
     0
(1 row)

-- this looks good; commit;

commit;
COMMIT

Now, let’s fix up all the NULL keys. Fortunately, I have a script which fixes this exact situation.

#!/usr/bin/perl -w
#
# $Id: set-last-commit-id.pl,v 1.2 2006-12-17 12:04:06 dan Exp $
#
# Copyright (c) 1999-2004 DVL Software
#

#
# I found that several (about 800-900) ports did not yet have last_commit_id set.
# this script sets that.
#

use strict;
use lib "$ENV{HOME}/scripts";
use port;
use DBI;
use database;
use utilities;

my $dbh;

my $porttorefresh;
my @PORTS;
my $sql;
my $sth;
my @row;

FreshPorts::Utilities::InitSyslog();

$dbh = FreshPorts::Database::GetDBHandle();

#
# get a list of ports to update
#

$sql = "
  SELECT id
    FROM ports
   WHERE last_commit_id IS NULL
";

print "sql = $sql\n";

$sth = $dbh->prepare($sql);
$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

while (@row=$sth->fetchrow_array) {
	print "now reading @row\n";
	push @PORTS, "$row[0]"
}

my $port = FreshPorts::Port->new($dbh);

foreach $porttorefresh (@PORTS) {
	$sql = "
SELECT max(CL.commit_date)
  FROM commit_log CL, commit_log_ports CLP
 WHERE CLP.port_id = $porttorefresh
   AND CL.id       = CLP.commit_log_id
";

	print "sql = $sql\n";

	$sth = $dbh->prepare($sql);
	$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

	@row =$sth->fetchrow_array;

	my $CommitLogDate = $row[0];

	$sql = "
SELECT CL.id
  FROM commit_log CL, commit_log_ports CLP
 WHERE CLP.port_id = $porttorefresh
   AND CL.id       = CLP.commit_log_id
   AND CL.commit_date = '$CommitLogDate'
";

	$sth = $dbh->prepare($sql);
	$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

	@row =$sth->fetchrow_array;

	my $CommitLogID = $row[0];

	$sql = "
UPDATE ports
   SET last_commit_id = $CommitLogID
 WHERE id             = $porttorefresh
";

	print "sql = $sql\n";

	$sth = $dbh->prepare($sql);
	$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

}

$sth->finish();

$dbh->commit();
$dbh->disconnect();

After running that script, I ran one of the queries from above to confirm all was well:

SELECT count(*)
  FROM ports P
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
     0
(1 row)

That’s all the ports correct.

Let’s try running the problem query mentioned in a previous post:

SELECT count(*)
  FROM ports P LEFT OUTER JOIN ports_vulnerable PV on PV.port_id = P.id JOIN commit_log CL on P.last_commit_id = CL.id, categories C, element E
 WHERE P.category_id   = C.id
   AND P.element_id    = E.id  
   AND lower(E.name) = lower(E'bird');
 count 
-------
     1

That query correctly retrieves one row.

add a foreign key, on delete set null

The ports table is simple, with a lot of referential integrity (RI). I’m about to add more RI to it.

freshports.org=> \d  ports
                                       Table "public.ports"
      Column       |           Type           |                     Modifiers                      
-------------------+--------------------------+----------------------------------------------------
 id                | integer                  | not null default nextval('ports_id_seq'::regclass)
 element_id        | integer                  | not null
 category_id       | integer                  | not null
 short_description | text                     | 
 long_description  | text                     | 
 version           | text                     | 
 revision          | text                     | 
 maintainer        | text                     | 
 homepage          | text                     | 
 master_sites      | text                     | 
 extract_suffix    | text                     | 
 package_exists    | boolean                  | 
 depends_build     | text                     | 
 depends_run       | text                     | 
 last_commit_id    | integer                  | 
 found_in_index    | boolean                  | 
 forbidden         | text                     | 
 broken            | text                     | 
 date_added        | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
 categories        | text                     | 
 deprecated        | text                     | 
 ignore            | text                     | 
 master_port       | text                     | 
 latest_link       | text                     | 
 depends_lib       | text                     | 
 no_latest_link    | text                     | 
 no_package        | text                     | 
 package_name      | text                     | 
 portepoch         | text                     | 
 no_cdrom          | text                     | 
 restricted        | text                     | 
 expiration_date   | date                     | 
 is_interactive    | text                     | 
 only_for_archs    | text                     | 
 not_for_archs     | text                     | 
 status            | character(1)             | not null
 showconfig        | text                     | 
 license           | text                     | 
Indexes:
    "ports_pkey" PRIMARY KEY, btree (id)
    "ports_active_idx" btree (status) WHERE status = 'A'::bpchar
    "ports_broken" btree (broken) WHERE broken <> ''::text
    "ports_category_id_idx" btree (category_id)
    "ports_deleted" btree (status) WHERE status = 'D'::bpchar
    "ports_element_id" btree (element_id)
    "ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL
    "ports_ignore" btree (ignore) WHERE ignore <> ''::text
    "ports_is_interactive" btree (is_interactive) WHERE is_interactive IS NOT NULL
    "ports_package_name" btree (package_name)
    "ports_ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL
Foreign-key constraints:
    "$1" FOREIGN KEY (category_id) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
    TABLE "ports_categories" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "commit_log_ports_ignore" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_moved" CONSTRAINT "$1" FOREIGN KEY (from_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_updating_ports_xref" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_vulnerable" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE RESTRICT ON DELETE CASCADE
    TABLE "commit_log_ports" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_moved" CONSTRAINT "$2" FOREIGN KEY (to_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "commit_log_ports_vuxml" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "commit_log_port_elements" CONSTRAINT "$3" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "cache_clearing_ports" CONSTRAINT "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
    ports_clear_cache AFTER UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_clear_cache()
    ports_ports_categories AFTER INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_categories_set()
    ports_status BEFORE INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_status()

As you can see from the above, last_commit_id is not involved in any foreign key constraints. The following statement will ensure that the field always contains a valid key, or null:

ALTER TABLE ports                                                     
    ADD FOREIGN KEY (last_commit_id)
       REFERENCES commit_log (id) ON UPDATE CASCADE ON DELETE SET NULL;

Doing another \d ports will display the following addition to the table:

   "ports_last_commit_id_fkey" FOREIGN KEY (last_commit_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE SET NULL

trigger to find the right value when set null

Here is the trigger I just created. It is based upon the script (mentioned above) which I used to fix up the values.

CREATE OR REPLACE FUNCTION check_last_commit_id() RETURNS TRIGGER as '
   declare
      l_max_commit_date  timestamp with time zone;
      l_commit_log_id    integer;
begin

  if new.last_commit_id is null then
    SELECT max(CL.commit_date)
      INTO l_max_commit_date
      FROM commit_log CL, commit_log_ports CLP
     WHERE CLP.port_id = NEW.id
       AND CL.id       = CLP.commit_log_id;

    IF FOUND THEN
      SELECT CL.id
        INTO l_commit_log_id
        FROM commit_log CL, commit_log_ports CLP
       WHERE CLP.port_id    = NEW.id
         AND CL.id          = CLP.commit_log_id
         AND CL.commit_date = l_max_commit_date;

      IF FOUND THEN
        NEW.last_commit_id := l_commit_log_id;
      END IF;
    END IF;

  end if;

  RETURN NEW;

end;
' LANGUAGE 'plpgsql';


  DROP TRIGGER check_last_commit_id on ports;
CREATE TRIGGER check_last_commit_id
BEFORE update on ports
FOR EACH ROW
EXECUTE PROCEDURE check_last_commit_id();

Here is my test of that code:

freshports.org=# begin;
BEGIN
freshports.org=# select id, last_commit_id from ports where id = 234;
 id  | last_commit_id 
-----+----------------
 234 |         237699
(1 row)

freshports.org=# update ports set last_commit_id = null where id = 234;
UPDATE 1
freshports.org=# select id, last_commit_id from ports where id = 234;
 id  | last_commit_id 
-----+----------------
 234 |         237699
(1 row)

freshports.org=# ROLLBACK;
ROLLBACK
freshports.org=# 

Yes, I just set the last_commit_id to null, and it bounced right back. But for the killer test, let’s delete a commit and see what happens.

Testing via commit delete

Let’s start with this commit against chinese/bg5ps. I will check the existing values, delete the commit, then check the new values.

freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps';
  id  | last_commit_id 
------+----------------
 2973 |         504539
(1 row)

freshports.org=# begin;
BEGIN
freshports.org=# delete from commit_log where id = 504539 and message_id = '201401251811.s0PIBg6R031537@svn.freebsd.org';
DELETE 1
freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps';
  id  | last_commit_id 
------+----------------
 2973 |         504087
(1 row)

freshports.org=# rollback;
ROLLBACK
freshports.org=# 

Yes, that seems to have worked as planned. I did a rollback at the end, just because the delete was not required.

Verifying the fix

But let’s check that new value. Is it correct?

freshports.org=# select message_id from commit_log where id = 504087;
                 message_id                  
---------------------------------------------
 201401221552.s0MFqBK6007511@svn.freebsd.org
(1 row)

freshports.org=# 

Looking at the webpage, yes, that is the expected value for that message_id, which can be found by hovering over the commit email icon or the file set icon.

I’ll run this on my dev server for a while before pushing it through to production.