Adding and populating CONFLICTS

This post serves two goals:

document the CONFLICTS implementation within FreshPorts
show how you can use triggers to update a secondary table

The FreeBSD ports tree / package collection has a method for resolving conflicts. This feature allows a port maintainer to specify what, if any, packages conflict with a given port. The reasons why two ports might conflict is outside the scope of this article. For background reading:

Obtaining the values from the Makefile

To get started, let’s look at what I already have in the development instance of the FreshPorts database.

freshports.dev=# SELECT id, packagename(id), conflicts, conflicts_build, conflicts_install FROM ports WHERE coalesce(conflicts, conflicts_build, conflicts_install) IS NOT NULL and packagename(id) like 'bacula%' order by last_commit_id desc limit 10
;
  id   |      packagename      |            conflicts             | conflicts_build | conflicts_install 
-------+-----------------------+----------------------------------+-----------------+-------------------
 45084 | bacula9-docs          | bacula-* bacula5-*               |                 | 
 45086 | bacula9-server        | bacula5-server-* bacula-server-* |                 | 
 45087 | bacula9-bat           | bacula5-server-* bacula-server-* |                 | 
 45083 | bacula9-client        | bacula5-server-* bacula-server-* |                 | 
 45085 | bacula9-client-static | bacula5-server-* bacula-server-* |                 | 
(5 rows)

freshports.dev=#

The packagename corresponds to the following FreeBSD ports:

In this case, bacula9 ports conflict with older versions of Bacula. How can we get those values from the Makefile? Let’s get the values for bacula9-server. Assume these commands are done in the /var/db/freshports/ports-jail/var/db/repos/PORTS-head/sysutils/bacula9-server directory, which is a non-standard location for ports files, but it suites the needs of the website.

$ grep CONFLICTS *
Makefile:CONFLICTS?=	bacula5-server-* bacula-server-*

NOTE: grep is not an appropriate method for obtaining the values because the values may be contingent upon other configuration settings.

This is the proper method:

$ make -V CONFLICTS -V CONFLICTS_BUILD -V CONFLICTS_INSTALL
bacula5-server-* bacula-server-*

NOTE: those two blank lines correspond to the empty values contained in those files.

globs not regex

CONFLICTS can involve wildcards, such as the * shown above. It is critical to realize that these are globs, meant for matching disk filenames, as opposed to regular expressions.

From the documentation

# CONFLICTS             - A list of package name patterns that the port conflicts
#                                 with, separated by blanks.  The names may include shell
#                                 pattern meta-characters "*", "?", "[", "]", and "!".
#                                 Example: apache*-1.2* apache*-1.3.[012345] apache-*+ssl_*

PostgreSQL has many regex functions. Let’s see about converting globs to regex.

glob to regex

Perl to the rescue. I received some help, as many people do, from RhodiumToad, who supplied this function:

CREATE TRUSTED PROCEDURAL LANGUAGE plperl
    HANDLER plperl_call_handler
    INLINE plperl_inline_handler
    VALIDATOR plperl_validator;
    
COMMENT ON LANGUAGE plperl
    IS 'PL/Perl procedural language';

CREATE FUNCTION public.glob_to_regex(text) RETURNS text
    LANGUAGE plperl
    AS $_$
    my $w = quotemeta shift;
    $w =~ s{\\([*?]|\\(\\?.)|\[(?:\\\!)?(?:\\\])?[^]]*\\\])}
                   {($1 eq '*' ? '.*' :
                         $1 eq '?' ? '.'  :
                         defined($2) ? $2 :
                         do { my $p = $1; $p =~ s/\\([][^-])/$1/g; $p =~ s/^\[\\!/[^/; $p; })}eg;
    return $w;
$_$;

On FreeBSD, I needed to add the databases/postgresql10-server package to provide support for Perl language extensions.

This will take a glob and translate it into a regular expression.

Here are a few examples:

freshports.dev=# select glob_to_regex('bacula-*');
 glob_to_regex 
---------------
 bacula\-.*
(1 row)

freshports.dev=# 


freshports.dev=# select glob_to_regex('postgresql9[^4]*');
   glob_to_regex   
-------------------
 postgresql9[^4].*
(1 row)

freshports.dev=# select glob_to_regex('clang-devel-3.[1234567]*');
        glob_to_regex         
------------------------------
 clang\-devel\-3\.[1234567].*
(1 row)

freshports.dev=#

The secondary table

As part of earlier work, the ports table was upgraded to add three new fields:

ALTER TABLE ports ADD COLUMN conflicts text;
ALTER TABLE ports ADD COLUMN conflicts_build text;
ALTER TABLE ports ADD COLUMN conflicts_install text;

These columns will store the raw values as obtained directly from the Makefile. The corresponding regex values will be store, one per row, in another table. That table looks like this:

CREATE TABLE ports_conflicts
(
    id BIGSERIAL,
    port_id integer          NOT NULL,
    conflicts_type conflicts NOT NULL,
    item_num integer         NOT NULL,
    item_value text          NOT NULL,
    CONSTRAINT ports_conflicts_pkey PRIMARY KEY (id),
    CONSTRAINT ports_conflicts_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

What’s that column type on line 5 you might ask.

CREATE TYPE public.conflicts AS ENUM
    ('conflicts', 'conflicts_build', 'conflicts_install');

The item_num column will help to order the values so they correspond to what we find in the the source value. Feedback regarding the accuracy of that claim is most welcome. The ordering is not vital, but it does help to make sure it is consistent when we pull back the values later.

The Trigger

This is the trigger which will populate the ports_conflicts table.

CREATE OR REPLACE FUNCTION ports_conflicts_set() RETURNS TRIGGER AS $$
  BEGIN
    DELETE FROM ports_conflicts WHERE port_id = NEW.id;

    INSERT INTO ports_conflicts(port_id, conflicts_type, item_num, item_value)
      SELECT NEW.id                         AS port_id, 
             'conflicts'::conflicts         AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_build'::conflicts   AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_build, E'\\s+') AS R(item_value)

      UNION

      SELECT NEW.id                         AS port_id, 
             'conflicts_install'::conflicts AS conflict_type,
             row_number() OVER (ORDER BY item_value),
             glob_to_regex(R.item_value)
        FROM regexp_split_to_table(NEW.conflicts_install, E'\\s+') AS R(item_value);

    RETURN NEW;
  END
$$ LANGUAGE 'plpgsql';


  DROP TRIGGER IF EXISTS ports_conflicts_insert ON ports;
CREATE TRIGGER ports_conflicts_insert
    AFTER INSERT on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

  DROP TRIGGER IF EXISTS ports_conflicts_update ON ports;
CREATE TRIGGER ports_conflicts_update
    AFTER UPDATE on ports
    FOR EACH ROW
    EXECUTE PROCEDURE ports_conflicts_set();

Line 3 : deletes existing entries for this port

Lines 6-10 : compiles a list of conflicts values

Lines 14-18 : does the same thing for conflicts_build values

Lines 22-26 : same again, for conflicts_install

The UNIONs tie them all together and line 5 puts it into the ports_conflicts table.

Details on the trigger

This section explains the trigger in a bit more detail.

The regexp_split_to_table function is a wonderful solution to break each CONFLICTS value into separate rows, like this:

freshports.dev=# select regexp_split_to_table('bacula5-server-* bacula-server-*',  E'\\s+');
 regexp_split_to_table 
-----------------------
 bacula5-server-*
 bacula-server-*
(2 rows)

freshports.dev=#

The glob_to_regex(R.item_value) call changes each of the above items from a blog to a regex.

row_number() OVER (ORDER BY item_value) will sort individual value within a conflict and give us a value for the ports_conflicts.item_num field.

Matching against the regex

The list of conflicts for a given port will not change until the next commit for a port.

However, what that conflict matches against might change when another port is updated.

Thus, the ports_conflicts table only updates based upon a trigger on the ports table.

We will store the actual matches in the ports_conflicts_matches table.

CREATE TABLE ports_conflicts_matches
(
    ports_conflicts_id bigint NOT NULL,
    port_id integer NOT NULL,
    CONSTRAINT ports_conflicts_matches_pkey PRIMARY KEY (ports_conflicts_id, port_id),
    CONSTRAINT ports_conflicts_matches_conflicts_id FOREIGN KEY (ports_conflicts_id)
        REFERENCES public.ports_conflicts (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT ports_conflicts_matches_port_id FOREIGN KEY (port_id)
        REFERENCES public.ports (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

How do we populate that table? How can it be quickly done?

That is the part which took the longest time to figure out.

Simple population

Here are some sample CONFLICTS values from the early testing (as recorded in an older gist):

freshports.dev=# select * from ports_conflicts;
 id | port_id |  conflicts_type   | item_num |          item_value          
----+---------+-------------------+----------+------------------------------
 21 |   43574 | conflicts_install |        4 | mysqlwsrep.*
 22 |   43574 | conflicts_install |        3 | mysql[0-9].*\-server\-.*
 23 |   43574 | conflicts_install |        1 | mariadb10[013-9]\-server\-.*
 24 |   43574 | conflicts_install |        5 | percona[0-9].*\-server\-.*
 25 |   43574 | conflicts_install |        2 | mariadb5.*\-server\-.*
(5 rows)

freshports.dev=#

Given the regex in ports_conflicts.item_value, let’s find the matching values from the commit_log_ports table. I can concatenate all the regex values into one search term. This approach works rather well:

freshports.dev=# explain analyse
freshports.dev-# SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
                                                                                     QUERY PLAN                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=19482.04..19482.68 rows=85 width=22) (actual time=497.597..497.856 rows=364 loops=1)
   ->  Sort  (cost=19482.04..19482.25 rows=85 width=22) (actual time=497.596..497.658 rows=852 loops=1)
         Sort Key: port_id, port_name_revision
         Sort Method: quicksort  Memory: 91kB
         ->  Gather  (cost=1000.00..19479.32 rows=85 width=22) (actual time=107.422..497.400 rows=852 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on commit_log_ports clp  (cost=0.00..18470.82 rows=35 width=22) (actual time=108.076..493.249 rows=284 loops=3)
                     Filter: (port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)'::text)
                     Rows Removed by Filter: 307469
 Planning time: 0.417 ms
 Execution time: 498.383 ms
(12 rows)

freshports.dev=#

The results are something like this:

SELECT distinct CLP.port_id, CLP.port_name_revision 
  FROM commit_log_ports CLP 
 WHERE CLP.port_name_revision ~ '^(mysqlwsrep.*|mysql[0-9].*\-server\-.*|mariadb10[013-9]\-server\-.*|percona[0-9].*\-server\-.*|mariadb5.*\-server\-.*)';
 
  port_id |      port_name_revision       
---------+-------------------------------
   16965 | mysql51-server-5.1.11
   16965 | mysql51-server-5.1.12
   16965 | mysql51-server-5.1.14
   16965 | mysql51-server-5.1.15
   16965 | mysql51-server-5.1.16
   16965 | mysql51-server-5.1.17
   16965 | mysql51-server-5.1.18
   16965 | mysql51-server-5.1.19
   16965 | mysql51-server-5.1.20
   16965 | mysql51-server-5.1.21
   16965 | mysql51-server-5.1.22
...

Only a few rows of the results are shown above. The full list of matches is in that gist.

Yes, this works, however, that solution takes nearly half a second and caters for just set of one match. There must be a better way. If you keep reading in that gist, you will see the other methods I tried. I want to thank those on IRC who helped me find different ways to do this.

Next, I tried an SRF (Set Returning Function) and documented that in this gist. This one is fast; the test I ran taking 7.137 ms, but it did not scale, and exploded to 4825.074 ms when I tried multiple rows.

SRF within a function

My next attempt involved a plpgsql function which took only 13.180 ms, a huge speed increase. This turned out to be the solution I implemented.

Let’s review slightly using the final gist. My original function was:

CREATE OR REPLACE FUNCTION port_conflicts4(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  SELECT DISTINCT PC.id, CLP.port_id
    FROM commit_log_ports CLP JOIN ports_conflicts PC 
                              ON (PC.id = $1 AND CLP.port_name_revision ~ $2);
$$ LANGUAGE SQL STRICT;

Then I created a SRF using plpgsql, which is incredibly fast:

CREATE OR REPLACE FUNCTION port_conflicts(bigint, text) RETURNS SETOF ports_conflicts_matches AS $$
  DECLARE
    a_ports_conflicts_id ALIAS FOR $1;
    a_regex              ALIAS FOR $2;

    l_regex text;
    r       ports_conflicts_matches%rowtype;
  BEGIN
    l_regex := '^(' || a_regex || ')';
    
    FOR r in SELECT distinct a_ports_conflicts_id, port_id FROM commit_log_ports WHERE port_name_revision ~ l_regex
    LOOP
      RETURN NEXT r;
    END LOOP;
  END;

$$ LANGUAGE 'plpgsql' STABLE;

Now let’s try that function out:

freshports.dev=# explain analyse SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts5(PC.id, '^' || PC.item_value || '$')) PC4;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.26..35052.76 rows=1751000 width=12) (actual time=0.594..2208.947 rows=10700 loops=1)
   ->  Seq Scan on ports_conflicts pc  (cost=0.00..32.51 rows=1751 width=26) (actual time=0.015..0.838 rows=1751 loops=1)
   ->  Function Scan on port_conflicts5  (cost=0.26..10.26 rows=1000 width=12) (actual time=1.258..1.259 rows=6 loops=1751)
 Planning time: 0.112 ms
 Execution time: 2209.598 ms
(5 rows)

freshports.dev=#

2.2 seconds, to calculate conflicts for 1751 rows, resulting in 10700 rows. That’s fast enough. We could do that on a periodic basis if required.

From my personal development ethics, I don’t like the idea of having a function and a table with the same / very similar name (port_conflicts function vs ports_conflicts table). I might rename the function to find_matches_for_conflict. I think that better describes the purpose of the function.

Populate the destination table

freshports.dev=# insert into ports_conflicts_matches SELECT PC4.*                                                                                
  FROM ports_conflicts PC, LATERAL (SELECT * FROM port_conflicts(PC.id, '^' || PC.item_value || '$')) PC4;
INSERT 0 10700

Now let’s have a look at some of the results:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
freshports.dev-#   FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
freshports.dev-#  WHERE PC.port_id = 45086;
  packagename   |    ?column?    |     packagename     
----------------+----------------+---------------------
 bacula9-server | conflicts with | bacula5-server
 bacula9-server | conflicts with | bacula-server
 bacula9-server | conflicts with | bacula-server-devel
(3 rows)

freshports.dev=#

You can find that magical 45086 value near the top of this post.

Let’s try openssl:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 954;
 packagename |    ?column?    |  packagename   
-------------+----------------+----------------
 openssl     | conflicts with | openssl111
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | openssl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl-devel
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
 openssl     | conflicts with | libressl
(14 rows)

Or heimdal:

freshports.dev=# SELECT PackageName(PC.port_id), 'conflicts with', PackageName(PCM.port_id)
  FROM ports_conflicts_matches PCM JOIN ports_conflicts PC ON PCM.ports_conflicts_id = PC.id
 WHERE PC.port_id = 934;
 packagename |    ?column?    | packagename  
-------------+----------------+--------------
 heimdal     | conflicts with | wu-ftpd
 heimdal     | conflicts with | wu-ftpd+ipv6
 heimdal     | conflicts with | srp
 heimdal     | conflicts with | krb5-maint
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-114
 heimdal     | conflicts with | krb5-111
 heimdal     | conflicts with | krb5-113
 heimdal     | conflicts with | krb5
 heimdal     | conflicts with | krb5-112
 heimdal     | conflicts with | krb5-116
 heimdal     | conflicts with | krb5-115
 heimdal     | conflicts with | krb4
(14 rows)

freshports.dev=#

Now partially in production

The work is partially implemented in production. The idea came from this issue suggested by zi0r.

If you have a look at the Bacula port, you will find the following under Conflicts:

CONFLICTS:

bacula5-server-*
bacula-server-*

Future work

The next job is to automatically populate the ports_conflicts_matches table. I think that needs to be done on a periodic basis, perhaps every 10 minutes if there has been a recent commit.

Just thinking about that now:

If port A conflicts with port B, then clearly port B conflicts with port A. I am not sure if there is a requirement to specify CONFLICTS in both ports, but that is something easily confirmed by the ports_conlicts table.
I tried to think of commits which would not change ports_conflicts_matches. Since the matches can include PORTVERSION, any given commit which bumps that value can affect matches.
Perhaps I could exclude commits to UPDATING, MOVED, or Mk, but they are relatively few compared to the rest.
I think it’s just safer to recompute all matches periodically. There does not seem to be a safe and straight forward way to do it based upon a smaller unit of work.