Aug 192022
 

A long-standing bug, perhaps dating back to April 2003, has been fixed. It was an optimization in a trigger: if the values in the trigger have not changed, let’s not run the trigger.

Turns out, that was a bad idea.

The table

The ports_categories table documents the connections between a port and the categories in which it resides. There must be one physical category. By physical, I mean a category for which a directory exists on disk in the /usr/ports directory. This is sometimes referred to as the primary category. The primary/physical category must always be listed first in the CATEGORIES= directive of the port’s Makefile. See also Categorization in the FreeBSD Porters Handbook.

There are four open issues as I write this. The missing ports_categories entries #129 issue is most relevant to what I’m trying to solve today.

How I know

A system check is run on on regular basis and I get an email looking like this:

This is a list of ports that do not have entries in the ports_categories table
This can be fixed with this query:

begin;  insert into ports_categories select id, category_id from ports_active PA WHERE NOT EXISTS 
(SELECT * from ports_categories PC where PC.port_id = PA.id and PC.category_id = PA.category_id);

This is a list of the ports in question:

 id  | category_id |       name       |   category   |             port              |             element_pathname              
-----+-------------+------------------+--------------+-------------------------------+-------------------------------------------
 500 |           9 | open-motif-devel | x11-toolkits | x11-toolkits/open-motif-devel | /ports/head/x11-toolkits/open-motif-devel
(1 row)

Checking the data

Looking at the history of commits for that port, I notice the most recent commit resurrected this port after it was deleted in 2002.

I now suspect the resurrection process on FreshPorts. Perhaps the updates to ports_categories are not being done.

Let’s look at the data:

[pg02 dan ~] % psql freshports.devgit
psql (12.11)
Type "help" for help.

freshports.devgit=# select category_id, categories from ports where id = 500;
 category_id |  categories  
-------------+--------------
           9 | x11-toolkits
(1 row)

freshports.devgit=# select * from ports_categories where port_id = 500;
 port_id | category_id 
---------+-------------
(0 rows)

freshports.devgit=# 

From there, I can see category_id (derived value) and categories (directly copied from the Makefile are populated. It seems the only information missed is the entry from the ports_categories table.

The files

Here are the files/logs involved with that commit. They are archived by commit date.

[dev-ingress01 dan ~freshports/message-queues/recent] % cd ../archive/2022_08/2022_08_16
[dev-ingress01 dan ~freshports/message-queues/archive/2022_08/2022_08_16] % ls *0395dfd3904676e2e652bfd7850c4a48ec5cebba*
2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.log  2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.xml
[dev-ingress01 dan ~freshports/message-queues/archive/2022_08/2022_08_16] %

Can it be reproduced?

I have an idea. Let’s rerun the commit and see if the situation persists. If so, we have a reproducible event.

This is the delete:

freshports.devgit=# begin; delete from commit_log where message_id = '0395dfd3904676e2e652bfd7850c4a48ec5cebba';
BEGIN
DELETE 1
freshports.devgit=# commit;
COMMIT
freshports.devgit=# 

This is the injection of the commit into the processing queue:

[dev-ingress01 dan ~freshports/message-queues/archive/2022_08/2022_08_16] % sudo mv -i 2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.xml ~ingress/message-queues/incoming/ 

The files are now back in the recent queue and this is how we know processing is complete.

[dev-ingress01 dan ~freshports/message-queues/recent] % ls *0395dfd3904676e2e652bfd7850c4a48ec5cebba*
2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.log  2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.xml

Checking the tables, we have the same results. The situation can be reproduced.

freshports.devgit=# select category_id, categories from ports where id = 500;
 category_id |  categories  
-------------+--------------
           9 | x11-toolkits
(1 row)

freshports.devgit=# select * from ports_categories where port_id = 500;
 port_id | category_id 
---------+-------------
(0 rows)

freshports.devgit=# 

I looked at the code, and found nothing updating the ports_categories table. It must be part of the database itself (i.e. a trigger).

Looking at the source code for the relational integrity stuff, I found this function:

CREATE OR REPLACE FUNCTION ports_categories_set() RETURNS TRIGGER AS $$
   DECLARE
                CategoryCount   integer;
                CategoryNames   text;
                Category        text;
                CategoryID      int8;
                UpdateNeeded    boolean;
   BEGIN
                UpdateNeeded := TRUE;

                IF TG_OP = 'UPDATE' THEN
                        IF old.categories = new.categories THEN
                                UpdateNeeded := FALSE;
                        END IF;
                END IF;

                IF UpdateNeeded THEN

                        DELETE 
                          FROM ports_categories
                         WHERE port_id = new.id;

                        IF new.categories IS NOT NULL THEN

                                CategoryCount := 0;
                                CategoryNames := new.categories;
                                LOOP
...

The highlighted lines 12-14 are an optimization. I think it’s wrong. Presumably the table would be correct if no changes are occurring. However, I see the faulty logic and it’s not in the function above.

That code dates back to at least April 2003.

Background

When the ports_categories table was first added to the database, it needed to be populated. Looking at commit logs, this was sometime pre 2006-12-17 because the oldest log entry says:

r4012 | dan | 2006-12-17 12:06:34 +0000 (Sun, 17 Dec 2006) | 2 lines

Merge from FreshPorts2 to HEAD

I could not find any earlier references. In that populate script, I find:

#
# get a list of ports to update
#

$sql = "
  SELECT ports_active.id         AS id, 
         ports_active.categories AS categories,
         ports_active.name       AS name,
         ports_active.category   AS category
    FROM ports_active
ORDER BY ports_active.id";

The ports_active is a view of the ports table which excludes deleted ports.

That’s the problem right there ^

Why were deleted ports excluded? It is awkward to extract values from deleted ports. The Makefile may be out of data for the current FreshPorts code. Sure, the code could versioned and could cope with older Makefile – that approach is common with APIs. No, I’m not going to do that.

Let’s comment out that optimization from the trigger and see if that fixes the situation.

Checking

Not shown: editing the trigger, updating the database, deleting the commit, and injecting the XML file back into the processing queue.

Here’s the new situation:

freshports.devgit=# select * from ports_categories where port_id = 500;
 port_id | category_id 
---------+-------------
     500 |           9
(1 row)

freshports.devgit=# select category_id, categories from ports where id = 500;
 category_id |  categories  
-------------+--------------
           9 | x11-toolkits
(1 row)

That is correct. The ports_categories now contains the expected values. Good.

I still have to wait for the top of the hour when the system check will be invoked, but right now, I’m confident this has solved an elusive bug.

EDIT: Several hours later, the alerts are no longer occurring on the dev database. Time to push these change through to test, stage, and then prod.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive