A long-standing bug, perhaps dating back to April 2003, has been fixed. It was an optimization in a trigger: if the values in the trigger have not changed, let’s not run the trigger.
Turns out, that was a bad idea.
The ports_categories table documents the connections between a port and the categories in which it resides. There must be one physical category. By physical, I mean a category for which a directory exists on disk in the /usr/ports directory. This is sometimes referred to as the primary category. The primary/physical category must always be listed first in the CATEGORIES= directive of the port’s Makefile. See also Categorization in the FreeBSD Porters Handbook.
How I know
A system check is run on on regular basis and I get an email looking like this:
This is a list of ports that do not have entries in the ports_categories table This can be fixed with this query: begin; insert into ports_categories select id, category_id from ports_active PA WHERE NOT EXISTS (SELECT * from ports_categories PC where PC.port_id = PA.id and PC.category_id = PA.category_id); This is a list of the ports in question: id | category_id | name | category | port | element_pathname -----+-------------+------------------+--------------+-------------------------------+------------------------------------------- 500 | 9 | open-motif-devel | x11-toolkits | x11-toolkits/open-motif-devel | /ports/head/x11-toolkits/open-motif-devel (1 row)
Checking the data
I now suspect the resurrection process on FreshPorts. Perhaps the updates to ports_categories are not being done.
Let’s look at the data:
[pg02 dan ~] % psql freshports.devgit psql (12.11) Type "help" for help. freshports.devgit=# select category_id, categories from ports where id = 500; category_id | categories -------------+-------------- 9 | x11-toolkits (1 row) freshports.devgit=# select * from ports_categories where port_id = 500; port_id | category_id ---------+------------- (0 rows) freshports.devgit=#
From there, I can see category_id (derived value) and categories (directly copied from the Makefile are populated. It seems the only information missed is the entry from the ports_categories table.
Here are the files/logs involved with that commit. They are archived by commit date.
[dev-ingress01 dan ~freshports/message-queues/recent] % cd ../archive/2022_08/2022_08_16 [dev-ingress01 dan ~freshports/message-queues/archive/2022_08/2022_08_16] % ls *0395dfd3904676e2e652bfd7850c4a48ec5cebba* 2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.log 2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.xml [dev-ingress01 dan ~freshports/message-queues/archive/2022_08/2022_08_16] %
Can it be reproduced?
I have an idea. Let’s rerun the commit and see if the situation persists. If so, we have a reproducible event.
This is the delete:
freshports.devgit=# begin; delete from commit_log where message_id = '0395dfd3904676e2e652bfd7850c4a48ec5cebba'; BEGIN DELETE 1 freshports.devgit=# commit; COMMIT freshports.devgit=#
This is the injection of the commit into the processing queue:
[dev-ingress01 dan ~freshports/message-queues/archive/2022_08/2022_08_16] % sudo mv -i 2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.xml ~ingress/message-queues/incoming/
The files are now back in the recent queue and this is how we know processing is complete.
[dev-ingress01 dan ~freshports/message-queues/recent] % ls *0395dfd3904676e2e652bfd7850c4a48ec5cebba* 2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.log 2022.08.16.03.33.52.000000.0395dfd3904676e2e652bfd7850c4a48ec5cebba.xml
Checking the tables, we have the same results. The situation can be reproduced.
freshports.devgit=# select category_id, categories from ports where id = 500; category_id | categories -------------+-------------- 9 | x11-toolkits (1 row) freshports.devgit=# select * from ports_categories where port_id = 500; port_id | category_id ---------+------------- (0 rows) freshports.devgit=#
I looked at the code, and found nothing updating the ports_categories table. It must be part of the database itself (i.e. a trigger).
Looking at the source code for the relational integrity stuff, I found this function:
CREATE OR REPLACE FUNCTION ports_categories_set() RETURNS TRIGGER AS $$ DECLARE CategoryCount integer; CategoryNames text; Category text; CategoryID int8; UpdateNeeded boolean; BEGIN UpdateNeeded := TRUE; IF TG_OP = 'UPDATE' THEN IF old.categories = new.categories THEN UpdateNeeded := FALSE; END IF; END IF; IF UpdateNeeded THEN DELETE FROM ports_categories WHERE port_id = new.id; IF new.categories IS NOT NULL THEN CategoryCount := 0; CategoryNames := new.categories; LOOP ...
The highlighted lines 12-14 are an optimization. I think it’s wrong. Presumably the table would be correct if no changes are occurring. However, I see the faulty logic and it’s not in the function above.
That code dates back to at least April 2003.
When the ports_categories table was first added to the database, it needed to be populated. Looking at commit logs, this was sometime pre 2006-12-17 because the oldest log entry says:
r4012 | dan | 2006-12-17 12:06:34 +0000 (Sun, 17 Dec 2006) | 2 lines Merge from FreshPorts2 to HEAD
I could not find any earlier references. In that populate script, I find:
# # get a list of ports to update # $sql = " SELECT ports_active.id AS id, ports_active.categories AS categories, ports_active.name AS name, ports_active.category AS category FROM ports_active ORDER BY ports_active.id";
The ports_active is a view of the ports table which excludes deleted ports.
That’s the problem right there ^
Why were deleted ports excluded? It is awkward to extract values from deleted ports. The Makefile may be out of data for the current FreshPorts code. Sure, the code could versioned and could cope with older Makefile – that approach is common with APIs. No, I’m not going to do that.
Let’s comment out that optimization from the trigger and see if that fixes the situation.
Not shown: editing the trigger, updating the database, deleting the commit, and injecting the XML file back into the processing queue.
Here’s the new situation:
freshports.devgit=# select * from ports_categories where port_id = 500; port_id | category_id ---------+------------- 500 | 9 (1 row) freshports.devgit=# select category_id, categories from ports where id = 500; category_id | categories -------------+-------------- 9 | x11-toolkits (1 row)
That is correct. The ports_categories now contains the expected values. Good.
I still have to wait for the top of the hour when the system check will be invoked, but right now, I’m confident this has solved an elusive bug.
EDIT: Several hours later, the alerts are no longer occurring on the dev database. Time to push these change through to test, stage, and then prod.