It’s one thing to identify a bug. It’s another to clean up the data corrupted by that bug. Fortunately, FreshPorts takes its data from the FreeBSD ports tree. This makes it rather straight forward to clean up most data issues; you just refresh the database from the ports tree.
Case in point: a recently found bug identified that the pkg-descr information was being taken from the host ports tree, not the jail ports tree. This bug was introduced during the conversion to jails. Or more precisely, it was left unchanged, resulting in a bug. The effect: all commits against pkg-descr changes were wrong. Given that the data was taken from the host ports tree, which always legged, the information collected was always outdated. The host ports tree was only up to date after a portsnap was run. Every commit involving pkg-descr would use the /usr/ports version in the host, not the one in the jail.
The list of commits
Here is a list of message_id values affected by this bug:
SELECT CL.message_id FROM commit_log_elements CLE JOIN element E ON CLE.element_id = E.id JOIN commit_log CL ON CLE.commit_log_id = CL.id WHERE E.name = 'pkg-descr' AND CL.date_added > '2013-03-22';
The WHERE clause selects all commits after the date in question and all instances of a pkg-descr file. The two JOINs connect the commit_log_elements table (list of all files affected by a given commit) to the element table (list of all files & directories in all commits; more or less the whole repository).
The list of ports
Now, let’s see the list of port names associated with those commits.
SELECT distinct P.id, C.name as category, E.name as port FROM commit_log_ports CLP JOIN ports P ON CLP.port_id = P.id JOIN commit_log CL ON CLP.commit_log_id = CL.id JOIN element E ON P.element_id = E.id JOIN categories C ON P.category_id = C.id JOIN element CE on C.element_id = CE.id WHERE CL.message_id IN (SELECT CL.message_id FROM commit_log_elements CLE JOIN element E ON CLE.element_id = E.id JOIN commit_log CL ON CLE.commit_log_id = CL.id WHERE E.name = 'pkg-descr' AND CL.date_added > '2013-03-22') ORDER BY 1;
This expands upon the previous query and pulls out a list of port IDs based upon the list of commits. It joins the ports table with the commit_log_ports table (list of each port affected by a given commit). The element table is brought in to get the port name. The other reference to the element table (CE) is spurious. We already know the category name from the categories table. I put the CE join in there without realizing the category name was duplicated (for speed).
Slave ports
But what about slave ports? Sometimes they use the master port pkg-descr values. So we need to refresh the slave ports.
select P.id, C.name || '/' || P.name as port, P.master_port from ports_active P JOIN categories C ON P.category_id = C.id JOIN element CE on C.element_id = CE.id WHERE P.id IN ( select getport(P.master_port) from ports_active P JOIN categories C ON P.category_id = C.id JOIN element CE on C.element_id = CE.id WHERE master_port in ( SELECT C.name || '/' || E.name as port FROM commit_log_ports CLP JOIN ports P ON CLP.port_id = P.id JOIN commit_log CL ON CLP.commit_log_id = CL.id JOIN element E ON P.element_id = E.id JOIN categories C ON P.category_id = C.id JOIN element CE on C.element_id = CE.id WHERE CL.message_id IN (SELECT CL.message_id FROM commit_log_elements CLE JOIN element E ON CLE.element_id = E.id JOIN commit_log CL ON CLE.commit_log_id = CL.id WHERE E.name = 'pkg-descr' AND CL.date_added > '2013-03-22') ));
This does the same thing as the previous query, but pulls back the port id of any master_port from the previous list, and uses that list of master ports to create a new list. The key to this query is the getport() function, which takes a category/port combination (e.g. sysutils/bacula-server) and returns the port id.
Yes, this becomes recursive. Now that we’ve updated the master ports, were those master ports, in turn, slave ports of something else? No. I reran the query, adding another layer, and found no more master ports.
All fixed
Plugging the above two queries into a standard script solved the database issues. Now all three website (dev, beta, and prod) are up to date.
Thanks to koobs for telling me about this problem.