Earlier today, I wrote about a fix which broke search. At the end of that post, I mentioned a few things which needed to be done to fix up the broken relationships.
- clean up the database
- add a foreign key, on delete set null
- trigger to find the right value when set null
Let’s get started.
clean up the database
How many rows are we talking about?
SELECT count(*)
FROM ports P
WHERE NOT EXISTS
(SELECT CL.id
FROM commit_log CL
WHERE CL.id = P.last_commit_id);
count
-------
13023
Ouch. That’s about half the ports tree.
Let’s set them all NULL.
BEGIN;
UPDATE ports P
set last_commit_id = NULL
WHERE NOT EXISTS
(SELECT CL.id
FROM commit_log CL
WHERE CL.id = P.last_commit_id);
UPDATE 13023
-- that number matches up
-- now let's run that first query again:
SELECT count(*)
FROM ports P
WHERE NOT EXISTS
(SELECT CL.id
FROM commit_log CL
WHERE CL.id = P.last_commit_id);
count
-------
13023
(1 row)
-- oh, of course. Let's exclude what we just fixed
SELECT count(*)
FROM ports P
WHERE P.last_commit_id IS NOT NULL
AND NOT EXISTS
(SELECT CL.id
FROM commit_log CL
WHERE CL.id = P.last_commit_id);
count
-------
0
(1 row)
-- this looks good; commit;
commit;
COMMIT
Now, let’s fix up all the NULL keys. Fortunately, I have a script which fixes this exact situation.
#!/usr/bin/perl -w
#
# $Id: set-last-commit-id.pl,v 1.2 2006-12-17 12:04:06 dan Exp $
#
# Copyright (c) 1999-2004 DVL Software
#
#
# I found that several (about 800-900) ports did not yet have last_commit_id set.
# this script sets that.
#
use strict;
use lib "$ENV{HOME}/scripts";
use port;
use DBI;
use database;
use utilities;
my $dbh;
my $porttorefresh;
my @PORTS;
my $sql;
my $sth;
my @row;
FreshPorts::Utilities::InitSyslog();
$dbh = FreshPorts::Database::GetDBHandle();
#
# get a list of ports to update
#
$sql = "
SELECT id
FROM ports
WHERE last_commit_id IS NULL
";
print "sql = $sql\n";
$sth = $dbh->prepare($sql);
$sth->execute ||
FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);
while (@row=$sth->fetchrow_array) {
print "now reading @row\n";
push @PORTS, "$row[0]"
}
my $port = FreshPorts::Port->new($dbh);
foreach $porttorefresh (@PORTS) {
$sql = "
SELECT max(CL.commit_date)
FROM commit_log CL, commit_log_ports CLP
WHERE CLP.port_id = $porttorefresh
AND CL.id = CLP.commit_log_id
";
print "sql = $sql\n";
$sth = $dbh->prepare($sql);
$sth->execute ||
FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);
@row =$sth->fetchrow_array;
my $CommitLogDate = $row[0];
$sql = "
SELECT CL.id
FROM commit_log CL, commit_log_ports CLP
WHERE CLP.port_id = $porttorefresh
AND CL.id = CLP.commit_log_id
AND CL.commit_date = '$CommitLogDate'
";
$sth = $dbh->prepare($sql);
$sth->execute ||
FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);
@row =$sth->fetchrow_array;
my $CommitLogID = $row[0];
$sql = "
UPDATE ports
SET last_commit_id = $CommitLogID
WHERE id = $porttorefresh
";
print "sql = $sql\n";
$sth = $dbh->prepare($sql);
$sth->execute ||
FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);
}
$sth->finish();
$dbh->commit();
$dbh->disconnect();
After running that script, I ran one of the queries from above to confirm all was well:
SELECT count(*)
FROM ports P
WHERE NOT EXISTS
(SELECT CL.id
FROM commit_log CL
WHERE CL.id = P.last_commit_id);
count
-------
0
(1 row)
That’s all the ports correct.
Let’s try running the problem query mentioned in a previous post:
SELECT count(*)
FROM ports P LEFT OUTER JOIN ports_vulnerable PV on PV.port_id = P.id JOIN commit_log CL on P.last_commit_id = CL.id, categories C, element E
WHERE P.category_id = C.id
AND P.element_id = E.id
AND lower(E.name) = lower(E'bird');
count
-------
1
That query correctly retrieves one row.
add a foreign key, on delete set null
The ports table is simple, with a lot of referential integrity (RI). I’m about to add more RI to it.
freshports.org=> \d ports
Table "public.ports"
Column | Type | Modifiers
-------------------+--------------------------+----------------------------------------------------
id | integer | not null default nextval('ports_id_seq'::regclass)
element_id | integer | not null
category_id | integer | not null
short_description | text |
long_description | text |
version | text |
revision | text |
maintainer | text |
homepage | text |
master_sites | text |
extract_suffix | text |
package_exists | boolean |
depends_build | text |
depends_run | text |
last_commit_id | integer |
found_in_index | boolean |
forbidden | text |
broken | text |
date_added | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
categories | text |
deprecated | text |
ignore | text |
master_port | text |
latest_link | text |
depends_lib | text |
no_latest_link | text |
no_package | text |
package_name | text |
portepoch | text |
no_cdrom | text |
restricted | text |
expiration_date | date |
is_interactive | text |
only_for_archs | text |
not_for_archs | text |
status | character(1) | not null
showconfig | text |
license | text |
Indexes:
"ports_pkey" PRIMARY KEY, btree (id)
"ports_active_idx" btree (status) WHERE status = 'A'::bpchar
"ports_broken" btree (broken) WHERE broken <> ''::text
"ports_category_id_idx" btree (category_id)
"ports_deleted" btree (status) WHERE status = 'D'::bpchar
"ports_element_id" btree (element_id)
"ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL
"ports_ignore" btree (ignore) WHERE ignore <> ''::text
"ports_is_interactive" btree (is_interactive) WHERE is_interactive IS NOT NULL
"ports_package_name" btree (package_name)
"ports_ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL
Foreign-key constraints:
"$1" FOREIGN KEY (category_id) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE
"$2" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
TABLE "ports_categories" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "commit_log_ports_ignore" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "ports_moved" CONSTRAINT "$1" FOREIGN KEY (from_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "ports_updating_ports_xref" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "ports_vulnerable" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE RESTRICT ON DELETE CASCADE
TABLE "commit_log_ports" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "ports_moved" CONSTRAINT "$2" FOREIGN KEY (to_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "commit_log_ports_vuxml" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "commit_log_port_elements" CONSTRAINT "$3" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "cache_clearing_ports" CONSTRAINT "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
ports_clear_cache AFTER UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_clear_cache()
ports_ports_categories AFTER INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_categories_set()
ports_status BEFORE INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_status()
As you can see from the above, last_commit_id is not involved in any foreign key constraints. The following statement will ensure that the field always contains a valid key, or null:
ALTER TABLE ports
ADD FOREIGN KEY (last_commit_id)
REFERENCES commit_log (id) ON UPDATE CASCADE ON DELETE SET NULL;
Doing another \d ports will display the following addition to the table:
"ports_last_commit_id_fkey" FOREIGN KEY (last_commit_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE SET NULL
trigger to find the right value when set null
Here is the trigger I just created. It is based upon the script (mentioned above) which I used to fix up the values.
CREATE OR REPLACE FUNCTION check_last_commit_id() RETURNS TRIGGER as '
declare
l_max_commit_date timestamp with time zone;
l_commit_log_id integer;
begin
if new.last_commit_id is null then
SELECT max(CL.commit_date)
INTO l_max_commit_date
FROM commit_log CL, commit_log_ports CLP
WHERE CLP.port_id = NEW.id
AND CL.id = CLP.commit_log_id;
IF FOUND THEN
SELECT CL.id
INTO l_commit_log_id
FROM commit_log CL, commit_log_ports CLP
WHERE CLP.port_id = NEW.id
AND CL.id = CLP.commit_log_id
AND CL.commit_date = l_max_commit_date;
IF FOUND THEN
NEW.last_commit_id := l_commit_log_id;
END IF;
END IF;
end if;
RETURN NEW;
end;
' LANGUAGE 'plpgsql';
DROP TRIGGER check_last_commit_id on ports;
CREATE TRIGGER check_last_commit_id
BEFORE update on ports
FOR EACH ROW
EXECUTE PROCEDURE check_last_commit_id();
Here is my test of that code:
freshports.org=# begin; BEGIN freshports.org=# select id, last_commit_id from ports where id = 234; id | last_commit_id -----+---------------- 234 | 237699 (1 row) freshports.org=# update ports set last_commit_id = null where id = 234; UPDATE 1 freshports.org=# select id, last_commit_id from ports where id = 234; id | last_commit_id -----+---------------- 234 | 237699 (1 row) freshports.org=# ROLLBACK; ROLLBACK freshports.org=#
Yes, I just set the last_commit_id to null, and it bounced right back. But for the killer test, let’s delete a commit and see what happens.
Testing via commit delete
Let’s start with this commit against chinese/bg5ps. I will check the existing values, delete the commit, then check the new values.
freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps'; id | last_commit_id ------+---------------- 2973 | 504539 (1 row) freshports.org=# begin; BEGIN freshports.org=# delete from commit_log where id = 504539 and message_id = '201401251811.s0PIBg6R031537@svn.freebsd.org'; DELETE 1 freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps'; id | last_commit_id ------+---------------- 2973 | 504087 (1 row) freshports.org=# rollback; ROLLBACK freshports.org=#
Yes, that seems to have worked as planned. I did a rollback at the end, just because the delete was not required.
Verifying the fix
But let’s check that new value. Is it correct?
freshports.org=# select message_id from commit_log where id = 504087;
message_id
---------------------------------------------
201401221552.s0MFqBK6007511@svn.freebsd.org
(1 row)
freshports.org=#
Looking at the webpage, yes, that is the expected value for that message_id, which can be found by hovering over the commit email icon or the file set icon.
I’ll run this on my dev server for a while before pushing it through to production.












This has been moved to production.