Earlier today, I wrote about a fix which broke search. At the end of that post, I mentioned a few things which needed to be done to fix up the broken relationships.
- clean up the database
- add a foreign key, on delete set null
- trigger to find the right value when set null
Let’s get started.
clean up the database
How many rows are we talking about?
SELECT count(*) FROM ports P WHERE NOT EXISTS (SELECT CL.id FROM commit_log CL WHERE CL.id = P.last_commit_id); count ------- 13023
Ouch. That’s about half the ports tree.
Let’s set them all NULL.
BEGIN; UPDATE ports P set last_commit_id = NULL WHERE NOT EXISTS (SELECT CL.id FROM commit_log CL WHERE CL.id = P.last_commit_id); UPDATE 13023 -- that number matches up -- now let's run that first query again: SELECT count(*) FROM ports P WHERE NOT EXISTS (SELECT CL.id FROM commit_log CL WHERE CL.id = P.last_commit_id); count ------- 13023 (1 row) -- oh, of course. Let's exclude what we just fixed SELECT count(*) FROM ports P WHERE P.last_commit_id IS NOT NULL AND NOT EXISTS (SELECT CL.id FROM commit_log CL WHERE CL.id = P.last_commit_id); count ------- 0 (1 row) -- this looks good; commit; commit; COMMIT
Now, let’s fix up all the NULL keys. Fortunately, I have a script which fixes this exact situation.
#!/usr/bin/perl -w # # $Id: set-last-commit-id.pl,v 1.2 2006-12-17 12:04:06 dan Exp $ # # Copyright (c) 1999-2004 DVL Software # # # I found that several (about 800-900) ports did not yet have last_commit_id set. # this script sets that. # use strict; use lib "$ENV{HOME}/scripts"; use port; use DBI; use database; use utilities; my $dbh; my $porttorefresh; my @PORTS; my $sql; my $sth; my @row; FreshPorts::Utilities::InitSyslog(); $dbh = FreshPorts::Database::GetDBHandle(); # # get a list of ports to update # $sql = " SELECT id FROM ports WHERE last_commit_id IS NULL "; print "sql = $sql\n"; $sth = $dbh->prepare($sql); $sth->execute || FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1); while (@row=$sth->fetchrow_array) { print "now reading @row\n"; push @PORTS, "$row[0]" } my $port = FreshPorts::Port->new($dbh); foreach $porttorefresh (@PORTS) { $sql = " SELECT max(CL.commit_date) FROM commit_log CL, commit_log_ports CLP WHERE CLP.port_id = $porttorefresh AND CL.id = CLP.commit_log_id "; print "sql = $sql\n"; $sth = $dbh->prepare($sql); $sth->execute || FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1); @row =$sth->fetchrow_array; my $CommitLogDate = $row[0]; $sql = " SELECT CL.id FROM commit_log CL, commit_log_ports CLP WHERE CLP.port_id = $porttorefresh AND CL.id = CLP.commit_log_id AND CL.commit_date = '$CommitLogDate' "; $sth = $dbh->prepare($sql); $sth->execute || FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1); @row =$sth->fetchrow_array; my $CommitLogID = $row[0]; $sql = " UPDATE ports SET last_commit_id = $CommitLogID WHERE id = $porttorefresh "; print "sql = $sql\n"; $sth = $dbh->prepare($sql); $sth->execute || FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1); } $sth->finish(); $dbh->commit(); $dbh->disconnect();
After running that script, I ran one of the queries from above to confirm all was well:
SELECT count(*) FROM ports P WHERE NOT EXISTS (SELECT CL.id FROM commit_log CL WHERE CL.id = P.last_commit_id); count ------- 0 (1 row)
That’s all the ports correct.
Let’s try running the problem query mentioned in a previous post:
SELECT count(*) FROM ports P LEFT OUTER JOIN ports_vulnerable PV on PV.port_id = P.id JOIN commit_log CL on P.last_commit_id = CL.id, categories C, element E WHERE P.category_id = C.id AND P.element_id = E.id AND lower(E.name) = lower(E'bird'); count ------- 1
That query correctly retrieves one row.
add a foreign key, on delete set null
The ports table is simple, with a lot of referential integrity (RI). I’m about to add more RI to it.
freshports.org=> \d ports Table "public.ports" Column | Type | Modifiers -------------------+--------------------------+---------------------------------------------------- id | integer | not null default nextval('ports_id_seq'::regclass) element_id | integer | not null category_id | integer | not null short_description | text | long_description | text | version | text | revision | text | maintainer | text | homepage | text | master_sites | text | extract_suffix | text | package_exists | boolean | depends_build | text | depends_run | text | last_commit_id | integer | found_in_index | boolean | forbidden | text | broken | text | date_added | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone categories | text | deprecated | text | ignore | text | master_port | text | latest_link | text | depends_lib | text | no_latest_link | text | no_package | text | package_name | text | portepoch | text | no_cdrom | text | restricted | text | expiration_date | date | is_interactive | text | only_for_archs | text | not_for_archs | text | status | character(1) | not null showconfig | text | license | text | Indexes: "ports_pkey" PRIMARY KEY, btree (id) "ports_active_idx" btree (status) WHERE status = 'A'::bpchar "ports_broken" btree (broken) WHERE broken <> ''::text "ports_category_id_idx" btree (category_id) "ports_deleted" btree (status) WHERE status = 'D'::bpchar "ports_element_id" btree (element_id) "ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL "ports_ignore" btree (ignore) WHERE ignore <> ''::text "ports_is_interactive" btree (is_interactive) WHERE is_interactive IS NOT NULL "ports_package_name" btree (package_name) "ports_ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL Foreign-key constraints: "$1" FOREIGN KEY (category_id) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE "$2" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE CASCADE ON DELETE CASCADE Referenced by: TABLE "ports_categories" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "commit_log_ports_ignore" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "ports_moved" CONSTRAINT "$1" FOREIGN KEY (from_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "ports_updating_ports_xref" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "ports_vulnerable" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE RESTRICT ON DELETE CASCADE TABLE "commit_log_ports" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "ports_moved" CONSTRAINT "$2" FOREIGN KEY (to_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "commit_log_ports_vuxml" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "commit_log_port_elements" CONSTRAINT "$3" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "cache_clearing_ports" CONSTRAINT "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE Triggers: ports_clear_cache AFTER UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_clear_cache() ports_ports_categories AFTER INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_categories_set() ports_status BEFORE INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_status()
As you can see from the above, last_commit_id is not involved in any foreign key constraints. The following statement will ensure that the field always contains a valid key, or null:
ALTER TABLE ports ADD FOREIGN KEY (last_commit_id) REFERENCES commit_log (id) ON UPDATE CASCADE ON DELETE SET NULL;
Doing another \d ports will display the following addition to the table:
"ports_last_commit_id_fkey" FOREIGN KEY (last_commit_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE SET NULL
trigger to find the right value when set null
Here is the trigger I just created. It is based upon the script (mentioned above) which I used to fix up the values.
CREATE OR REPLACE FUNCTION check_last_commit_id() RETURNS TRIGGER as ' declare l_max_commit_date timestamp with time zone; l_commit_log_id integer; begin if new.last_commit_id is null then SELECT max(CL.commit_date) INTO l_max_commit_date FROM commit_log CL, commit_log_ports CLP WHERE CLP.port_id = NEW.id AND CL.id = CLP.commit_log_id; IF FOUND THEN SELECT CL.id INTO l_commit_log_id FROM commit_log CL, commit_log_ports CLP WHERE CLP.port_id = NEW.id AND CL.id = CLP.commit_log_id AND CL.commit_date = l_max_commit_date; IF FOUND THEN NEW.last_commit_id := l_commit_log_id; END IF; END IF; end if; RETURN NEW; end; ' LANGUAGE 'plpgsql'; DROP TRIGGER check_last_commit_id on ports; CREATE TRIGGER check_last_commit_id BEFORE update on ports FOR EACH ROW EXECUTE PROCEDURE check_last_commit_id();
Here is my test of that code:
freshports.org=# begin; BEGIN freshports.org=# select id, last_commit_id from ports where id = 234; id | last_commit_id -----+---------------- 234 | 237699 (1 row) freshports.org=# update ports set last_commit_id = null where id = 234; UPDATE 1 freshports.org=# select id, last_commit_id from ports where id = 234; id | last_commit_id -----+---------------- 234 | 237699 (1 row) freshports.org=# ROLLBACK; ROLLBACK freshports.org=#
Yes, I just set the last_commit_id to null, and it bounced right back. But for the killer test, let’s delete a commit and see what happens.
Testing via commit delete
Let’s start with this commit against chinese/bg5ps. I will check the existing values, delete the commit, then check the new values.
freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps'; id | last_commit_id ------+---------------- 2973 | 504539 (1 row) freshports.org=# begin; BEGIN freshports.org=# delete from commit_log where id = 504539 and message_id = '201401251811.s0PIBg6R031537@svn.freebsd.org'; DELETE 1 freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps'; id | last_commit_id ------+---------------- 2973 | 504087 (1 row) freshports.org=# rollback; ROLLBACK freshports.org=#
Yes, that seems to have worked as planned. I did a rollback at the end, just because the delete was not required.
Verifying the fix
But let’s check that new value. Is it correct?
freshports.org=# select message_id from commit_log where id = 504087; message_id --------------------------------------------- 201401221552.s0MFqBK6007511@svn.freebsd.org (1 row) freshports.org=#
Looking at the webpage, yes, that is the expected value for that message_id, which can be found by hovering over the commit email icon or the file set icon.
I’ll run this on my dev server for a while before pushing it through to production.
This has been moved to production.