Fixing up a referential integrity issue

Earlier today, I wrote about a fix which broke search. At the end of that post, I mentioned a few things which needed to be done to fix up the broken relationships.

  1. clean up the database
  2. add a foreign key, on delete set null
  3. trigger to find the right value when set null

Let’s get started.

clean up the database

How many rows are we talking about?

SELECT count(*)
  FROM ports P
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
 13023

Ouch. That’s about half the ports tree.

Let’s set them all NULL.

BEGIN;
UPDATE ports P
   set last_commit_id = NULL
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
UPDATE 13023

-- that number matches up
-- now let's run that first query again:
SELECT count(*)
  FROM ports P
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
 13023
(1 row)

-- oh, of course.  Let's exclude what we just fixed

SELECT count(*)
  FROM ports P
 WHERE P.last_commit_id IS NOT NULL
   AND NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
     0
(1 row)

-- this looks good; commit;

commit;
COMMIT

Now, let’s fix up all the NULL keys. Fortunately, I have a script which fixes this exact situation.

#!/usr/bin/perl -w
#
# $Id: set-last-commit-id.pl,v 1.2 2006-12-17 12:04:06 dan Exp $
#
# Copyright (c) 1999-2004 DVL Software
#

#
# I found that several (about 800-900) ports did not yet have last_commit_id set.
# this script sets that.
#

use strict;
use lib "$ENV{HOME}/scripts";
use port;
use DBI;
use database;
use utilities;

my $dbh;

my $porttorefresh;
my @PORTS;
my $sql;
my $sth;
my @row;

FreshPorts::Utilities::InitSyslog();

$dbh = FreshPorts::Database::GetDBHandle();

#
# get a list of ports to update
#

$sql = "
  SELECT id
    FROM ports
   WHERE last_commit_id IS NULL
";

print "sql = $sql\n";

$sth = $dbh->prepare($sql);
$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

while (@row=$sth->fetchrow_array) {
	print "now reading @row\n";
	push @PORTS, "$row[0]"
}

my $port = FreshPorts::Port->new($dbh);

foreach $porttorefresh (@PORTS) {
	$sql = "
SELECT max(CL.commit_date)
  FROM commit_log CL, commit_log_ports CLP
 WHERE CLP.port_id = $porttorefresh
   AND CL.id       = CLP.commit_log_id
";

	print "sql = $sql\n";

	$sth = $dbh->prepare($sql);
	$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

	@row =$sth->fetchrow_array;

	my $CommitLogDate = $row[0];

	$sql = "
SELECT CL.id
  FROM commit_log CL, commit_log_ports CLP
 WHERE CLP.port_id = $porttorefresh
   AND CL.id       = CLP.commit_log_id
   AND CL.commit_date = '$CommitLogDate'
";

	$sth = $dbh->prepare($sql);
	$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

	@row =$sth->fetchrow_array;

	my $CommitLogID = $row[0];

	$sql = "
UPDATE ports
   SET last_commit_id = $CommitLogID
 WHERE id             = $porttorefresh
";

	print "sql = $sql\n";

	$sth = $dbh->prepare($sql);
	$sth->execute ||
		FreshPorts::Utilities::ReportError('warning', "Could not execute SQL $sql ... maybe invalid?", 1);

}

$sth->finish();

$dbh->commit();
$dbh->disconnect();

After running that script, I ran one of the queries from above to confirm all was well:

SELECT count(*)
  FROM ports P
 WHERE NOT EXISTS
   (SELECT CL.id 
      FROM commit_log CL
     WHERE CL.id = P.last_commit_id);
 count 
-------
     0
(1 row)

That’s all the ports correct.

Let’s try running the problem query mentioned in a previous post:

SELECT count(*)
  FROM ports P LEFT OUTER JOIN ports_vulnerable PV on PV.port_id = P.id JOIN commit_log CL on P.last_commit_id = CL.id, categories C, element E
 WHERE P.category_id   = C.id
   AND P.element_id    = E.id  
   AND lower(E.name) = lower(E'bird');
 count 
-------
     1

That query correctly retrieves one row.

add a foreign key, on delete set null

The ports table is simple, with a lot of referential integrity (RI). I’m about to add more RI to it.

freshports.org=> \d  ports
                                       Table "public.ports"
      Column       |           Type           |                     Modifiers                      
-------------------+--------------------------+----------------------------------------------------
 id                | integer                  | not null default nextval('ports_id_seq'::regclass)
 element_id        | integer                  | not null
 category_id       | integer                  | not null
 short_description | text                     | 
 long_description  | text                     | 
 version           | text                     | 
 revision          | text                     | 
 maintainer        | text                     | 
 homepage          | text                     | 
 master_sites      | text                     | 
 extract_suffix    | text                     | 
 package_exists    | boolean                  | 
 depends_build     | text                     | 
 depends_run       | text                     | 
 last_commit_id    | integer                  | 
 found_in_index    | boolean                  | 
 forbidden         | text                     | 
 broken            | text                     | 
 date_added        | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
 categories        | text                     | 
 deprecated        | text                     | 
 ignore            | text                     | 
 master_port       | text                     | 
 latest_link       | text                     | 
 depends_lib       | text                     | 
 no_latest_link    | text                     | 
 no_package        | text                     | 
 package_name      | text                     | 
 portepoch         | text                     | 
 no_cdrom          | text                     | 
 restricted        | text                     | 
 expiration_date   | date                     | 
 is_interactive    | text                     | 
 only_for_archs    | text                     | 
 not_for_archs     | text                     | 
 status            | character(1)             | not null
 showconfig        | text                     | 
 license           | text                     | 
Indexes:
    "ports_pkey" PRIMARY KEY, btree (id)
    "ports_active_idx" btree (status) WHERE status = 'A'::bpchar
    "ports_broken" btree (broken) WHERE broken <> ''::text
    "ports_category_id_idx" btree (category_id)
    "ports_deleted" btree (status) WHERE status = 'D'::bpchar
    "ports_element_id" btree (element_id)
    "ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL
    "ports_ignore" btree (ignore) WHERE ignore <> ''::text
    "ports_is_interactive" btree (is_interactive) WHERE is_interactive IS NOT NULL
    "ports_package_name" btree (package_name)
    "ports_ports_expiration_date" btree (expiration_date) WHERE expiration_date IS NOT NULL
Foreign-key constraints:
    "$1" FOREIGN KEY (category_id) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
    TABLE "ports_categories" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "commit_log_ports_ignore" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_moved" CONSTRAINT "$1" FOREIGN KEY (from_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_updating_ports_xref" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_vulnerable" CONSTRAINT "$1" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE RESTRICT ON DELETE CASCADE
    TABLE "commit_log_ports" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "ports_moved" CONSTRAINT "$2" FOREIGN KEY (to_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "commit_log_ports_vuxml" CONSTRAINT "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "commit_log_port_elements" CONSTRAINT "$3" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "cache_clearing_ports" CONSTRAINT "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_dependent_upon_fkey" FOREIGN KEY (port_id_dependent_upon) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "port_dependencies" CONSTRAINT "port_dependencies_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
    ports_clear_cache AFTER UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_clear_cache()
    ports_ports_categories AFTER INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_categories_set()
    ports_status BEFORE INSERT OR UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_status()

As you can see from the above, last_commit_id is not involved in any foreign key constraints. The following statement will ensure that the field always contains a valid key, or null:

ALTER TABLE ports                                                     
    ADD FOREIGN KEY (last_commit_id)
       REFERENCES commit_log (id) ON UPDATE CASCADE ON DELETE SET NULL;

Doing another \d ports will display the following addition to the table:

   "ports_last_commit_id_fkey" FOREIGN KEY (last_commit_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE SET NULL

trigger to find the right value when set null

Here is the trigger I just created. It is based upon the script (mentioned above) which I used to fix up the values.

CREATE OR REPLACE FUNCTION check_last_commit_id() RETURNS TRIGGER as '
   declare
      l_max_commit_date  timestamp with time zone;
      l_commit_log_id    integer;
begin

  if new.last_commit_id is null then
    SELECT max(CL.commit_date)
      INTO l_max_commit_date
      FROM commit_log CL, commit_log_ports CLP
     WHERE CLP.port_id = NEW.id
       AND CL.id       = CLP.commit_log_id;

    IF FOUND THEN
      SELECT CL.id
        INTO l_commit_log_id
        FROM commit_log CL, commit_log_ports CLP
       WHERE CLP.port_id    = NEW.id
         AND CL.id          = CLP.commit_log_id
         AND CL.commit_date = l_max_commit_date;

      IF FOUND THEN
        NEW.last_commit_id := l_commit_log_id;
      END IF;
    END IF;

  end if;

  RETURN NEW;

end;
' LANGUAGE 'plpgsql';


  DROP TRIGGER check_last_commit_id on ports;
CREATE TRIGGER check_last_commit_id
BEFORE update on ports
FOR EACH ROW
EXECUTE PROCEDURE check_last_commit_id();

Here is my test of that code:

freshports.org=# begin;
BEGIN
freshports.org=# select id, last_commit_id from ports where id = 234;
 id  | last_commit_id 
-----+----------------
 234 |         237699
(1 row)

freshports.org=# update ports set last_commit_id = null where id = 234;
UPDATE 1
freshports.org=# select id, last_commit_id from ports where id = 234;
 id  | last_commit_id 
-----+----------------
 234 |         237699
(1 row)

freshports.org=# ROLLBACK;
ROLLBACK
freshports.org=# 

Yes, I just set the last_commit_id to null, and it bounced right back. But for the killer test, let’s delete a commit and see what happens.

Testing via commit delete

Let’s start with this commit against chinese/bg5ps. I will check the existing values, delete the commit, then check the new values.

freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps';
  id  | last_commit_id 
------+----------------
 2973 |         504539
(1 row)

freshports.org=# begin;
BEGIN
freshports.org=# delete from commit_log where id = 504539 and message_id = '201401251811.s0PIBg6R031537@svn.freebsd.org';
DELETE 1
freshports.org=# SELECT id, last_commit_id FROM ports_active where name = 'bg5ps';
  id  | last_commit_id 
------+----------------
 2973 |         504087
(1 row)

freshports.org=# rollback;
ROLLBACK
freshports.org=# 

Yes, that seems to have worked as planned. I did a rollback at the end, just because the delete was not required.

Verifying the fix

But let’s check that new value. Is it correct?

freshports.org=# select message_id from commit_log where id = 504087;
                 message_id                  
---------------------------------------------
 201401221552.s0MFqBK6007511@svn.freebsd.org
(1 row)

freshports.org=# 

Looking at the webpage, yes, that is the expected value for that message_id, which can be found by hovering over the commit email icon or the file set icon.

I’ll run this on my dev server for a while before pushing it through to production.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

1 thought on “Fixing up a referential integrity issue”

Leave a Comment

Scroll to Top