Removing ‘unused’ tables

This post is mostly for personal use as I investigate this topic. It may take a few days and I need to keep notes.

Last week, FreshPorts went live with git commits. There are some outstanding issues, such as commits to branches, which I will catch up on in the weeks to come. During that update process, I noticed two tables which I thought were unused. I made a note for me to investigate them later. Those tables are:

freshports.devgit=# \dt commits_latest*
                List of relations
 Schema |         Name         | Type  |  Owner   
--------+----------------------+-------+----------
 public | commits_latest       | table | postgres
 public | commits_latest_ports | table | postgres
(2 rows)

freshports.devgit=# 

Intent

I recall the intent of those tables: to retain a list of the most recent commits for quick display on the home page and in news feeds. I’ve gotten better at SQL since those days and I see they contain no data:

freshports.devgit=# select count(*) from commits_latest;
 count 
-------
     0
(1 row)

freshports.devgit=# select count(*) from commits_latest_ports;
 count 
-------
     0
(1 row)

freshports.devgit=# 

What do the tables look like

For the record, they look like.

freshports.devgit=# \d commits_latest
                         Table "public.commits_latest"
       Column       |           Type           | Collation | Nullable | Default 
--------------------+--------------------------+-----------+----------+---------
 commit_log_id      | integer                  |           |          | 
 commit_date_raw    | timestamp with time zone |           |          | 
 message_subject    | text                     |           |          | 
 message_id         | text                     |           |          | 
 committer          | text                     |           |          | 
 commit_description | text                     |           |          | 
 commit_date        | text                     |           |          | 
 commit_time        | text                     |           |          | 
 element_id         | integer                  |           |          | 
 element_name       | text                     |           |          | 
 revision_name      | text                     |           |          | 
 status             | character(1)             |           |          | 
 encoding_losses    | boolean                  |           |          | 
 element_pathname   | text                     |           |          | 

freshports.devgit=# \d commits_latest_ports
                     Table "public.commits_latest_ports"
    Column     |            Type             | Collation | Nullable | Default 
---------------+-----------------------------+-----------+----------+---------
 commit_log_id | integer                     |           | not null | 
 commit_date   | timestamp without time zone |           | not null | 
Indexes:
    "commits_latest_ports_pkey" PRIMARY KEY, btree (commit_log_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (commit_log_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.devgit=# 

Does the website use it?

No, it’s not in use there:

[dan@devgit-nginx01:~/www] $ grep -ri commits_latest *
grep: configuration/robots.txt: Permission denied
grep: configuration/vhosts.conf: Permission denied
grep: configuration/vhosts.conf.nginx: No such file or directory
grep: configuration/virtualhost-common.conf: Permission denied
grep: configuration/status-config.php: No such file or directory
grep: configuration/database.php: Permission denied
docs/physical_database-2016-06-27.svg: >rel_commits_latest_ports_commit_log
docs/physical_database-2016-06-27.svg: >commits_latest
docs/physical_database-2016-06-27.svg: >commits_latest_ports
grep: www/robots.txt: Permission denied
[dan@devgit-nginx01:~/www] $ 

What about the backend?

No, it’s not there either:

[dan@devgit-ingress01:~/scripts] $ grep -ri commits_latest *
[dan@devgit-ingress01:~/scripts] $ cd ~/modules
[dan@devgit-ingress01:~/modules] $ grep -ri commits_latest *
grep: config.pm: Permission denied
[dan@devgit-ingress01:~/modules] $ 

What about the database

Perhaps it’s used by the store procedures, functions, relation integrity, etc:

[dan@pro02:~/src/freshports/database-schema-git] $ grep -ri commits_latest *
FreshPorts2.pdm:commits_latest
FreshPorts2.pdm:commits_latest
FreshPorts2.pdm:commits_latest
FreshPorts2.pdm:PK_COMMITS_LATEST
FreshPorts2.pdm:commits_latest_ports
FreshPorts2.pdm:commits_latest_ports
FreshPorts2.pdm:commits_latest_ports
createdb.sql:create table commits_latest
createdb.sql:create table commits_latest_ports
ri.txt:       WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id
ri.txt:                             FROM latest_commits_ports commits_latest_ports
ri.txt:                            WHERE (commits_latest_ports.commit_log_id = new.commit_log_id)
sp.txt:       WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id
sp.txt:                             FROM latest_commits_ports commits_latest_ports
sp.txt:                            WHERE (commits_latest_ports.commit_log_id = CommitLogID)
[dan@pro02:~/src/freshports/database-schema-git] $ 

Only ri.txt and sp.txt are relevant here:

From ri.txt:

CREATE OR REPLACE FUNCTION commit_log_ports_insert () RETURNS TRIGGER
AS '
BEGIN
   INSERT INTO latest_commits_ports (commit_log_id, commit_date)
              SELECT new.commit_log_id,
            (SELECT commit_log.commit_date
               FROM commit_log
              WHERE (commit_log.id = new.commit_log_id)
            )
       WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id
                             FROM latest_commits_ports commits_latest_ports
                            WHERE (commits_latest_ports.commit_log_id = new.commit_log_id)
                           )
                   )
             );
   RETURN new;
END;'
LANGUAGE plpgsql;

  DROP TRIGGER IF EXISTS commit_log_ports_insert ON commit_log_ports;
CREATE TRIGGER commit_log_ports_insert
AFTER INSERT ON commit_log_ports
FOR EACH ROW
EXECUTE PROCEDURE commit_log_ports_insert ();

For each row added to commit_log_ports, invoke the trigger commit_log_ports_insert().

I know commit_log_ports is used. I don’t know about latest_commits_ports.

It seems we have two more tables to review:

freshports.devgit=# \dt latest_commits*
                List of relations
 Schema |         Name         | Type  |  Owner   
--------+----------------------+-------+----------
 public | latest_commits       | table | postgres
 public | latest_commits_ports | table | postgres
(2 rows)

Those tables do seem to be used:

freshports.devgit=# select count(*) from latest_commits;
 count  
--------
 693798
(1 row)

freshports.devgit=# select count(*) from latest_commits_ports;
 count  
--------
 159794
(1 row)

From sp.txt:

CREATE OR REPLACE FUNCTION commit_log_ports_insert(int) RETURNS boolean
AS $$
DECLARE
        CommitLogID     ALIAS for $1;
BEGIN
   INSERT INTO latest_commits_ports (commit_log_id, commit_date)
              SELECT CommitLogID, 
            (SELECT commit_log.commit_date
               FROM commit_log
              WHERE (commit_log.id = CommitLogID)
            )
       WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id
                             FROM latest_commits_ports commits_latest_ports
                            WHERE (commits_latest_ports.commit_log_id = CommitLogID)
                           )
                   )
             );

   RETURN FOUND;
END;$$

There is nothing here not already covered by <span class="file">ri.txt</span>.

LANGUAGE plpgsql;

Who uses latest_commits_ports?

I found this function:

CREATE OR REPLACE FUNCTION latest_commits_ports_anchor () RETURNS integer
    AS $$
  SELECT commit_log_id AS RESULT
    FROM latest_commits_ports
ORDER BY commit_log_id
   LIMIT 1;
$$
    LANGUAGE sql 
STABLE;

I question this function.

This is used by:

  1. LatestCommitsLarge()
  2. LatestCommitsLargeFiltered()
  3. LatestCommitsSmall()
  4. But those functions do stuff like this:

                          JOIN commit_log_ports_elements CLPE on CLPE.commit_log_id = LCPCL.commit_log_id
                                                             AND CLPE.commit_log_id > latest_commits_ports_anchor()
    

The above are used by:

  1. LatestCommits()
  2. LatestCommitsFiltered()

Look at classes/latest_commits.php:

                if (IsSet($this->Filter)) {
                        $sql = "select * from LatestCommitsFiltered($this->MaxNumberOfPorts, $this->UserID, '" . pg_escape_string($this->Filter) . "')";
                } else {
#                       $sql = "select * from LatestCommits($this->MaxNumberOfPorts, $this->UserID)";
                        $sql = "
  SELECT LC.*, STF.message AS stf_message
    FROM LatestCommits(" . pg_escape_string($this->MaxNumberOfPorts) . ", 0, '" . pg_escape_string($this->BranchName) . "') LC LEFT OUTER JOIN sanity_test_failures STF
      ON LC.commit_log_id = STF.commit_log_id
ORDER BY LC.commit_date_raw DESC, LC.category, LC.port, element_pathname";
                }

This is used by commits.php and filter.php (the latter of which has no links to to and it seems it can be deleted).

I also see reference to

Actions

While the above, I keep this section up to date as I went.

  1. Examine LatestCommitsLarge() vs LatestCommitsSmall() – do we still need both?
  2. LatestCommitsLarge is the one used by commits.php – does the AND CLPE.commit_log_id > latest_commits_ports_anchor() clause have useful effect?
  3. remove filter.php
  4. drop functions LatestCommitsFiltered() & LatestCommitsLargeFiltered()
  5. remove www/caching-files directory
  6. Reimplement www/caching-files/categories.php via a backend script, perhaps periodic
  7. look at scripts/cache-refresh.sh which hasn’t been invoked in production since 2017, according to the contents of the category_stats table which is displayed on categories.php page.
  8. The CategoryStatsUpdate() function might benefit from this:
    SELECT categories.name, categories.id               AS category_id,
             count(ports_active.id)      AS count,
             max(commit_log.commit_date) AS updated,
    		 case when categories.is_primary THEN '' ELSE '*' END
        FROM ports_categories, categories, ports_active left outer join commit_log on ( ports_active.last_commit_id = commit_log.id )
       WHERE ports_active.id = ports_categories.port_id
         AND categories.id   = ports_categories.category_id
    GROUP BY categories.id, categories.name, categories.description, is_primary, categories.element_id
    order by categories.name
    
Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top