This post is mostly for personal use as I investigate this topic. It may take a few days and I need to keep notes.
Last week, FreshPorts went live with git commits. There are some outstanding issues, such as commits to branches, which I will catch up on in the weeks to come. During that update process, I noticed two tables which I thought were unused. I made a note for me to investigate them later. Those tables are:
freshports.devgit=# \dt commits_latest* List of relations Schema | Name | Type | Owner --------+----------------------+-------+---------- public | commits_latest | table | postgres public | commits_latest_ports | table | postgres (2 rows) freshports.devgit=#
Intent
I recall the intent of those tables: to retain a list of the most recent commits for quick display on the home page and in news feeds. I’ve gotten better at SQL since those days and I see they contain no data:
freshports.devgit=# select count(*) from commits_latest; count ------- 0 (1 row) freshports.devgit=# select count(*) from commits_latest_ports; count ------- 0 (1 row) freshports.devgit=#
What do the tables look like
For the record, they look like.
freshports.devgit=# \d commits_latest Table "public.commits_latest" Column | Type | Collation | Nullable | Default --------------------+--------------------------+-----------+----------+--------- commit_log_id | integer | | | commit_date_raw | timestamp with time zone | | | message_subject | text | | | message_id | text | | | committer | text | | | commit_description | text | | | commit_date | text | | | commit_time | text | | | element_id | integer | | | element_name | text | | | revision_name | text | | | status | character(1) | | | encoding_losses | boolean | | | element_pathname | text | | | freshports.devgit=# \d commits_latest_ports Table "public.commits_latest_ports" Column | Type | Collation | Nullable | Default ---------------+-----------------------------+-----------+----------+--------- commit_log_id | integer | | not null | commit_date | timestamp without time zone | | not null | Indexes: "commits_latest_ports_pkey" PRIMARY KEY, btree (commit_log_id) Foreign-key constraints: "$1" FOREIGN KEY (commit_log_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE CASCADE freshports.devgit=#
Does the website use it?
No, it’s not in use there:
[dan@devgit-nginx01:~/www] $ grep -ri commits_latest * grep: configuration/robots.txt: Permission denied grep: configuration/vhosts.conf: Permission denied grep: configuration/vhosts.conf.nginx: No such file or directory grep: configuration/virtualhost-common.conf: Permission denied grep: configuration/status-config.php: No such file or directory grep: configuration/database.php: Permission denied docs/physical_database-2016-06-27.svg: >rel_commits_latest_ports_commit_log docs/physical_database-2016-06-27.svg: >commits_latest docs/physical_database-2016-06-27.svg: >commits_latest_ports grep: www/robots.txt: Permission denied [dan@devgit-nginx01:~/www] $
What about the backend?
No, it’s not there either:
[dan@devgit-ingress01:~/scripts] $ grep -ri commits_latest * [dan@devgit-ingress01:~/scripts] $ cd ~/modules [dan@devgit-ingress01:~/modules] $ grep -ri commits_latest * grep: config.pm: Permission denied [dan@devgit-ingress01:~/modules] $
What about the database
Perhaps it’s used by the store procedures, functions, relation integrity, etc:
[dan@pro02:~/src/freshports/database-schema-git] $ grep -ri commits_latest * FreshPorts2.pdm:commits_latest FreshPorts2.pdm:commits_latest FreshPorts2.pdm:commits_latest FreshPorts2.pdm:PK_COMMITS_LATEST FreshPorts2.pdm:commits_latest_ports FreshPorts2.pdm:commits_latest_ports FreshPorts2.pdm:commits_latest_ports createdb.sql:create table commits_latest createdb.sql:create table commits_latest_ports ri.txt: WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id ri.txt: FROM latest_commits_ports commits_latest_ports ri.txt: WHERE (commits_latest_ports.commit_log_id = new.commit_log_id) sp.txt: WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id sp.txt: FROM latest_commits_ports commits_latest_ports sp.txt: WHERE (commits_latest_ports.commit_log_id = CommitLogID) [dan@pro02:~/src/freshports/database-schema-git] $
Only ri.txt and sp.txt are relevant here:
From ri.txt:
CREATE OR REPLACE FUNCTION commit_log_ports_insert () RETURNS TRIGGER AS ' BEGIN INSERT INTO latest_commits_ports (commit_log_id, commit_date) SELECT new.commit_log_id, (SELECT commit_log.commit_date FROM commit_log WHERE (commit_log.id = new.commit_log_id) ) WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id FROM latest_commits_ports commits_latest_ports WHERE (commits_latest_ports.commit_log_id = new.commit_log_id) ) ) ); RETURN new; END;' LANGUAGE plpgsql; DROP TRIGGER IF EXISTS commit_log_ports_insert ON commit_log_ports; CREATE TRIGGER commit_log_ports_insert AFTER INSERT ON commit_log_ports FOR EACH ROW EXECUTE PROCEDURE commit_log_ports_insert ();
For each row added to commit_log_ports, invoke the trigger commit_log_ports_insert().
I know commit_log_ports is used. I don’t know about latest_commits_ports.
It seems we have two more tables to review:
freshports.devgit=# \dt latest_commits* List of relations Schema | Name | Type | Owner --------+----------------------+-------+---------- public | latest_commits | table | postgres public | latest_commits_ports | table | postgres (2 rows)
Those tables do seem to be used:
freshports.devgit=# select count(*) from latest_commits; count -------- 693798 (1 row) freshports.devgit=# select count(*) from latest_commits_ports; count -------- 159794 (1 row)
From sp.txt:
CREATE OR REPLACE FUNCTION commit_log_ports_insert(int) RETURNS boolean AS $$ DECLARE CommitLogID ALIAS for $1; BEGIN INSERT INTO latest_commits_ports (commit_log_id, commit_date) SELECT CommitLogID, (SELECT commit_log.commit_date FROM commit_log WHERE (commit_log.id = CommitLogID) ) WHERE (NOT (EXISTS (SELECT commits_latest_ports.commit_log_id FROM latest_commits_ports commits_latest_ports WHERE (commits_latest_ports.commit_log_id = CommitLogID) ) ) ); RETURN FOUND; END;$$ There is nothing here not already covered by <span class="file">ri.txt</span>. LANGUAGE plpgsql;
Who uses latest_commits_ports?
I found this function:
CREATE OR REPLACE FUNCTION latest_commits_ports_anchor () RETURNS integer AS $$ SELECT commit_log_id AS RESULT FROM latest_commits_ports ORDER BY commit_log_id LIMIT 1; $$ LANGUAGE sql STABLE;
I question this function.
This is used by:
- LatestCommitsLarge()
- LatestCommitsLargeFiltered()
- LatestCommitsSmall()
But those functions do stuff like this:
JOIN commit_log_ports_elements CLPE on CLPE.commit_log_id = LCPCL.commit_log_id AND CLPE.commit_log_id > latest_commits_ports_anchor()
The above are used by:
- LatestCommits()
- LatestCommitsFiltered()
Look at classes/latest_commits.php:
if (IsSet($this->Filter)) { $sql = "select * from LatestCommitsFiltered($this->MaxNumberOfPorts, $this->UserID, '" . pg_escape_string($this->Filter) . "')"; } else { # $sql = "select * from LatestCommits($this->MaxNumberOfPorts, $this->UserID)"; $sql = " SELECT LC.*, STF.message AS stf_message FROM LatestCommits(" . pg_escape_string($this->MaxNumberOfPorts) . ", 0, '" . pg_escape_string($this->BranchName) . "') LC LEFT OUTER JOIN sanity_test_failures STF ON LC.commit_log_id = STF.commit_log_id ORDER BY LC.commit_date_raw DESC, LC.category, LC.port, element_pathname"; }
This is used by commits.php and filter.php (the latter of which has no links to to and it seems it can be deleted).
I also see reference to
Actions
While the above, I keep this section up to date as I went.
- Examine LatestCommitsLarge() vs LatestCommitsSmall() – do we still need both?
- LatestCommitsLarge is the one used by commits.php – does the AND CLPE.commit_log_id > latest_commits_ports_anchor() clause have useful effect?
- remove filter.php
- drop functions LatestCommitsFiltered() & LatestCommitsLargeFiltered()
- remove www/caching-files directory
- Reimplement www/caching-files/categories.php via a backend script, perhaps periodic
- look at scripts/cache-refresh.sh which hasn’t been invoked in production since 2017, according to the contents of the category_stats table which is displayed on categories.php page.
- The CategoryStatsUpdate() function might benefit from this:
SELECT categories.name, categories.id AS category_id, count(ports_active.id) AS count, max(commit_log.commit_date) AS updated, case when categories.is_primary THEN '' ELSE '*' END FROM ports_categories, categories, ports_active left outer join commit_log on ( ports_active.last_commit_id = commit_log.id ) WHERE ports_active.id = ports_categories.port_id AND categories.id = ports_categories.category_id GROUP BY categories.id, categories.name, categories.description, is_primary, categories.element_id order by categories.name