Nov 14 2020
 

Today I’m working on https://devgit.freshports.org and fixing links to http://github.com/freebsd/freebsd-ports/ for individual commits.

Some of these links are truncated to commit/c2b0677 (i.e. no hostname).

A theory

I think I know why. The repo_id field of the commit_log table is empty. This field links the commit to a specific row in the repo table, and that row contains the hostname.

This is the repo table:

freshports.devgit=# select * from repo;
 id | name  |      description       |   repo_hostname    |      path_to_repo      | repository 
----+-------+------------------------+--------------------+------------------------+------------
  1 | ports | The FreeBSD Ports tree | svnweb.freebsd.org | /ports                 | subversion
  2 | doc   | The FreeBSD doc tree   | svnweb.freebsd.org | /doc                   | subversion
  3 | src   | The FreeBSD src tree   | svnweb.freebsd.org | /base                  | subversion
  8 | doc   | The FreeBSD doc tree   | github.com         | /freebsd/doc           | git
  9 | src   | The FreeBSD src tree   | github.com         | /freebsd/freebsd       | git
  6 | ports | The FreeBSD Ports tree | github.com         | /freebsd/freebsd-ports | git
(6 rows)
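Here is a minimal Python sketch showing why a missing repo_id produces a bare commit/c2b0677. It assembles a link from one row of the repo table above. The function name commit_url and the https:// scheme are assumptions for illustration. The post does not show FreshPorts' actual link-building code.

```python
from typing import Optional


def commit_url(repo: Optional[dict], commit_hash_short: str) -> str:
    """Build a commit link from a repo-table row (hypothetical helper)."""
    path = f"commit/{commit_hash_short}"
    if repo is None:
        # commit_log.repo_id is NULL: no hostname or path to prepend,
        # which yields the stunted relative link.
        return path
    return f"https://{repo['repo_hostname']}{repo['path_to_repo']}/{path}"


# Row id 6 from the repo table above
ports_git = {"id": 6, "name": "ports", "repo_hostname": "github.com",
             "path_to_repo": "/freebsd/freebsd-ports", "repository": "git"}

print(commit_url(ports_git, "c2b0677"))  # https://github.com/freebsd/freebsd-ports/commit/c2b0677
print(commit_url(None, "c2b0677"))       # commit/c2b0677
```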

The commit_log.repo_id field is set when the commit is processed. I’m going to look at the logs for commits against https://devgit.freshports.org/www/screego/

The example

Let’s start with c2b0677.

From the logs, I found:

$this->{repo}       = 'ports-quarterly'
$this->{repository} = 'git'
'ports-quarterly' and repository = 'git')sql is insert into commit_log (id, message_id, message_date, message_subject, date_added, commit_date, 
                  committer, description, system_id, svn_revision, repo_id, encoding_losses, commit_hash_short) values ( 
                                ?,
                                ?,
                                ?,
                                ?,
                                now(),
                                ?,
                                ?,
                                ?,
                                ?,
                                ?,
                                (SELECT id FROM repo WHERE name = ? and repository = ?),
                                ?::boolean,
                                ?)

Look at lines 1 & 2 ($this->{repo} and $this->{repository}), and then at line 15, the subquery that supplies repo_id.

That subquery will find nothing in the repo table with name = ‘ports-quarterly’ (see contents of the repo table listed above).
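This behaviour is easy to reproduce. The Python sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL. Both tables are cut down to the columns needed here, which is a simplification made for the sketch. In both databases, a scalar subquery that matches no row evaluates to NULL. The INSERT therefore succeeds and quietly stores a NULL repo_id instead of failing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo (id INTEGER PRIMARY KEY, name TEXT, repository TEXT);
    CREATE TABLE commit_log (id INTEGER PRIMARY KEY, commit_hash_short TEXT, repo_id INTEGER);
""")
# id, name, and repository columns of the repo table shown earlier
conn.executemany("INSERT INTO repo (id, name, repository) VALUES (?, ?, ?)", [
    (1, "ports", "subversion"), (2, "doc", "subversion"), (3, "src", "subversion"),
    (8, "doc", "git"), (9, "src", "git"), (6, "ports", "git"),
])

INSERT_SQL = """
    INSERT INTO commit_log (commit_hash_short, repo_id)
    VALUES (?, (SELECT id FROM repo WHERE name = ? AND repository = ?))
"""

# The raw XML label: no repo row matches, so repo_id becomes NULL.
conn.execute(INSERT_SQL, ("c2b0677", "ports-quarterly", "git"))
# The repo table name we actually want: matches id 6.
conn.execute(INSERT_SQL, ("c2b0677", "ports", "git"))

rows = conn.execute("SELECT repo_id FROM commit_log ORDER BY id").fetchall()
print(rows)  # [(None,), (6,)]
```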

ports-quarterly refers to the working copy in the freebsd-ports-quarterly directory on disk:

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos] $ ls -l
total 22
drwxr-xr-x  69 freshports  freshports  84 Jun 23 22:26 PORTS-2020Q2
drwxr-xr-x   2 freshports  freshports   2 Jul 19 21:52 PORTS-2020Q2-git
drwxr-xr-x  69 freshports  freshports  84 Jul  2 10:30 PORTS-2020Q3
drwxr-xr-x  69 freshports  freshports  84 Jul 10 19:21 PORTS-head
drwxr-xr-x  25 freshports  freshports  43 Jul 14 10:51 freebsd
drwxr-xr-x  23 freshports  freshports  26 Jul 14 10:44 freebsd-doc
drwxr-xr-x  69 freshports  freshports  84 Nov 12 19:18 freebsd-ports
drwxr-xr-x  69 freshports  freshports  84 Nov 12 18:18 freebsd-ports-quarterly

Here you can see that freebsd-ports and freebsd-ports-quarterly are both clones of the same FreeBSD repo:

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports] $ git config --get remote.origin.url
https://github.com/freebsd/freebsd-ports.git

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports] $ cd ../freebsd-ports-quarterly

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports-quarterly] $ git config --get remote.origin.url
https://github.com/freebsd/freebsd-ports.git
[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports-quarterly] $ 

How to fix it

I am sure that a mapping from freebsd-ports-quarterly to ports already exists in the code. I just have to find it and use it here.

Edit: I was wrong, it did not exist.

The key discovery so far:

  1. Commits to ports on head have the correct link.
  2. Commits to ports on the quarterly branch are not getting the correct repo.

This is promising and gives me the direction to solve it.

Just what is wrong?

Going back to the SQL:

SELECT id FROM repo WHERE name = ? and repository = ?

Those values come from: $this->{repo}, $this->{repository}

Substituting the values we have:

SELECT id FROM repo WHERE name = 'ports-quarterly' and repository = 'git'

But we want:

SELECT id FROM repo WHERE name = 'ports' and repository = 'git'

Running that query we get:

freshports.devgit=# SELECT * FROM repo WHERE name = 'ports' and repository = 'git';
 id | name  |      description       | repo_hostname |      path_to_repo      | repository 
----+-------+------------------------+---------------+------------------------+------------
  6 | ports | The FreeBSD Ports tree | github.com    | /freebsd/freebsd-ports | git
(1 row)

freshports.devgit=# 

Tracking it down in the code

Let’s look at how $this->{repo} is set.

From xml_munge_git.pm:

        $commit_log->{repo}             = $Updates{repository};

OK, that’s the ideal location to massage this data.

$Updates{repository} is the incoming value from the XML file for this commit. The data in question is:

<COMMIT Hash="c2b0677c4db6a956a8ad5ed6dfa8f066d9ea72e1" HashShort="c2b0677" Subject="Add www/screego" EncodingLoses="false" Repository="ports-quarterly"/>
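The real parsing happens in the Perl module xml_munge_git.pm. This Python sketch uses the standard-library xml.etree.ElementTree only to show what the Repository attribute contains. It arrives as the raw label 'ports-quarterly', and the original code copied that value straight into $commit_log->{repo}. The updates dict is a hypothetical stand-in for %Updates.

```python
import xml.etree.ElementTree as ET

commit_xml = ('<COMMIT Hash="c2b0677c4db6a956a8ad5ed6dfa8f066d9ea72e1" HashShort="c2b0677" '
              'Subject="Add www/screego" EncodingLoses="false" Repository="ports-quarterly"/>')

commit = ET.fromstring(commit_xml)

# Hypothetical stand-in for %Updates in the Perl code
updates = {
    "commit_hash_short": commit.get("HashShort"),
    "repository": commit.get("Repository"),
}
print(updates["repository"])  # ports-quarterly
```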

That value ('ports-quarterly') is defined as a constant:

$FreshPorts::Constants::Repo_Label_Ports_Quarterly = 'ports-quarterly';

I had to create the new relationship. I also decided to rename some constants to better reflect their purpose. The goal is to shorten comprehension time the next time I have to understand this code.

%FreshPorts::Constants::RepoLabelsToGitRepoNames = (
   $FreshPorts::Constants::Repo_XML_Label_Doc             => $FreshPorts::Constants::Repo_Doc,
   $FreshPorts::Constants::Repo_XML_Label_Ports           => $FreshPorts::Constants::Repo_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly => $FreshPorts::Constants::Repo_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Src             => $FreshPorts::Constants::Repo_Src,
 );

Along the way, as you can see, I renamed a number of constants so they are easier to read.

The solution

Here is what I have now:


# FreshPorts database repo names
# These are the names of the FreeBSD repos found within the FreshPorts database
# These are the valid values in the repo.name field
#
$FreshPorts::Constants::Repo_DB_Doc                       = 'doc';
$FreshPorts::Constants::Repo_DB_Ports                     = 'ports';
$FreshPorts::Constants::Repo_DB_Src                       = 'src';


#
# These are the values to be used in the Repository field of the incoming XML files
# They reflect the different working copies of repos we are processing.
# It also helps us know what we are working on.
#
$FreshPorts::Constants::Repo_XML_Label_Doc                = 'doc';
$FreshPorts::Constants::Repo_XML_Label_Ports              = 'ports';
$FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly    = 'ports-quarterly';
$FreshPorts::Constants::Repo_XML_Label_Src                = 'src';

#
# These names relate to the directory in which we find that repo on disk.
# They were taken from the repository names found at https://github.com/freebsd/
# in July 2020. They do not need to be kept up to date. They just have to reflect
# the directories used on disk.
# Interesting fact: we don't need this. We do not need to access the repo for
# doc and src commits. We have doc and src listed to be complete.
# $ ls ~freshports/ports-jail/var/db/repos/
# PORTS-2020Q2            PORTS-2020Q3            freebsd                 freebsd-ports
# PORTS-2020Q2-git        PORTS-head              freebsd-doc             freebsd-ports-quarterly
#
$FreshPorts::Constants::Repo_Dir_Name_Doc                 = 'freebsd-doc';
$FreshPorts::Constants::Repo_Dir_Name_Ports               = 'freebsd-ports';
$FreshPorts::Constants::Repo_Dir_Name_Ports_Quarterly     = 'freebsd-ports-quarterly';
$FreshPorts::Constants::Repo_Dir_Name_Src                 = 'freebsd';
#
# How to translate the label (doc) to the repo directory (freebsd-doc)
# Well, we don't have to do this often, or at all, because we only access
# the repo for port commits, nothing else.
# This is how we relate an incoming XML file to a particular working copy of the repo.
#
%FreshPorts::Constants::GitRepos = (
   $FreshPorts::Constants::Repo_XML_Label_Doc             => $FreshPorts::Constants::Repo_Dir_Name_Doc,
   $FreshPorts::Constants::Repo_XML_Label_Ports           => $FreshPorts::Constants::Repo_Dir_Name_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly => $FreshPorts::Constants::Repo_Dir_Name_Ports_Quarterly,
   $FreshPorts::Constants::Repo_XML_Label_Src             => $FreshPorts::Constants::Repo_Dir_Name_Src,
);

#
# With the GitRepos, neither the repo name on disk nor the label we assign for XML
# matches the repo name we want to use.
#
# freebsd-ports and freebsd-ports-quarterly both map to the ports tree.
#
# That relationship (XML label to name in the FreshPorts repo table) is mapped here.
# The repo table knows only: doc ports src
#
# On the left, we have the incoming values in the XML file.
# On the right, we have the name found in the repo.name field of the FreshPorts database.
#
%FreshPorts::Constants::RepoLabelsToGitRepoNames = (
   $FreshPorts::Constants::Repo_XML_Label_Doc             => $FreshPorts::Constants::Repo_DB_Doc,
   $FreshPorts::Constants::Repo_XML_Label_Ports           => $FreshPorts::Constants::Repo_DB_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly => $FreshPorts::Constants::Repo_DB_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Src             => $FreshPorts::Constants::Repo_DB_Src,
 );
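The two hashes answer different questions about the same label. This Python sketch uses plain dicts, with names translated from the Perl constants. It shows that ports and ports-quarterly resolve to different directories on disk but to the same repo.name. The final check is an addition that is not in the post. It catches any label that maps to a name the repo table does not know.

```python
# XML label -> working-copy directory on disk (from %GitRepos)
GIT_REPOS = {
    "doc": "freebsd-doc",
    "ports": "freebsd-ports",
    "ports-quarterly": "freebsd-ports-quarterly",
    "src": "freebsd",
}

# XML label -> repo.name in the FreshPorts database (from %RepoLabelsToGitRepoNames)
REPO_LABELS_TO_DB_NAMES = {
    "doc": "doc",
    "ports": "ports",
    "ports-quarterly": "ports",
    "src": "src",
}

for label in ("ports", "ports-quarterly"):
    print(f"{label:16} disk={GIT_REPOS[label]:24} db={REPO_LABELS_TO_DB_NAMES[label]}")

# Sanity check (not in the original code): every mapped name must exist in repo.name.
KNOWN_DB_NAMES = {"doc", "ports", "src"}
unknown = set(REPO_LABELS_TO_DB_NAMES.values()) - KNOWN_DB_NAMES
assert not unknown, f"labels map to names the repo table lacks: {unknown}"
```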

Back to the original code, this is the change I need to make.

[dan@devgit-ingress01:~/modules] $ svn di xml_munge_git.pm
Index: xml_munge_git.pm
===================================================================
--- xml_munge_git.pm	(revision 5484)
+++ xml_munge_git.pm	(working copy)
@@ -550,6 +550,26 @@
 	return $myRepoPrefix;
 }
 
+sub ConvertRepoLabelToGitRepoName($) {
+#
+# Given the repo name label from XML, obtain the FreeBSD repo name.
+#
+
+	my $Repo_XML_Label = shift;
+	
+	if (!defined($Repo_XML_Label)) {
+		die('no value set for incoming Repo_XML_Label');
+	}
+	
+	my $myRepoPrefixGitRepoName = $FreshPorts::Constants::RepoLabelsToGitRepoNames{$Repo_XML_Label};
+
+	if (!defined($myRepoPrefixGitRepoName)) {
+		die("'$Repo_XML_Label' was not found in \$FreshPorts::Constants::RepoLabelsToGitRepoNames\n");
+	}
+	
+	return $myRepoPrefixGitRepoName;
+}
+
 sub handle_file_end {
 	# for svn we have:
 	#      <FILE Action="Modify" Revision="512343" Path="head/net/tightvnc/Makefile"></FILE>
@@ -927,7 +947,7 @@
 	$commit_log->{description}	= $description;
 	$commit_log->{system_id}	= $SystemID;
 	$commit_log->{commit_hash_short}= $Updates{commit_hash_short};
-	$commit_log->{repo}	 	= $Updates{repository};
+	$commit_log->{repo}	 	= ConvertRepoLabelToGitRepoName($Updates{repository});
 	$commit_log->{revision} 	= $revision;
 
 	#
[dan@devgit-ingress01:~/modules] $ 
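The key property of ConvertRepoLabelToGitRepoName is that an unknown label now stops processing. Previously it silently produced a NULL repo_id. The same behaviour is shown below as a Python sketch. Raising ValueError stands in for Perl's die, and 'ports-2020Q4' is a hypothetical label used only to trigger the error.

```python
REPO_LABELS_TO_DB_NAMES = {"doc": "doc", "ports": "ports", "ports-quarterly": "ports", "src": "src"}


def convert_repo_label(label):
    """Translate an XML Repository label to a repo.name, failing loudly like the Perl die()."""
    if label is None:
        raise ValueError("no value set for incoming Repo_XML_Label")
    if label not in REPO_LABELS_TO_DB_NAMES:
        raise ValueError(f"'{label}' was not found in RepoLabelsToGitRepoNames")
    return REPO_LABELS_TO_DB_NAMES[label]


print(convert_repo_label("ports-quarterly"))  # ports
try:
    convert_repo_label("ports-2020Q4")  # hypothetical unknown label
except ValueError as err:
    print(err)  # 'ports-2020Q4' was not found in RepoLabelsToGitRepoNames
```

Failing at ingestion time is cheaper than discovering thousands of NULL repo_id rows later, which is exactly the cleanup that follows below.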

What commits does this affect?

This affects all commits on branches. How many of these are from this year, and are therefore git-related?

freshports.devgit=# select count(*) from commit_log where repo_id is null and commit_date > '2020-01-01';
 count 
-------
  9103

OK. I’m not going to manually rerun all those commits. Let’s see how many of them are not attached to any port (i.e. have no rows in commit_log_ports):

freshports.devgit=# select count(*) from commit_log where repo_id is null and commit_date > '2020-01-01' and not exists (select commit_log_id from commit_log_ports where commit_log_id = commit_log.id);
 count 
-------
    81
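The second query is an anti-join. NOT EXISTS keeps a commit only when commit_log_ports has no row for it. Unlike a plain JOIN, it cannot count a commit twice when that commit touches several ports. This small SQLite sketch uses made-up rows (not FreshPorts data) and trimmed-down tables to show the two counts side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commit_log (id INTEGER PRIMARY KEY, commit_date TEXT, repo_id INTEGER);
    CREATE TABLE commit_log_ports (commit_log_id INTEGER, port_id INTEGER);

    -- Made-up rows: 1 and 2 touch ports (2 touches two ports), 3 touches no port
    -- (think MOVED or Mk/), and 4 already has a repo_id.
    INSERT INTO commit_log VALUES (1, '2020-11-12', NULL), (2, '2020-11-13', NULL),
                                  (3, '2020-11-13', NULL), (4, '2020-11-14', 6);
    INSERT INTO commit_log_ports VALUES (1, 100), (2, 200), (2, 201);
""")

null_repo = "repo_id IS NULL AND commit_date > '2020-01-01'"
total = conn.execute(f"SELECT count(*) FROM commit_log WHERE {null_repo}").fetchone()[0]
no_port = conn.execute(f"""
    SELECT count(*) FROM commit_log
     WHERE {null_repo}
       AND NOT EXISTS (SELECT 1 FROM commit_log_ports
                        WHERE commit_log_id = commit_log.id)
""").fetchone()[0]
print(total, no_port)  # 3 1
```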

Hmm, I wonder what those 81 are related to.

select CL.id, CL.message_id, element_pathname(CLE.element_id)
from commit_log CL
join commit_log_elements CLE on CL.id = CLE.commit_log_id
 where repo_id is null and commit_date > '2020-01-01' and not exists (select commit_log_id from commit_log_ports where commit_log_id = CL.id)
;
   id   |                message_id                |                 element_pathname                  
--------+------------------------------------------+---------------------------------------------------
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/plist_sub_sed_sort.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/ports_env.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/qa.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/rust-compat11-canary.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/smart_makepatch.sh
 823431 | 3bc66b9e83aeaceb8a9b6d630a5d6b449e21944b | /ports/head/java/Makefile
 823431 | 3bc66b9e83aeaceb8a9b6d630a5d6b449e21944b | /ports/head/sysutils/Makefile
 823537 | e75cc8b0d915905850369f891c0aaf00ce8a602f | /ports/head/MOVED
 823929 | 23d0237edb81d13cdf50facb6039ee41814489a9 | /ports/head/Mk/Uses/python.mk
 824028 | 6015a195a90f83435221950ae46598b354d049e3 | /ports/head/CHANGES
 824094 | da4be7ff1b25955df5d2ebb7b960c5decb874f06 | /ports/head/MOVED
...

I’m not showing all the output. This is enough to demonstrate that not every commit to the ports tree touches a specific port. I sometimes forget that.

Yes, I can set all those commits up as ports tree commits.

I went back and checked the pre-2020 commits which had a null repo_id column. They dated back to 2002 (/CVSROOT/module) and some were as recent as 2012-09-09 (/projects/tinderbox/tinderbox.pl).

Let’s fix those up now:

freshports.devgit=# begin;
BEGIN
freshports.devgit=# update commit_log set repo_id = 6 where repo_id is null and commit_date > '2020-01-01' ;
UPDATE 9103
freshports.devgit=# select CL.id, CL.message_id, element_pathname(CLE.element_id)
from commit_log CL
join commit_log_elements CLE on CL.id = CLE.commit_log_id
 where repo_id is null and commit_date > '2020-01-01' and not exists (select commit_log_id from commit_log_ports where commit_log_id = CL.id)
;
 id | message_id | element_pathname 
----+------------+------------------
(0 rows)

freshports.devgit=# commit;
COMMIT
freshports.devgit=# 
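In the session above, the safety check is done by eye: UPDATE 9103 matches the earlier count of 9103 before commit is typed. The Python sketch below makes that check explicit. It again uses SQLite as a stand-in and a hypothetical backfill_repo_id helper. The update is rolled back unless it touches the expected number of rows. Using the sqlite3 connection as a context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3


def backfill_repo_id(conn, repo_id, expected):
    """Set repo_id on NULL 2020+ commits; keep the change only if the row count matches."""
    with conn:  # commit on normal exit, roll back if an exception propagates
        cur = conn.execute(
            "UPDATE commit_log SET repo_id = ? "
            "WHERE repo_id IS NULL AND commit_date > '2020-01-01'",
            (repo_id,),
        )
        if cur.rowcount != expected:
            raise RuntimeError(f"expected {expected} rows, updated {cur.rowcount}")
        return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commit_log (id INTEGER PRIMARY KEY, commit_date TEXT, repo_id INTEGER)")
conn.executemany("INSERT INTO commit_log (commit_date, repo_id) VALUES (?, NULL)",
                 [("2020-11-12",), ("2020-11-13",), ("2012-09-09",)])
conn.commit()

try:
    backfill_repo_id(conn, 6, expected=5)  # wrong expectation: rolled back
except RuntimeError as err:
    print(err)  # expected 5 rows, updated 2
print(backfill_repo_id(conn, 6, expected=2))  # 2
```

The pre-2020 row is deliberately left alone, matching the date filter used in the post.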

All set. Let’s go. Commit time.

Commit at https://github.com/FreshPorts/freshports/commit/601831a2bbe87bf8467ac667bea71d75b68babd7
