Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full-time systems administration and now work for a very well-known company that has been a big force in the security industry.

Dec 17 2020
 

The doc repo has moved from svn to git. This changeover occurred on 2020-12-09.

The last svn commit was: 54737

The first git commit was: 3be01a475855e7511ad755b2defd2e0da5d58bbe

To date, devgit.freshports.org has been using https://github.com/freebsd/freebsd-doc/ for processing commits.

Today’s work will convert from that GitHub repo to https://cgit.freebsd.org/doc/ (actually, https://git.freebsd.org/).

What changes are required

The following changes are required:

  1. A new working copy of the git.FreeBSD.org/doc repo
  2. A marker pointing to the last commit processed
  3. Configuration file changes to point to that repo

EDIT: 2021-01-01 – NOTE: all the configuration files are also maintained via Ansible – the manual steps are not necessarily required. I think I should next document the changes to Ansible when changing database names / servers.

The rest of the post documents those changes.

A new working copy of the git.FreeBSD.org/doc repo

Let’s clone that repo:

$ cd /var/db/ingress/repos
[ingress@devgit-ingress01 ~]$ cd repos
[ingress@devgit-ingress01 ~/repos]$ ls -l
total 18
drwxr-xr-x  26 ingress  ingress  45 Dec 17 03:03 freebsd
drwxr-xr-x  23 ingress  ingress  26 Dec  8 07:12 freebsd-doc
drwxr-xr-x  69 ingress  ingress  84 Dec 17 02:18 freebsd-ports
drwxr-xr-x  69 ingress  ingress  84 Nov 22 19:28 freebsd-ports-quarterly
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd-doc
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd-ports
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd-ports-quarterly
[ingress@devgit-ingress01 ~/repos]$ git clone https://git.FreeBSD.org/doc.git
Cloning into 'doc'...
remote: Enumerating objects: 449334, done.
remote: Counting objects: 100% (449334/449334), done.
remote: Compressing objects: 100% (120607/120607), done.
remote: Total 449334 (delta 314272), reused 448476 (delta 313566), pack-reused 0
Receiving objects: 100% (449334/449334), 245.50 MiB | 11.60 MiB/s, done.
Resolving deltas: 100% (314272/314272), done.
Updating files: 100% (11346/11346), done.
[ingress@devgit-ingress01 ~/repos]$ 

Done.

A marker pointing to the last commit processed

In the directory listing above, you can see files whose names start with latest. and which mirror the list of repo directories. We need to create a new file, named latest.doc, which contains the hash of the last commit processed by FreshPorts.

To find that value, I did this:

  1. browse to https://cgit.freebsd.org/doc/
  2. click on main
  3. scroll down to “Mark the repository as being converted to Git.”
  4. Take that hash value

You won’t be able to reproduce that once there are enough commits and that particular commit scrolls off the page. You will always be able to find the first git commit (3be01a475855e7511ad755b2defd2e0da5d58bbe) and see its parent listed as 89d0233560e4ba181d73143fc25248b407120e09.

Let’s put that into a file:

[ingress@devgit-ingress01 ~/repos]$ echo 89d0233560e4ba181d73143fc25248b407120e09 > latest.doc

What do we have now:

[ingress@devgit-ingress01 ~/repos]$ ls -l
total 21
drwxr-xr-x  23 ingress  ingress  27 Dec 17 23:55 doc
drwxr-xr-x  26 ingress  ingress  45 Dec 17 03:03 freebsd
drwxr-xr-x  23 ingress  ingress  26 Dec  8 07:12 freebsd-doc
drwxr-xr-x  69 ingress  ingress  84 Dec 17 02:18 freebsd-ports
drwxr-xr-x  69 ingress  ingress  84 Nov 22 19:28 freebsd-ports-quarterly
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.doc
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd-doc
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd-ports
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd-ports-quarterly
[ingress@devgit-ingress01 ~/repos]$ 
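
Each marker file holds one 40-character commit hash plus a newline, which is why every latest.* file above is 41 bytes. Here is a minimal Python sketch of reading and sanity-checking such a marker. The helper name read_latest_marker is hypothetical; it is not part of FreshPorts.

```python
import re
from pathlib import Path

# A full SHA-1 commit hash is 40 lowercase hex digits.
FULL_HASH = re.compile(r"[0-9a-f]{40}")


def read_latest_marker(path):
    """Return the commit hash stored in a latest.* marker file.

    Hypothetical helper: raises ValueError when the file does not
    hold exactly one full-length hash.
    """
    text = Path(path).read_text().strip()
    if not FULL_HASH.fullmatch(text):
        raise ValueError(f"{path} does not contain a full commit hash: {text!r}")
    return text
```

Something like this would catch a truncated or empty marker before the next run tries to use it as a starting point.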

Next, configuration file changes.

Configuration file changes to point to that repo

What files need changing? Just one, I think.

[dan@devgit-ingress01:~] $ cd /usr/local/etc/freshports
[dan@devgit-ingress01:/usr/local/etc/freshports] $ sudo grep freebsd-doc *
config.sh:     doc) dir='freebsd-doc';;

When I make this change, I want to disable the git processing so as not to change a configuration setting in the middle of processing.

$ sudo sysrc -f /etc/periodic.conf fp_check_for_git_commits_enable="NO"
fp_check_for_git_commits_enable: YES -> NO

Let’s verify nothing is running git processing:

$ ps auwwx | grep ingress
ingress     60714  0.0  0.0 10676  2188  -  SCJ  00:14   0:00.00 sleep 3
ingress_svn 60727  0.0  0.0 10676  2188  -  SCJ  00:14   0:00.00 sleep 3
ingress_svn 98912  0.0  0.0 11004  2424  -  IsJ  Sat23   0:00.15 daemon: ingress_svn[98913] (daemon)
ingress_svn 98913  0.0  0.0 11868  2968  -  SJ   Sat23   0:32.78 /bin/sh /usr/local/libexec/freshports-service/ingress_svn.sh
ingress     98921  0.0  0.0 11004  2424  -  IsJ  Sat23   0:00.44 daemon: ingress[98923] (daemon)
ingress     98923  0.0  0.0 11844  2952  -  SJ   Sat23   0:21.70 /bin/sh /usr/local/libexec/freshports-service/ingress.sh
dan         60729  0.0  0.0 11384  2764  2  S+J  00:14   0:00.00 grep ingress

That’s normal. ingress_svn is the daemon checking for incoming svn commits.

ingress is the daemon which looks for incoming git commits, but that’s the daemon which processes the XML files. The periodic.conf setting we adjusted is for the creation of those XML files. That is what we are pausing now.

Next, I updated config.sh. This is what we have now:

[dan@devgit-ingress01:/usr/local/etc/freshports] $ sudo grep doc *
config.pm:# Values as found at https://www.postgresql.org/docs/current/static/libpq-ssl.html
config.pm:$FreshPorts::Config::Repo_DOC             = 'doc';
config.pm:$FreshPorts::Config::DB_Root_Prefix_DOC             = '/doc';
config.sh:# see https://www.postgresql.org/docs/12/libpq-envars.html
config.sh:     doc) dir='doc';;

That last line is the updated value. The others refer to repo names and internal pathnames within the FreshPorts database.
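
The case statement in config.sh maps a repo label to a directory name, and the marker file name is derived from that directory. Here is a rough Python sketch of the same idea. Only the doc entry is confirmed by the grep output above. The other three entries are guesses based on the earlier directory listing, and both the base path default and the helper latest_file_for are assumptions for illustration.

```python
# Label -> directory under the repos directory.
# Only 'doc' is confirmed by config.sh; the rest are assumptions
# inferred from the directory listing earlier in this post.
REPO_DIRS = {
    "doc": "doc",
    "ports": "freebsd-ports",                      # assumption
    "ports-quarterly": "freebsd-ports-quarterly",  # assumption
    "src": "freebsd",                              # assumption
}


def convert_repo_label_to_directory(label):
    """Return the directory for a repo label, or '' if the label is unknown."""
    return REPO_DIRS.get(label, "")


def latest_file_for(label, base="/var/db/ingress/repos"):
    """Return the marker file path for a repo label (hypothetical helper)."""
    directory = convert_repo_label_to_directory(label)
    if directory == "":
        raise KeyError(f"unknown repo label: {label}")
    return f"{base}/latest.{directory}"
```

This shows why the marker had to be named latest.doc: once the label doc maps to the directory doc, the marker name follows from it.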

What’s next?

Turn on git commit processing and watch the logs.

$ sudo sysrc -f /etc/periodic.conf fp_check_for_git_commits_enable="YES"
fp_check_for_git_commits_enable: NO -> YES
$ 

Now I wait until 7:24, because the script runs every 3 minutes.

BOOM! The latest commit is now in: 482d8311b8a1e25a66ee49af4bc7efadd8be22aa

Nov 29 2020
 

Some blog posts serve to help me think through to a solution. This blog post is just for that.

Today I realized the code needs to handle both git and svn. I thought I would have one cut-over date after which all commits would go through git. I see now that this isn’t the way to go. The code has to be ready to import both git and svn commits. But not from the same tree. We don’t want duplicates.

We have three repos:

  1. doc
  2. ports
  3. src

So what next?

Today, I’m going to take an XML file from dev and see if devgit can import it. I fully expect errors.

Use of uninitialized value $Updates{"commit_hash"} in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 622.
Use of uninitialized value $Updates{"FileRevision"} in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 623.
Use of uninitialized value $FileRevision in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 624.
no value set for incoming RepoName at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 555.

And we have them. Yes, there is no commit_hash in the subversion XML.

The ingress code is different, to handle a commit_hash, both long (1cabbda44f7f82543402b6a988976020afda2c46) and short (1cabbda).
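
To illustrate the two forms, here is a small Python sketch which tells a long hash from a short one. It is not the FreshPorts code. The seven-character minimum for a short hash is an assumption taken from the example above, and git abbreviations can be longer.

```python
import re

HEX = re.compile(r"[0-9a-f]+")


def classify_commit_hash(value):
    """Return 'long', 'short', or None when the value is not a git hash."""
    value = value.strip()
    if not HEX.fullmatch(value):
        return None
    if len(value) == 40:
        return "long"
    if 7 <= len(value) < 40:  # assumed minimum abbreviation length
        return "short"
    return None


def short_hash(long_hash, length=7):
    """Abbreviate a long hash by keeping its leading characters."""
    return long_hash[:length]
```

A subversion revision such as r555565 is not hexadecimal, so it is rejected rather than mistaken for a short hash.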

There is no code in the git branch to handle importing svn commits.

What about keeping the two websites separate?

Let’s consider this scenario.

doc starts using git first. Then src, then ports.

Let git.freshports.org process the git commits. Let www.freshports.org process the svn commits.

When everything is transitioned to git, promote git.freshports.org to www.freshports.org

No, that won’t work. The databases are separate. The git website won’t have all the commits which were processed by the svn website….

It has to be one database

We know the git database can handle svn commits, because they are already present. We know the website can already display svn commits, because it does.

Yes, it’s just the ingress side which has to be updated now. Or rather, svn-specific code merged into the git branch.

This might be feasible if we keep svn and git processing completely separate and don’t try to make one do both.

Avoiding concurrency issues

FreshPorts has always processed one commit at a time, in the order they are received. For cvs and svn, ‘order’ was defined by when the commit mailing list email was received. Commits received out of order will produce interesting results, such as a port version decreasing. There is no simple solution to that issue as far as I know.

For git, commits are again processed in order, according to however they appear in the tree, not commit date order.

But with two input streams, I’d rather avoid having two commits being processed at the same time. It is unlikely that any concurrency issues would arise, but I’d rather just avoid that.

That means two separate message queues and processing, but only one consumer of those two queues.

The svn outline

For svn, commits are processed like this:

  1. email arrives
  2. raw email is dumped into ~ingress/message-queues/incoming/2020.11.29.17.43.12.53448.txt
  3. the above is all handled by ~ingress/.mailfilter and these configuration settings in /usr/local/etc/postfix/main.cf
    mailbox_command = /usr/local/bin/maildrop -d ${USER}
    setgid_group = maildrop
    
  4. fp-daemon.sh sees 2020.11.29.17.43.12.53448.txt
  5. XML is created and dumped into 2020.11.29.17.43.12.53448.txt.xml but in ~freshports/message-queues/recent/
  6. XML is processed and loaded into the database.

The git outline

For git processing, there is no incoming email. Instead, we poll the local working copy of the git repo after a git fetch.

  1. The FreeBSD periodic system invokes /usr/local/etc/periodic/everythreeminutes/215.fp_check_git_for_commits
  2. If a new commit is found, it is extracted from the repo and a new file is created: ~ingress/message-queues/incoming/2020.10.01.19.50.02.000000.4796a64ade4267608e861f717e443c0290b73b70.xml – yes, that is a timestamp and a commit hash in that filename.
  3. The freshports daemon notices a new file in the incoming directory.
  4. XML is processed and loaded into the database.
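
The file name created in step 2 carries both a timestamp and the commit hash. Here is a Python sketch of pulling the two apart. I am assuming the six-digit field after the seconds is microseconds, and parse_incoming_name is a hypothetical name.

```python
import re
from datetime import datetime

# YYYY.MM.DD.HH.MM.SS.ffffff.<40 hex digit hash>.xml
NAME = re.compile(
    r"(\d{4})\.(\d{2})\.(\d{2})\.(\d{2})\.(\d{2})\.(\d{2})\.(\d{6})\."
    r"([0-9a-f]{40})\.xml"
)


def parse_incoming_name(filename):
    """Split a git XML file name into (timestamp, commit hash)."""
    match = NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a git commit XML file name: {filename}")
    *parts, commit_hash = match.groups()
    year, month, day, hour, minute, second, micro = (int(p) for p in parts)
    return datetime(year, month, day, hour, minute, second, micro), commit_hash
```

The svn file names, such as 2020.11.29.17.43.12.53448.txt, do not match this pattern, which gives a cheap way to tell the two kinds of file apart.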

Joining the two outlines

The solution I see is to modify both outlines so they stop at creating the XML file in different directories.

The freshports daemon then scans both directories and processes them accordingly.

The fp-daemon code (or more specifically, the code it invokes) will be modified so it only creates the XML and does not process it.
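
The idea of two queues with one consumer can be sketched in Python like this. It is only an outline of the approach, not the daemon itself. The names pending_files and drain_queues are hypothetical, and the process callback stands in for whatever loads the XML into the database.

```python
from pathlib import Path


def pending_files(directory, suffix=".xml"):
    """Return the XML files waiting in a queue directory, sorted by name.

    File names start with a timestamp, so name order is arrival order.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


def drain_queues(svn_dir, git_dir, process):
    """Single consumer: handle every waiting file, strictly one at a time."""
    handled = []
    for source, directory in (("svn", svn_dir), ("git", git_dir)):
        for path in pending_files(directory):
            process(source, path)  # never more than one commit in flight
            handled.append((source, path.name))
    return handled
```

This sketch drains the svn queue before the git queue. It does not try to interleave the two by timestamp, which should not matter as long as the two streams never cover the same tree.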

Nov 27 2020
 

As a sanity check, several diffs compare devgit.freshports.org with dev.freshports.org, and sometimes they flag a false positive.

Case in point, a recent commit to Code_Aster:

  • https://svnweb.freebsd.org/ports?view=revision&revision=556349
  • https://github.com/freebsd/freebsd-ports/commit/d23fb94b8640d1c9d38c3cafc69c89ed4fe11939

In the svnweb link, you will see:

head/science/tfel-edf/
(Copied from head/science/tfel, r555690)

It is that repo copy (i.e. svn copy) from science/tfel to science/tfel-edf which gives rise to a difference in the list of files when comparing the two commits.

  • git lists science/tfel-edf; svn does not.
  • git lists science/tfel-edf/files/patch-cmake_modules_tfel.cmake; svn does not.
  • svn lists science/tfel-edf/distinfo; git does not.

The directory inclusion/omission is a direct result of how the two tools handle a copy.

patch-cmake_modules_tfel.cmake was not modified after the copy – that is why svn does not list it.

science/tfel-edf/distinfo is not in the repo, which is why svn lists it as deleted.

When a diff shows the two websites do not match, I feel obliged to investigate in case the code needs to be updated. This post serves as a reminder to myself that sometimes missing files are OK.
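
The comparison itself is just a set difference over the two file lists. Here is a minimal Python sketch, assuming each website can give me the list of paths it recorded for a commit. The function name compare_file_lists is hypothetical.

```python
def compare_file_lists(git_files, svn_files):
    """Report the paths which only one of the two sources lists."""
    git_set, svn_set = set(git_files), set(svn_files)
    return {
        "git_only": sorted(git_set - svn_set),
        "svn_only": sorted(svn_set - git_set),
    }
```

For the commit above, git_only would hold science/tfel-edf and the unmodified patch file, and svn_only would hold science/tfel-edf/distinfo. An empty result on both sides means the two websites agree.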

Nov 23 2020
 

In the last post, I found that many commits were to the master branch when they should have been on the quarterly branch. Now I think I see why.

See this XML:

<OS Repo="ports-quarterly" Id="FreeBSD" Branch="master"/>

If it’s quarterly, it should name the branch. Case in point: 2020Q4.

I went to https://lists.freebsd.org/pipermail/svn-ports-branches/2020-November/thread.html to look for known quarterly commits.

Hmm, first, let’s find known commits at https://github.com/freebsd/freebsd-ports/tree/branches/2020Q4 – when I looked, the latest commit was https://github.com/freebsd/freebsd-ports/commit/46433baae934d92698422495b72f811839caa1a9

MFH: r555565
security/wolfssl: fix build on big-endian

Merge upstream patch to fix build on big-endian architectures.

Also unmark mips and mips64 as broken, now builds fine.

Approved by:	portmgr (fix build blanket)

The commit just before that is: https://github.com/freebsd/freebsd-ports/commit/e79616836f4e962d370f4364760d85a5e8460a65

How did I find out? I looked at https://github.com/freebsd/freebsd-ports/commits/branches/2020Q4

When FreshPorts processes a commit, it needs a working copy of the repo as it looked at that commit.

I am trying to figure out how to do that when the commit is on a branch.

To get a copy of the branch, I do:

$ git checkout branches/2020Q4
$ git branch
  branches/2020Q3
* branches/2020Q4
  master

Next, I want the tree as it existed at commit 46433baae934d92698422495b72f811839caa1a9

i.e. https://github.com/freebsd/freebsd-ports/commit/46433baae934d92698422495b72f811839caa1a9

My first attempt is

$ git checkout 46433baae934d92698422495b72f811839caa1a9
Note: switching to '46433baae934d92698422495b72f811839caa1a9'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 46433baae934 MFH: r555565

That MFH: r555565 message indicates that I am at the right commit.

Now, if I process that commit:

echo  /usr/local/libexec/freshports/git-to-freshports-xml.py --repo ports-quarterly --path \
/usr/home/dan/src/freebsd/freebsd-ports-quarterly  --single-commit \
46433baae934d92698422495b72f811839caa1a9 --spooling /var/db/ingress/message-queues/spooling --output \
/tmp -v | sudo su -fm ingress

The important point from the above: /usr/home/dan/src/freebsd/freebsd-ports-quarterly

This is a checkout of the quarterly ports branch.

In the file I get:

$ cat /tmp/2020.11.17.16.07.01.000000.46433baae934d92698422495b72f811839caa1a9.xml
<?xml version='1.0' encoding='UTF-8'?>
<UPDATES Version="1.4.0.0">
  <UPDATE>
    <DATE Year="2020" Month="11" Day="17"/>
    <TIME Timezone="UTC" Hour="16" Minute="7" Second="1"/>
    <OS Repo="ports-quarterly" Id="FreeBSD" Branch="branches/2020Q4"/>
    <LOG>MFH: r555565

security/wolfssl: fix build on big-endian

Merge upstream patch to fix build on big-endian architectures.

Also unmark mips and mips64 as broken, now builds fine.

Approved by:	portmgr (fix build blanket)</LOG>
    <PEOPLE>
      <UPDATER Handle="pkubaj &lt;pkubaj@FreeBSD.org&gt;"/>
    </PEOPLE>
    <COMMIT Hash="46433baae934d92698422495b72f811839caa1a9" HashShort="46433ba" Subject="MFH: r555565" EncodingLoses="false" Repository="ports-quarterly"/>
    <FILES>
      <FILE Action="Modify" Path="security/wolfssl/Makefile"/>
      <FILE Action="Modify" Path="security/wolfssl/distinfo"/>
    </FILES>
  </UPDATE>
</UPDATES>
$ 

OK, that has the expected branch value for Branch, but those files are wrong. Looking on dev, I find this for the same commit:

      <FILE Action="Modify" Path="branches/2020Q4/security/wolfssl/Makefile" Revision="555566"></FILE>
      <FILE Revision="555566" Path="branches/2020Q4/security/wolfssl/distinfo" Action="Modify"></FILE>
      <FILE Revision="555566" Action="Modify" Path="branches/2020Q4/"></FILE>

Notice they all start with branches/2020Q4.

A prefix is missing. We could handle this on the XML processing side.

Let’s see.

We have this in the XML: security/wolfssl/Makefile

We have this in debugging: File = [Modify : /ports/2020Q4/security/wolfssl/Makefile : 46433baae934d92698422495b72f811839caa1a9]

That output is produced by this code in xml_munge_git.pm:

# This is where we add in the repo name to the path
my $filename     = $DB_Root_Prefix . '/' . $Updates{branch} . '/' . $FilePath;
my $revisionname = $FileRevision;
my $commit_log_element;
        

print "File = [$FileAction : $filename";

It took me about an hour to figure this out. I then made these code changes:

Index: xml_munge_git.pm
===================================================================
--- xml_munge_git.pm	(revision 5489)
+++ xml_munge_git.pm	(working copy)
@@ -214,8 +214,11 @@
 	$p->register(">UPDATES>UPDATE>OS:Id",                  "attr"  => \$Updates{os});
 
 	#
-	# for git, let's put branch in branch-git
-	# will will populate $Updates{branch} with the converted value. e.g. master -> head
+	# EDIT 2020-11-17 - Updates{branch_git} is the branch name supplied by git.  e.g. master, branches/20202Q4
+	# EDIT 2020-11-17 - removing all references to $Updates{branch} 
+	#
+	# for git, let's put branch in branch_git
+	# will will populate $Updates{} with the converted value. e.g. master -> head
 	# and branches/2020Q3 -> 2020Q3
 	#
 	$p->register(">UPDATES>UPDATE>OS:Branch",              "attr"  => \$Updates{branch_git});
@@ -272,7 +275,7 @@
 
 # XXX delete
 #	my $branch_name = FreshPorts::Branches::stripBranchesToGetBranchName($BranchName);
-#	$Update{branch_name} = $branch_name;
+#	$Updates{branch_name} = $branch_name;
 
 	print "OS is '$Updates{os}' : branch = '$Updates{branch_git}' for git\n";
 	
@@ -279,16 +282,42 @@
 	# When we moved from subversion to git, we needed to convert branch from
 	# master to head, because everything we need here is based on head.
 	#
-	# $Updates{branch}     : for database related actions (finding a port) e.g. head or 2020Q3
-	# $Updates{branch_git} : for repository related actions (git checkout)
+	# $Updates{branch_for_files} : for database related actions (finding a port) e.g. head or 2020Q3
+	# $Updates{branch_git}       : for repository related actions (git checkout)
+	#
+	# In the system_branch.branch_name column, we have values such as 2020Q4 and
+	# the prefix 'branches' is not included.
+	# 
+	# But for files, the prefix is included:
+	#  freshports.dev=# select * from element_pathname where pathname like '/ports/branches/2019Q3/%' limit 5;
+	#   element_id |                    pathname                     
+	#  ------------+-------------------------------------------------
+	#       960349 | /ports/branches/2019Q3/MOVED
+	#       954142 | /ports/branches/2019Q3/Mk
+	#       954143 | /ports/branches/2019Q3/Mk/Scripts
+	#       954144 | /ports/branches/2019Q3/Mk/Scripts/do-depends.sh
+	#       956066 | /ports/branches/2019Q3/Mk/Uses
+	# (5 rows)
+        # freshports.dev=# 
+        #
+        #
+        # So we have the following values:
+        #
+        # $Updates{branch_git}           - value supplied in XML
+        # $Updates{branch_database_name} - for use in system_branch.branch_name
+        # $Updates{branch_for_files}     - for use in filenames
+        #
 
-	$Updates{branch} = ConvertGitBranch($Updates{branch_git});
+        # this converts master to head, and leaves everything else unchanged
+        #
+	$Updates{branch_for_files} = ConvertGitBranchNameToFreshPortsName($Updates{branch_git});
 	
-	print "after converting '\$Updates{branch_git}' we have '$Updates{branch_git}'\n";
+	print "after converting '\$Updates{branch_git}' we have '$Updates{branch_for_files}'\n";
 	print "next we need to strip any leading 'branches/' prefix\n";
-	$Updates{branch} = FreshPorts::Branches::stripBranchesToGetBranchName($Updates{branch});
-	print "OS is '$Updates{os}' : branch = '$Updates{branch}'\n";
+	$Updates{branch_database_name} = FreshPorts::Branches::stripBranchesToGetBranchName($Updates{branch_for_files});
 	print "OS is '$Updates{os}' : branch = '$Updates{branch_git}' for git\n";
+	print "OS is '$Updates{os}' : branch = '$Updates{branch_for_files}' for git\n";
+	print "OS is '$Updates{os}' : branch = '$Updates{branch_database_name}' for database names\n";
 
 	# We know what branch this message is updating. Let's grab the IDs we will need.
 	$SystemID = SystemIDGet($Updates{os}, $self->{dbh});
@@ -297,12 +326,12 @@
 		FreshPorts::Utilities::ReportError('warning', "No SystemID found for OS = '$Updates{os}'", 1)
 	}
 
-	if ($Updates{branch} ne '') {
+	if ($Updates{branch_database_name} ne '') {
 		# we invoke GetBranchFromPathName to convert branches/2020Q3 to 2020Q3
-		$SystemBranchID = SystemBranchIDGetOrCreate($SystemID, $Updates{branch}, $self->{dbh});
+		$SystemBranchID = SystemBranchIDGetOrCreate($SystemID, $Updates{branch_database_name}, $self->{dbh});
 		if (!defined($SystemBranchID)) {
 			$! = 4;
-			FreshPorts::Utilities::ReportError('warning', "No SystemBranchID found for OS = '$Updates{branch}'", 1);
+			FreshPorts::Utilities::ReportError('warning', "No SystemBranchID found for OS = '$Updates{branch_database_name}'", 1);
 		} else {
 			$Updates{branch_id} = $SystemBranchID;
 		}
@@ -312,7 +341,7 @@
 		die   "Branch was empty.  Probably imported sources.  Ignoring message $inputfile\n";
 	}
    
-	print "OS is '$Updates{os}' ($SystemID) : branch = $Updates{branch} ($SystemBranchID)\n";
+	print "OS is '$Updates{os}' ($SystemID) : branch = $Updates{branch_database_name} ($SystemBranchID)\n";
 }
 
 
@@ -338,7 +367,7 @@
 		FreshPorts::Utilities::ReportError('Err', "No files found in commit '$Updates{commit_hash}'.  Has someone done a cvs import instead of addport?", 0)
 	}
 
-	%CommitLogPorts = FreshPorts::VerifyPort::SaveChangesToPortsTree($Updates{branch}, commit_log_id(), \@Files, $self->{dbh});
+	%CommitLogPorts = FreshPorts::VerifyPort::SaveChangesToPortsTree($Updates{branch_database_name}, commit_log_id(), \@Files, $self->{dbh});
 
 	#
 	# commit what we have now, and that starts a new transaction.
@@ -429,7 +458,9 @@
 
 	# we don't clear these values until the end of the update
 	undef $Updates{os};
-	undef $Updates{branch};
+	undef $Updates{branch_git};
+	undef $Updates{branch_database_name};
+	undef $Updates{branch_for_files};
 	undef $Updates{committerAll};
 	undef $Updates{dateyear};
 	undef $Updates{datemonth};
@@ -472,7 +503,7 @@
 	return $ValidFileActions{$FileAction};
 }
 
-sub ConvertGitBranch($) {
+sub ConvertGitBranchNameToFreshPortsName($) {
 	my $GitBranch = shift;
 	
 	#
@@ -610,7 +641,7 @@
 	my $element;
 	my $element_id;
 	# This is where we add in the repo name to the path
-	my $filename     = $DB_Root_Prefix . '/' . $Updates{branch} . '/' . $FilePath;
+	my $filename     = $DB_Root_Prefix . '/' . $Updates{branch_for_files} . '/' . $FilePath;
 	my $revisionname = $FileRevision;
 	my $commit_log_element;
 	
@@ -808,11 +839,12 @@
 	# The criteria for that is the subject must start with
 	# "cvs commit: ports/".
 
-	print "OS             = [$Updates{os}]\n";
-	print "Branch git     = [$Updates{branch_git}]\n";
-	print "Branch         = [$Updates{branch}]\n";
-	print "Committer      = [$Updates{committerAll}]\n";
-	print "Date           = [" . sprintf "%04u/%02u/%02u %02u:%02u:%02u %s", $Updates{dateyear}, $Updates{datemonth}, $Updates{dateday}, $Updates{timehour}, $Updates{timeminute}, $Updates{timesecond}, $Updates{timezone} . "]\n";
+	print "OS                   = [$Updates{os}]\n";
+	print "Branch git           = [$Updates{branch_git}]\n";
+	print "branch_database_name = [$Updates{branch_database_name}]\n";
+	print "branch_for_files     = [$Updates{branch_for_files}]\n";
+	print "Committer            = [$Updates{committerAll}]\n";
+	print "Date                 = [" . sprintf "%04u/%02u/%02u %02u:%02u:%02u %s", $Updates{dateyear}, $Updates{datemonth}, $Updates{dateday}, $Updates{timehour}, $Updates{timeminute}, $Updates{timesecond}, $Updates{timezone} . "]\n";
 	if (defined($Updates{repository})) {
 		print "Repository     = [$Updates{repository}]\n";
 	} else {
@@ -835,7 +867,7 @@
 
 	# First thing we must do, is tell the database what Branch to use...
 	# XXX why does this not use branches::SetBranchInDB() ?
-	my $sql = 'select freshports_branch_set(' . $self->{dbh}->quote($Updates{branch}) . ')';
+	my $sql = 'select freshports_branch_set(' . $self->{dbh}->quote($Updates{branch_database_name}) . ')';
 	my $sth = $self->{dbh}->prepare($sql);
 	if (!$sth->execute())  {
 		FreshPorts::Utilities::ReportError('warning', "Could not set branch", 1);
[dan@devgit-ingress01:~/modules] $ 
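
The three branch values in that diff are easier to see side by side. Here is a Python sketch of the conversions as the comments in the diff describe them: master becomes head, everything else is left alone, and the branches/ prefix is stripped only for the database name. The Python function names are mine, and the /ports prefix in the usage note comes from the element_pathname rows shown in the diff.

```python
def git_branch_to_freshports(branch_git):
    """Convert master to head; leave every other branch name unchanged."""
    return "head" if branch_git == "master" else branch_git


def strip_branches_prefix(branch):
    """Remove a leading 'branches/' so branches/2020Q4 becomes 2020Q4."""
    prefix = "branches/"
    return branch[len(prefix):] if branch.startswith(prefix) else branch


def branch_values(branch_git):
    """Return the three values derived from the branch name in the XML."""
    branch_for_files = git_branch_to_freshports(branch_git)
    return {
        "branch_git": branch_git,                  # for git checkout
        "branch_for_files": branch_for_files,      # for pathnames
        "branch_database_name": strip_branches_prefix(branch_for_files),
    }


def element_pathname(db_root_prefix, branch_git, file_path):
    """Build the pathname stored in the database for one file in a commit."""
    branch_for_files = branch_values(branch_git)["branch_for_files"]
    return f"{db_root_prefix}/{branch_for_files}/{file_path}"
```

With a prefix of /ports, the branch branches/2020Q4 and the path security/wolfssl/Makefile, this gives /ports/branches/2020Q4/security/wolfssl/Makefile, which keeps the branches/ prefix that was missing from the debugging output above.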

But wait, there’s more!

There were more things wrong too. I was doing git checkout master on branches, which meant all the branch commits were being processed as if they were on master. Bad. :/

Some of that came out in “git commit processing – how is it done?”, which was published after I started this blog post, but before I finished it.

Nov 22 2020
 

I need to document this so I can refer to it while debugging.

This follows the chain of scripts which processes a commit.

Periodic

FreshPorts, at present, checks for new commits every three minutes, via this entry in /etc/crontab:

*/3	*	*	*	*	root	periodic everythreeminutes

That will invoke this script:

$ cat /usr/local/etc/periodic/everythreeminutes/215.fp_check_git_for_commits 
#!/bin/sh -
#
# FreshPorts periodic script
#
# Checks to see if there are any new commits waiting
#

# If there is a global system configuration file, suck it in.
#
if [ -r /etc/defaults/periodic.conf ]
then
    . /etc/defaults/periodic.conf
    source_periodic_confs
fi

# assign default values
fp_scripts_dir=${fp_scripts_dir:-/usr/local/libexec/freshports}

case "$fp_check_for_git_commits_enable" in
	[Yy][Ee][Ss])
	logger -p local3.notice -t FreshPorts "into $0"
	echo ""
	cd $fp_scripts_dir && ./helper_scripts/check_for_git_commits.sh || rc=3
	;;
        
    *)  rc=0;;
esac

exit $rc

check_for_git_commits.sh

This is a cheat.

$ cat /usr/local/libexec/freshports/helper_scripts/check_for_git_commits.sh
#!/bin/sh

logger -t check_for_git_commits.sh -p local4.notice "touching ~ingress/signals/check_git ~ingress/signals/job_waiting"
echo touch ~ingress/signals/check_git ~ingress/signals/job_waiting | sudo su -fm ingress
logger -t check_for_git_commits.sh -p local4.notice "done touching, going away now"

It touches two files, which act as signals for the ingress daemon.

ingress.sh

$ cat /usr/local/libexec/freshports-service/ingress.sh
#!/bin/sh
#
# $Id: fp-daemon.sh,v 1.17 2006-11-10 14:08:26 dan Exp $
#
# Copyright (c) 2001-2003 DVL Software
#
#
# include our local parameters

. /usr/local/etc/freshports/ingress.sh

CP='/bin/cp'

# we do not use -i because that would fail when re re-run a commit
MV='/bin/mv'

RM='/bin/rm'

PERL='/usr/local/bin/perl'

#
# sanity checking upon startup
#

check_for_jobs() {
	#
	# This flag file is only set by a job run by this script.
	# A race condition should never arise.
	#
	FLAG="${INGRESS_FLAGDIR}/job_waiting"
	if [ -f ${FLAG} ]
	then
		cd ${SCRIPTDIR}
		echo "yes, there is a job waiting"
		echo "running ${PERL} ./job-waiting.pl"
		echo "from directory  ${SCRIPTDIR}"
		ls -l ./job-waiting.pl
		${PERL} ./job-waiting.pl
		if [ $? -eq 0 ]
		then
			echo "job-waiting.pl finishes normally"
		else
			echo "FATAL job-waiting.pl finished with an error: $?"
		fi
		rm ${FLAG}
	fi
}

echo "starting up!"

if [ ! -d ${SCRIPTDIR} ]
then
	echo "Required directory does not exist: ${SCRIPTDIR}"
	exit
fi

if [ ! -d ${INGRESS_MSGDIR}/incoming ]
then
	echo "Required directory does not exist: ${INGRESS_MSGDIR}/incoming"
	exit
fi

echo incoming: ${INGRESS_MSGDIR}/incoming
echo ready

while :
	do
	cd ${SCRIPTDIR}

	INCOMING=${INGRESS_MSGDIR}/incoming

	if [ -e 'OFFLINE' ]
	then
		echo "system is OFFLINE: ${SCRIPTDIR}/OFFLINE exists"
		break
	else
		check_for_jobs
	fi
	sleep 3
done

That script checks for files in the incoming queue. More on that, perhaps later.

Then, if not OFFLINE, it checks for waiting jobs.
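
The shape of that loop, reduced to its essentials, looks like this in Python. It is a sketch of the pattern, not a replacement for ingress.sh. The function names are hypothetical, and run_jobs stands in for the call to job-waiting.pl.

```python
import time
from pathlib import Path


def poll_once(script_dir, flag_dir, run_jobs):
    """One pass of the loop. Returns 'offline', 'ran', or 'idle'."""
    if (Path(script_dir) / "OFFLINE").exists():
        return "offline"
    flag = Path(flag_dir) / "job_waiting"
    if flag.is_file():
        run_jobs()
        flag.unlink()  # clear the signal once the jobs have been run
        return "ran"
    return "idle"


def poll_forever(script_dir, flag_dir, run_jobs, interval=3):
    """Keep polling every few seconds until the OFFLINE file appears."""
    while poll_once(script_dir, flag_dir, run_jobs) != "offline":
        time.sleep(interval)
```

Splitting out poll_once makes the flag handling testable without waiting on the sleep.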

job-waiting.pl

$ cat /usr/local/libexec/freshports/job-waiting.pl
#!/usr/local/bin/perl -w
#
# $Id: job-waiting.pl,v 1.3 2007-01-29 00:17:35 dan Exp $
#
# Copyright (c) 1999-2007 DVL Software
#
# This script is invoked by the fp-freshports.sh script
# usually located in /var/services/freshports
#

use strict;

use DBI;
use FreshPorts::database;
use FreshPorts::cache;
use FreshPorts::commit_log_ports_ignore;
use FreshPorts::system_status;
use FreshPorts::utilities;

# added in for testing
require Sys::Syslog;

FreshPorts::Utilities::InitSyslog();

#die('we are done here - stopped');

Sys::Syslog::syslog('warning', "running job-waiting.pl");


my %Jobs_ingress = (
	$FreshPorts::Config::CheckGit                 => 'check_git.sh',
	);

my %Jobs_freshports = (
	$FreshPorts::Config::MovedFileFlag            => 'process_moved.sh',
	$FreshPorts::Config::NewReposReadyForImport   => 'import_packagesite.py',
	$FreshPorts::Config::NewRepoImported          => 'UpdatePackagesFromRawPackages.py',
	$FreshPorts::Config::UpdatingFileFlag         => 'process_updating.sh',
	$FreshPorts::Config::VuXMLFileFlag            => 'process_vuxml.sh',
	$FreshPorts::Config::WWWENPortsCategoriesFlag => 'process_www_en_ports_categories.sh',
	);

FreshPorts::Utilities::Report('notice', "starting $0");

#	
# This script is invoked by either the freshports or the ingress user
# they have separate lists of jobs to look for. Rather than maintain two
# scripts, there is one.
#
my $username = getpwuid($<);
my %Jobs;

FreshPorts::Utilities::Report('notice', "running $0 as user = '$username'");

if ($username eq 'freshports') {
   %Jobs = %Jobs_freshports;
} elsif ($username eq 'ingress') {
   %Jobs = %Jobs_ingress;
} else {
  FreshPorts::Utilities::Report('notice', "WHO IS THAT USER? I don't know them. Stopping.");
  die($0 . ' must be run only as the ingress or freshports users');
  exit;
}

	
FreshPorts::Utilities::Report('notice', "checking jobs for $username");

my $JobFound;
do {
	$JobFound = 0;
	# one job might create another, so we keeping looping until they are all cleared.
	while (my ($flag, $script) = each %Jobs) {
		if (-f $flag) {
			$JobFound =1;
			FreshPorts::Utilities::Report('notice', "$flag exists.  About to run $script");
			`$FreshPorts::Config::scriptpath/$script`;
			FreshPorts::Utilities::Report('notice', "Finished running $script");
		} else {
			FreshPorts::Utilities::Report('notice', "flag '$flag' not set.  no work for $script");
		}
	}
} until (!$JobFound);

In there, we find that check_git.sh is invoked.
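
The dispatch loop at the end of that script can be sketched in Python as follows. The name run_waiting_jobs and the run callback are hypothetical. As with the Perl, the sketch relies on each job removing its own flag file, otherwise the loop would never end.

```python
from pathlib import Path


def run_waiting_jobs(jobs, run):
    """Run the script for every flag file which exists.

    jobs maps a flag file path to a script name, and run(script)
    executes that script. One job might create the flag for another,
    so keep sweeping until a full pass finds no flags set.
    """
    executed = []
    job_found = True
    while job_found:
        job_found = False
        for flag, script in jobs.items():
            if Path(flag).is_file():
                job_found = True
                run(script)
                executed.append(script)
    return executed
```

For the ingress user the mapping has a single entry, the check_git flag and check_git.sh, and check_git.sh removes that flag itself when it finishes.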

check_git.sh

$ cat /usr/local/libexec/freshports/check_git.sh
#!/bin/sh

# This script exists mainly to redirect the output of git-delta.sh to a logfile.
#

if [ ! -f /usr/local/etc/freshports/config.sh ]
then
	echo "/usr/local/etc/freshports/config.sh not found by $0"
	exit 1
fi

. /usr/local/etc/freshports/config.sh

LOGGERTAG=check_git.sh

${LOGGER} -t ${LOGGERTAG} $0 has started

# redirect everything into the file
${SCRIPTDIR}/git-delta.sh "doc ports ports-quarterly src" >> ${GITLOG} 2>&1

/bin/rm ${CHECKGITFILE}

${LOGGER} -t ${LOGGERTAG} $0 has finished

git-delta.sh

$ cat /usr/local/libexec/freshports/git-delta.sh
#!/bin/sh

# process the new commits
# based upon https://github.com/FreshPorts/git_proc_commit/issues/3
# An idea from https://github.com/sarcasticadmin

if [ ! -f /usr/local/etc/freshports/config.sh ]
then
	echo "/usr/local/etc/freshports/config.sh not found by $0"
	exit 1
fi

# this can be a space separated list of repositories to check
# e.g. "doc ports src"
repos=$1

. /usr/local/etc/freshports/config.sh

LOGGERTAG='git-delta.sh'

logfile "has started. Will check these repos: '${repos}'"

# what remote are we using on this repo?
REMOTE='origin'

# where do we dump the XML files which we create?
XML="${INGRESS_MSGDIR}/incoming"

logfile "XML dir is $XML"

for repo in ${repos}
do
   logfile "Now processing repo: ${repo}"

   # convert the repo label to a physical directory on disk
   dir=`convert_repo_label_to_directory ${repo}`

   # empty means error
   if [ "${dir}" = "" ]; then
      logfile "FATAL error, repo='${repo}' is unknown: cannot translate it to a directory name"
      continue
   fi

   # where is the repo directory?
   # This is the directory which contains the repos.
   REPODIR="${INGRESS_PORTS_DIR_BASE}/${dir}"
   LATEST_FILE="${INGRESS_PORTS_DIR_BASE}/latest.${dir}"

   if [ -d ${REPODIR} ]; then
      logfile "REPODIR='${REPODIR}' exists"
   else
      logfile "FATAL error, REPODIR='${REPODIR}' is not a directory"
      continue
   fi

   if [ -f ${LATEST_FILE} ]; then
      logfile "LATEST_FILE='${LATEST_FILE}' exists"
   else
      logfile "FATAL error, LATEST_FILE='${LATEST_FILE}' does not exist. We need a starting point."
      continue
   fi

   logfile "Repodir is $REPODIR"
   # on with the work

   cd ${REPODIR}

   # Update local copies of remote branches
#   logfile "Running: ${GIT} fetch $REMOTE:"
#   ${GIT} fetch $REMOTE
#   logfile "Done."

#   logfile "Running: ${GIT} checkout master:"
#   ${GIT} checkout master
#   logfile "Done."

   logfile "Running: ${GIT} pull:"
   ${GIT} pull
   logfile "Done."

   # the hash of the latest commit processed is kept in this file.
   STARTPOINT=`cat ${LATEST_FILE}`

   if [ "${STARTPOINT}x" = 'x' ]
   then
      logfile "STARTPOINT is empty; there must not be any new commits to process"
      logfile "Not proceeding with this repo: '${repo}'"
      continue
   else
      logfile "STARTPOINT = ${STARTPOINT}"
   fi

   # Bring local branch up-to-date with the local remote
#   logfile "Running; ${GIT} rebase $REMOTE/master:"
#   ${GIT} rebase $REMOTE/master
#   logfile "Running; ${GIT} fetch:"
#   ${GIT} fetch
#   logfile "Done."


   # get list of commits, if only to document them here
   logfile "Running: ${GIT} rev-list ${STARTPOINT}..HEAD"
   commits=`${GIT} rev-list ${STARTPOINT}..HEAD`
   logfile "Done."

   if [ -z "$commits" ]
   then
     logfile "No commits were found"
   else
     logfile "The commits found are:"
     for commit in $commits
     do
        logfile "$commit"
     done
   fi

   logfile "${SCRIPTDIR}/git-to-freshports-xml.py --repo ${repo} --path ${REPODIR} --commit ${STARTPOINT} --spooling ${INGRESS_SPOOLINGDIR} --output ${XML}"
            ${SCRIPTDIR}/git-to-freshports-xml.py --repo ${repo} --path ${REPODIR} --commit ${STARTPOINT} --spooling ${INGRESS_SPOOLINGDIR} --output ${XML}
         
   new_latest=`${GIT}  rev-parse HEAD`
   echo $new_latest > ${LATEST_FILE}

done

logfile "Ending"

git-to-freshports-xml.py creates the XML files which are placed into the incoming queue (at ~ingress/message-queues/incoming).

The files are noticed by the freshports daemon (running as /usr/local/libexec/freshports-service/freshports.sh).
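
The heart of git-delta.sh is the marker file: read the last processed hash, handle everything after it, then move the marker forward. Here is a rough Python sketch of that pattern only. It is not how git-to-freshports-xml.py works internally; `list_commits_since` is a hypothetical stand-in for `git rev-list START..HEAD`, and the hashes in `demo` are invented.

```python
import os
import tempfile
from pathlib import Path


def process_new_commits(latest_file, list_commits_since, handle_commit):
    """Process commits newer than the hash stored in latest_file, then advance it.

    list_commits_since(start) stands in for `git rev-list START..HEAD`
    and is expected to return hashes newest first, as rev-list does.
    """
    marker = Path(latest_file)
    start = marker.read_text().strip()
    if not start:
        # same guard as the shell script: an empty marker means skip this repo
        return []

    commits = list_commits_since(start)
    for commit in reversed(commits):  # oldest first
        handle_commit(commit)

    if commits:
        marker.write_text(commits[0] + "\n")  # newest commit becomes the new marker
    return commits


def demo():
    """Run twice against a fake linear history; the second run finds nothing new."""
    history = ["aaa", "bbb", "ccc", "ddd"]  # oldest to newest, made-up hashes

    def list_commits_since(start):
        return list(reversed(history[history.index(start) + 1:]))

    with tempfile.TemporaryDirectory() as tmp:
        latest = os.path.join(tmp, "latest.freebsd-doc")
        Path(latest).write_text("bbb\n")
        seen = []
        first = process_new_commits(latest, list_commits_since, seen.append)
        second = process_new_commits(latest, list_commits_since, seen.append)
        return first, second, seen, Path(latest).read_text().strip()
```

The shell script always writes the output of `git rev-parse HEAD`; the sketch only rewrites the marker when something new was found, which amounts to the same thing on a linear history.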

Nov 18 2020
 

A recent post on the FreeBSD Ports mailing list asked:

Hi,

I noticed a big difference between the number of ports on
freebsd.org/ports/ and on freshports.org. Currently, it’s 33348 vs.
41346.

The freebsd.org’s number equals roughly the number of lines of a current
INDEX, but how does FreshPorts count?

Best,
Moritz

In short, they are both wrong.

The FreeBSD value is based on INDEX, which includes flavors. The counts on the webpages under https://www.freebsd.org/ports/ will list some ports multiple times. See below for examples.

The FreshPorts total is wrong because it is including ports on branches.

The real number of ports is in the 28,800 range.

It is debatable whether py27-atspi and py37-atspi should be listed as separate ports. There are separate packages, yes, but they are both generated from one port: accessibility/py-atspi.

The rest of this post has background on how I reached these values.

Where is this FreshPorts count?

In the Statistics box on the right hand side of FreshPorts, you will see:

Statistics box saying Port Count 41418

Let’s see where this value comes from.

FreshPorts count

Everything in the Statistics box is generated by the backend via a periodic job. Let’s grep the code and find out where:

[dan@dev-ingress01:~/scripts] $ grep -r 'Calculated hourly' *
hourly_stats.pl:		print FILE '<BR>Calculated hourly:<BR>';

If I look in there, I find: select Stats_PortCount()

Going to the sp.txt file, I find this stored procedure:

CREATE OR REPLACE FUNCTION Stats_PortCount() returns int8 AS $$
        DECLARE
                PortCount       int8;

        BEGIN
                SELECT count(*)
                  INTO PortCount
                  FROM ports, element
                 WHERE element.status = 'A'
                   AND ports.element_id = element.id;

                return PortCount;
        END
$$ LANGUAGE 'plpgsql';

Let’s run that query:

freshports.org=# select * from Stats_PortCount();
 stats_portcount 
-----------------
           41418
(1 row)

freshports.org=# 

FreshPorts count with branches

I know why this value is so far from the FreeBSD count. Branches. Let’s look at this output where I start pulling back the port names:

freshports.org=# SELECT EP.pathname
freshports.org-#                   FROM ports P , element E, element_pathname EP
freshports.org-#                  WHERE E.status = 'A'
freshports.org-#                    AND P.element_id = E.id
freshports.org-#                    AND E.id = EP.element_id
freshports.org-#                  ORDER BY EP.pathname LIMIT 10;
                     pathname                     
--------------------------------------------------
 /ports/branches/2016Q2/math/blitz++
 /ports/branches/2016Q4/archivers/file-roller
 /ports/branches/2016Q4/archivers/p7zip
 /ports/branches/2016Q4/archivers/p7zip-codec-rar
 /ports/branches/2016Q4/archivers/php56-bz2
 /ports/branches/2016Q4/archivers/php56-phar
 /ports/branches/2016Q4/archivers/php56-zip
 /ports/branches/2016Q4/archivers/php56-zlib
 /ports/branches/2016Q4/archivers/php70-bz2
 /ports/branches/2016Q4/archivers/php70-phar
(10 rows)

freshports.org=# 

FreshPorts count without branches

Let’s try the query and ignore branches.

freshports.org=# SELECT EP.pathname
freshports.org-#   FROM ports P , element E, element_pathname EP
freshports.org-#  WHERE E.status = 'A'
freshports.org-#    AND P.element_id = E.id
freshports.org-#    AND E.id = EP.element_id
freshports.org-#    AND EP.pathname NOT LIKE '/ports/branches/%'
freshports.org-#  ORDER BY EP.pathname desc
freshports.org-#  LIMIT 10;
          pathname           
-----------------------------
 /ports/head/x11/zenity
 /ports/head/x11/yelp
 /ports/head/x11/yeahconsole
 /ports/head/x11/yalias
 /ports/head/x11/yakuake
 /ports/head/x11/yad
 /ports/head/x11/xzoom
 /ports/head/x11/xxkb
 /ports/head/x11/xwud
 /ports/head/x11/xwit
(10 rows)

freshports.org=# 

That looks better.

Let’s get a count now.

freshports.org=# SELECT count(*)
freshports.org-#   FROM ports P , element E, element_pathname EP
freshports.org-#  WHERE E.status = 'A'
freshports.org-#    AND P.element_id = E.id
freshports.org-#    AND E.id = EP.element_id
freshports.org-#    AND EP.pathname NOT LIKE '/ports/branches/%';
 count 
-------
 28759
(1 row)

freshports.org=# 

Well, that’s not great either.
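
To make the effect of the branch filter concrete, here is a toy reproduction using SQLite from Python. It is a sketch under assumptions: the three tables are cut down to the columns used in the queries above, the real FreshPorts schema (PostgreSQL) has many more, and the sample rows are only there to exercise the filter.

```python
import sqlite3


def port_counts(rows):
    """Return (count of all active ports, count excluding branches).

    rows is a list of (pathname, status) tuples standing in for the
    element / element_pathname tables.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE element (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE element_pathname (element_id INTEGER, pathname TEXT);
        CREATE TABLE ports (element_id INTEGER);
    """)
    for element_id, (pathname, status) in enumerate(rows, start=1):
        db.execute("INSERT INTO element VALUES (?, ?)", (element_id, status))
        db.execute("INSERT INTO element_pathname VALUES (?, ?)", (element_id, pathname))
        db.execute("INSERT INTO ports VALUES (?)", (element_id,))

    base = """
        SELECT count(*)
          FROM ports P, element E, element_pathname EP
         WHERE E.status = 'A'
           AND P.element_id = E.id
           AND E.id = EP.element_id
    """
    total = db.execute(base).fetchone()[0]
    head_only = db.execute(
        base + " AND EP.pathname NOT LIKE '/ports/branches/%'"
    ).fetchone()[0]
    db.close()
    return total, head_only


SAMPLE = [
    ("/ports/head/x11/zenity", "A"),
    ("/ports/head/x11/yelp", "A"),
    ("/ports/branches/2016Q4/archivers/p7zip", "A"),
    ("/ports/head/devel/phpunit6", "D"),  # marked deleted, so never counted
]
```

With this sample, `port_counts(SAMPLE)` counts three active ports in total and two once branches are excluded.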

That can’t be right

Let’s suspect the element_pathname table and remove it from the query. Instead, I will create the pathname based on a function:

freshports.org=# 
freshports.org=# SELECT count(*) FROM (
freshports.org(# SELECT element_pathname(E.id) as pathname
freshports.org(#   FROM ports P , element E
freshports.org(#  WHERE E.status = 'A'
freshports.org(#    AND P.element_id = E.id) AS tmp
freshports.org-# WHERE pathname NOT LIKE '/ports/branches/%';
 count 
-------
 28759
(1 row)

freshports.org=# 

That matches the count via the element_pathname table.

So it’s not that table skewing the results. What is it then?

Looking at category counts

Let’s compare https://www.freebsd.org/ports/categories-alpha.html with FreshPorts.

Let’s start with this query on ports_active, which is actually a view of non-deleted ports.

freshports.dev=# SELECT PA.category, PA.name, EP.pathname
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%' limit 10;
  category  |            name            |                  pathname                  
------------+----------------------------+--------------------------------------------
 textproc   | rubygem-raabro             | /ports/head/textproc/rubygem-raabro
 biology    | pyfasta                    | /ports/head/biology/pyfasta
 math       | symmetrica                 | /ports/head/math/symmetrica
 java       | sigar                      | /ports/head/java/sigar
 databases  | phpmyadmin5                | /ports/head/databases/phpmyadmin5
 devel      | rubygem-rbtrace            | /ports/head/devel/rubygem-rbtrace
 x11        | xfce4-screenshooter-plugin | /ports/head/x11/xfce4-screenshooter-plugin
 science    | hdf5-18                    | /ports/head/science/hdf5-18
 lang       | nhc98                      | /ports/head/lang/nhc98
 multimedia | xanim                      | /ports/head/multimedia/xanim
(10 rows)

Now, let’s get a count by category.

freshports.dev=# SELECT PA.category, count(PA.name)
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
 GROUP BY PA.category
 ORDER BY PA.category;
   category    | count 
---------------+-------
 accessibility |    26
 arabic        |     8
 archivers     |   258
 astro         |   124
 audio         |   877
 base          |     1
 benchmarks    |   100
 biology       |   176
 cad           |   126
 chinese       |   106
 comms         |   213
 converters    |   178
 databases     |  1033
 deskutils     |   261
 devel         |  6875
 dns           |   238
 editors       |   263
 emulators     |   177
 finance       |   113
 french        |    14
 ftp           |    96
 games         |  1133
 german        |    21
 graphics      |  1128
 hebrew        |     7
 hungarian     |     7
 irc           |   114
 japanese      |   280
 java          |   122
 korean        |    39
 lang          |   364
 mail          |   709
 math          |   970
 misc          |   533
 multimedia    |   457
 net           |  1563
 net-im        |   176
 net-mgmt      |   404
 net-p2p       |    94
 news          |    67
 polish        |    14
 ports-mgmt    |    67
 portuguese    |     9
 print         |   256
 russian       |    32
 science       |   340
 security      |  1313
 shells        |    56
 sysutils      |  1538
 textproc      |  1896
 ukrainian     |     9
 vietnamese    |    16
 www           |  2358
 x11           |   534
 x11-clocks    |    42
 x11-drivers   |    44
 x11-fm        |    30
 x11-fonts     |   250
 x11-servers   |    10
 x11-themes    |   145
 x11-toolkits  |   240
 x11-wm        |   119
(62 rows)

freshports.dev=# 

Primary categories vs secondary categories

Remember that some categories are virtual, and do not appear on disk. The above counts are only for primary categories, those which do appear on disk. For example, afterstep is not listed above, but you’ll find it in the FreeBSD list. Virtual categories are covered in FreshPorts, but they are not relevant to our search.

Also, a port exists on disk only within its primary category. There may be secondary categories, but the port should not be counted there as well. A port should only be counted once.
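
The difference between the two ways of counting is easy to show. This Python sketch is an illustration, not FreshPorts code: the port names and category lists are hypothetical, with the primary category listed first as in a port's CATEGORIES line.

```python
from collections import Counter


def category_counts(ports, include_secondary=False):
    """Count ports per category.

    ports maps 'primary/name' to its list of categories, primary first.
    With include_secondary=False each port is counted exactly once.
    """
    counts = Counter()
    for categories in ports.values():
        listed = categories if include_secondary else categories[:1]
        counts.update(listed)
    return counts


# Hypothetical sample data, for illustration only.
PORTS = {
    "net/example-vnc": ["net", "x11-servers"],
    "x11-servers/example-server": ["x11-servers"],
    "x11-servers/example-nested": ["x11-servers"],
}
```

Counting primary categories only gives three ports in total; including secondary categories gives four, because net/example-vnc is counted under both net and x11-servers.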

Picking on Hungarian

Let’s pick Hungarian, which has a small number of ports.

freshports.dev=# SELECT PA.category, PA.name, EP.pathname
freshports.dev-#   FROM ports_active PA, element_pathname EP
freshports.dev-#  WHERE PA.element_id = EP.element_id
freshports.dev-#    AND EP.pathname NOT LIKE '/ports/branches/%'
freshports.dev-#    AND PA.category = 'hungarian'
freshports.dev-#  ORDER BY EP.pathname
freshports.dev-# LIMIT 10;
 category  |           name           |                    pathname                    
-----------+--------------------------+------------------------------------------------
 hungarian | aspell                   | /ports/head/hungarian/aspell
 hungarian | hunspell                 | /ports/head/hungarian/hunspell
 hungarian | hyphen                   | /ports/head/hungarian/hyphen
 hungarian | jdictionary-eng-hun      | /ports/head/hungarian/jdictionary-eng-hun
 hungarian | jdictionary-eng-hun-expr | /ports/head/hungarian/jdictionary-eng-hun-expr
 hungarian | libreoffice              | /ports/head/hungarian/libreoffice
 hungarian | mythes                   | /ports/head/hungarian/mythes
(7 rows)

freshports.dev=# 

Let’s compare that with what is on disk:

[dan@pkg01:~/ports/head/hungarian] $ ls -l
total 13
-rw-r--r--  1 dan  dan  332 May  5  2020 Makefile
-rw-r--r--  1 dan  dan   97 Oct 27  2019 Makefile.inc
drwxr-xr-x  2 dan  dan    6 Oct 27  2019 aspell
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 hunspell
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 hyphen
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 jdictionary-eng-hun
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 jdictionary-eng-hun-expr
drwxr-xr-x  2 dan  dan    4 Nov 12 15:09 libreoffice
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 mythes
[dan@pkg01:~/ports/head/hungarian] $ svn info

Don’t trust me. Look at subversion for ports/head/hungarian/

The FreshPorts count is correct. What is FreeBSD talking about then?

Comparing with https://www.freebsd.org/ports/hungarian.html, I see that FreeBSD is including:

Will this account for the differences? I don’t know.

The 33399 count listed at https://www.freebsd.org/ports/ (on 2020-11-18) seems close to the value contained within INDEX-12 (33406).

The category totals at https://www.freebsd.org/ports/categories-grouped.html include ports listed in their secondary categories. This counts some ports more than once.

Looking at INDEX

Let’s look at INDEX-12:

[dan@pkg01:~/ports/head] $ make fetchindex
/usr/bin/env  fetch -am -o /usr/home/dan/ports/head/INDEX-12.bz2 https://www.FreeBSD.org/ports/INDEX-12.bz2
/usr/home/dan/ports/head/INDEX-12.bz2                 2315 kB 1436 kBps    02s
[dan@pkg01:~/ports/head] $ 


[dan@pkg01:~/ports/head] $ wc -l INDEX-12 
   33406 INDEX-12

[dan@pkg01:~/ports/head] $ grep -c jdictionary-ger-hun INDEX-12 
1

OK, it’s only counted once within INDEX.

So far, we know why the port counts on the web pages differ.

Let’s pick a category which is not language related: x11-servers

This is what FreshPorts has:

freshports.org=# SELECT PA.category, PA.name, EP.pathname
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
   AND PA.category = 'x11-servers'
 ORDER BY EP.pathname;
  category   |      name       |                pathname                 
-------------+-----------------+-----------------------------------------
 x11-servers | Xfstt           | /ports/head/x11-servers/Xfstt
 x11-servers | x2vnc           | /ports/head/x11-servers/x2vnc
 x11-servers | x2x             | /ports/head/x11-servers/x2x
 x11-servers | xephyr          | /ports/head/x11-servers/xephyr
 x11-servers | xorg-dmx        | /ports/head/x11-servers/xorg-dmx
 x11-servers | xorg-nestserver | /ports/head/x11-servers/xorg-nestserver
 x11-servers | xorg-server     | /ports/head/x11-servers/xorg-server
 x11-servers | xorg-vfbserver  | /ports/head/x11-servers/xorg-vfbserver
 x11-servers | xwayland        | /ports/head/x11-servers/xwayland
 x11-servers | xwayland-devel  | /ports/head/x11-servers/xwayland-devel
(10 rows)

freshports.org=# 

From disk:

[dan@pkg01:~/ports/head/x11-servers] $ ls -l
total 34
-rw-r--r--  1 dan  dan  375 Feb 14  2020 Makefile
drwxr-xr-x  3 dan  dan    7 Sep 19 01:19 Xfstt
drwxr-xr-x  2 dan  dan    5 Nov  9  2019 x2vnc
drwxr-xr-x  3 dan  dan    6 Nov  9  2019 x2x
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xephyr
drwxr-xr-x  2 dan  dan    5 Feb 25  2020 xorg-dmx
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xorg-nestserver
drwxr-xr-x  3 dan  dan    8 Sep 19 01:19 xorg-server
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xorg-vfbserver
drwxr-xr-x  2 dan  dan    4 Oct 11 13:22 xwayland
drwxr-xr-x  2 dan  dan    5 Nov 16 15:28 xwayland-devel
[dan@pkg01:~/ports/head/x11-servers] $ 

That matches.

Looking at https://www.freebsd.org/ports/x11-servers.html I find listings not found above:

  1. tigervnc-server net/tigervnc-server
  2. tigervnc-viewer net/tigervnc-viewer
  3. xorg-minimal x11/xorg-minimal

Again, ports are listed here which are not actually in this category. Ports are being counted twice, at least on the web page.

This extracts the list of ports from INDEX:

[dan@pkg01:~/ports/head] $ cut -f 2 -d '|' INDEX-12 > ~/tmp/INDEX-12-list
[dan@pkg01:~/ports/head] $ head -4  ~/tmp/INDEX-12-list
/usr/ports/accessibility/accerciser
/usr/ports/accessibility/at-spi2-atk
/usr/ports/accessibility/at-spi2-core
/usr/ports/accessibility/atkmm
[dan@pkg01:~/ports/head] $ 
[dan@pkg01:~/ports/head] $ wc -l INDEX-12  ~/tmp/INDEX-12-list
   33406 INDEX-12
   33406 /usr/home/dan/tmp/INDEX-12-list
   66812 total

The line count matches. Let’s get the same information out of FreshPorts, but this time, I’ll use production.

cat << EOF | psql -t freshports.org > INDEX.FreshPorts
SELECT '/usr/ports/' || PA.category || '/' || PA.name
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
 ORDER BY 1;
EOF

We have 28759 entries there.

$ wc -l ~/INDEX.FreshPorts 
   28759 /usr/home/dan/INDEX.FreshPorts

That is far from the 33406 lines in INDEX-12.

Removing flavors from INDEX-12 list of ports

When I started comparing the output, I noticed that INDEX-12 listed accessibility/py-atspi twice. Why? Because of flavors. Here are the first two columns from INDEX-12:

py27-atspi-2.38.0|/usr/ports/accessibility/py-atspi
py37-atspi-2.38.0|/usr/ports/accessibility/py-atspi

Let’s remove duplicate lines from INDEX-12:

[dan@pkg01:~/ports/head] $ wc -l INDEX-12  ~/tmp/INDEX-12-list ~/tmp/INDEX-12-list-nodups
   33406 INDEX-12
   33406 /usr/home/dan/tmp/INDEX-12-list
   28755 /usr/home/dan/tmp/INDEX-12-list-nodups
   95567 total
[dan@pkg01:~/ports/head] $ 

That means 4651 lines relate directly to flavors.

That uniq output is much closer to the FreshPorts count of 28759. It is off by 4.
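
The same de-duplication can be sketched in Python. This is an illustration rather than the command I ran; it takes field 2 of each `|`-separated line (the same field the cut command above extracts) and keeps each port directory once. The accerciser version number in the sample is made up.

```python
def unique_port_paths(index_lines):
    """Return the sorted set of port directories named in INDEX-style lines.

    Each line is '|'-separated; field 2 is the port directory. Flavors of
    one port share a directory, so they collapse to a single entry.
    """
    paths = set()
    for line in index_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2:
            paths.add(fields[1])
    return sorted(paths)


SAMPLE_INDEX = [
    "py27-atspi-2.38.0|/usr/ports/accessibility/py-atspi",
    "py37-atspi-2.38.0|/usr/ports/accessibility/py-atspi",
    "accerciser-3.38.0|/usr/ports/accessibility/accerciser",  # made-up version
]
```

Three INDEX lines go in and two port directories come out, which is the flavors effect in miniature.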

Comparing INDEX-12 and FreshPorts

Let’s do a diff.

All the + lines indicate a port included in FreshPorts, but not INDEX-12. I have annotated the output to indicate what my investigations found.

All the - lines indicate something not found on FreshPorts.

When you see DELETED, that means FreshPorts has marked this port as deleted.

[dan@pkg01:~/ports/head] $ diff -ruN ~/tmp/INDEX-12-list-nodups ~/INDEX.FreshPorts
--- /usr/home/dan/tmp/INDEX-12-list-nodups	2020-11-18 16:58:12.360853000 +0000
+++ /usr/home/dan/INDEX.FreshPorts	2020-11-18 17:49:49.321133000 +0000
@@ -1290,6 +1290,7 @@
 /usr/ports/audio/zita-resampler
 /usr/ports/audio/zrythm
 /usr/ports/audio/zynaddsubfx
+/usr/ports/base/binutils NOT A PORT
 /usr/ports/benchmarks/ali
 /usr/ports/benchmarks/apib
 /usr/ports/benchmarks/autobench
@@ -7405,7 +7406,6 @@
 /usr/ports/devel/php80-sysvsem
 /usr/ports/devel/php80-sysvshm
 /usr/ports/devel/php80-tokenizer
-/usr/ports/devel/phpunit6 PORT MARKED AS DELETED
 /usr/ports/devel/phpunit7
 /usr/ports/devel/phpunit8
 /usr/ports/devel/physfs
@@ -7703,6 +7703,7 @@
 /usr/ports/devel/py-cachy
 /usr/ports/devel/py-canonicaljson
 /usr/ports/devel/py-capstone
+/usr/ports/devel/py-case NEWLY CREATED
 /usr/ports/devel/py-castellan
 /usr/ports/devel/py-castellan1
 /usr/ports/devel/py-cbor
@@ -9873,6 +9874,7 @@
 /usr/ports/devel/rubygem-rspec-support
 /usr/ports/devel/rubygem-rspec_junit_formatter
 /usr/ports/devel/rubygem-rubocop
+/usr/ports/devel/rubygem-rubocop-ast NOT IN INDEX-12
 /usr/ports/devel/rubygem-ruby-atmos-pure
 /usr/ports/devel/rubygem-ruby-bugzilla
 /usr/ports/devel/rubygem-ruby-enum
@@ -14078,7 +14080,6 @@
 /usr/ports/korean/hanyangfonts
 /usr/ports/korean/hcode
 /usr/ports/korean/hmconv
-/usr/ports/korean/hpscat PORT MARKED AS DELETED
 /usr/ports/korean/hunspell
 /usr/ports/korean/ibus-hangul
 /usr/ports/korean/imhangul-gtk2
@@ -14439,6 +14440,7 @@
 /usr/ports/lang/spidermonkey24
 /usr/ports/lang/spidermonkey52
 /usr/ports/lang/spidermonkey60
+/usr/ports/lang/spidermonkey68 NOT IN SUBVERSION
 /usr/ports/lang/spidermonkey78
 /usr/ports/lang/spl
 /usr/ports/lang/squeak
@@ -14958,6 +14960,7 @@
 /usr/ports/mail/py-dkimpy
 /usr/ports/mail/py-email-validator
 /usr/ports/mail/py-email_reply_parser
+/usr/ports/mail/py-flanker NOT IN INDEX-12
 /usr/ports/mail/py-flask-mail
 /usr/ports/mail/py-flufl.bounce
 /usr/ports/mail/py-fuglu
[dan@pkg01:~/ports/head] $ 

Totals:

  1. NOT A PORT – base looks like a category to FreshPorts so that is included
  2. PORT MARKED AS DELETED – FreshPorts thinks this port is deleted, but it is not
  3. NEWLY CREATED – this port was created today. INDEX-12 predates that
  4. NOT IN INDEX-12 – no idea why this is not included
  5. NOT IN SUBVERSION – this port is not listed in subversion.
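
If you only want the two sets of differences, without the surrounding context lines of diff, a set comparison does the job. A minimal Python sketch, using a handful of the paths from the diff above as sample data:

```python
def compare_port_lists(index_paths, freshports_paths):
    """Return (only in INDEX, only in FreshPorts), each sorted."""
    index_set = set(index_paths)
    freshports_set = set(freshports_paths)
    only_in_index = sorted(index_set - freshports_set)
    only_in_freshports = sorted(freshports_set - index_set)
    return only_in_index, only_in_freshports


# A few entries taken from the diff output, plus one common to both lists.
INDEX_PATHS = [
    "/usr/ports/devel/phpunit6",
    "/usr/ports/devel/phpunit7",
    "/usr/ports/korean/hpscat",
]
FRESHPORTS_PATHS = [
    "/usr/ports/base/binutils",
    "/usr/ports/devel/phpunit7",
    "/usr/ports/devel/py-case",
]
```

The first list corresponds to the - lines of the diff, the second to the + lines.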

Conclusion

FreshPorts has some errors, which I will look into.

The number of ports reported is wrong on both sites; the correct value is in the 28,800 range.

Nov 14 2020
 

Today I’m working on https://devgit.freshports.org and fixing links to http://github.com/freebsd/freebsd-ports/ for individual commits.

Some of these links are stunted to: commit/c2b0677 (i.e. no hostname)

A theory

I think I know why. The repo_id field of the commit_log table is empty. This field links the commit to a specific row in the repo table. That row contains the hostname.

This is the repo table:

freshports.devgit=# select * from repo;
 id | name  |      description       |   repo_hostname    |      path_to_repo      | repository 
----+-------+------------------------+--------------------+------------------------+------------
  1 | ports | The FreeBSD Ports tree | svnweb.freebsd.org | /ports                 | subversion
  2 | doc   | The FreeBSD doc tree   | svnweb.freebsd.org | /doc                   | subversion
  3 | src   | The FreeBSD src tree   | svnweb.freebsd.org | /base                  | subversion
  8 | doc   | The FreeBSD doc tree   | github.com         | /freebsd/doc           | git
  9 | src   | The FreeBSD src tree   | github.com         | /freebsd/freebsd       | git
  6 | ports | The FreeBSD Ports tree | github.com         | /freebsd/freebsd-ports | git
(6 rows)

The commit_log.repo_id field is set when the commit is processed. I’m going to look at the logs for commits against https://devgit.freshports.org/www/screego/

The example

Let’s start with c2b0677.

From the logs, I found:

$this->{repo}       = 'ports-quarterly'
$this->{repository} = 'git'
'ports-quarterly' and repository = 'git')sql is insert into commit_log (id, message_id, message_date, message_subject, date_added, commit_date, 
                  committer, description, system_id, svn_revision, repo_id, encoding_losses, commit_hash_short) values ( 
                                ?,
                                ?,
                                ?,
                                ?,
                                now(),
                                ?,
                                ?,
                                ?,
                                ?,
                                ?,
                                (SELECT id FROM repo WHERE name = ? and repository = ?),
                                ?::boolean,
                                ?)

Look at lines 1 & 2, and then line 15.

That subquery will find nothing in the repo table with name = ‘ports-quarterly’ (see contents of the repo table listed above).
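
A scalar subquery which matches no rows yields NULL, and that NULL is what ends up in repo_id. Here is a toy reproduction in Python with SQLite. It is a sketch under assumptions: both tables are cut down to the columns that matter here, and the real database is PostgreSQL, where the subquery behaves the same way.

```python
import sqlite3


def insert_commit(db, repo_label, repository):
    """Insert a commit_log row the way the logged SQL does; return its repo_id."""
    cur = db.execute(
        "INSERT INTO commit_log (repo_id) "
        "VALUES ((SELECT id FROM repo WHERE name = ? AND repository = ?))",
        (repo_label, repository),
    )
    row = db.execute(
        "SELECT repo_id FROM commit_log WHERE id = ?", (cur.lastrowid,)
    ).fetchone()
    return row[0]


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE repo (id INTEGER PRIMARY KEY, name TEXT, repository TEXT);
        CREATE TABLE commit_log (id INTEGER PRIMARY KEY, repo_id INTEGER);
        INSERT INTO repo VALUES (6, 'ports', 'git');
        INSERT INTO repo VALUES (8, 'doc', 'git');
        INSERT INTO repo VALUES (9, 'src', 'git');
    """)
    head = insert_commit(db, "ports", "git")
    quarterly = insert_commit(db, "ports-quarterly", "git")
    db.close()
    return head, quarterly
```

The insert itself succeeds in both cases, which is why nothing failed loudly: `demo()` returns 6 for the head commit and `None` (NULL) for the quarterly one.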

ports-quarterly is the name of a directory on disk:

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos] $ ls -l
total 22
drwxr-xr-x  69 freshports  freshports  84 Jun 23 22:26 PORTS-2020Q2
drwxr-xr-x   2 freshports  freshports   2 Jul 19 21:52 PORTS-2020Q2-git
drwxr-xr-x  69 freshports  freshports  84 Jul  2 10:30 PORTS-2020Q3
drwxr-xr-x  69 freshports  freshports  84 Jul 10 19:21 PORTS-head
drwxr-xr-x  25 freshports  freshports  43 Jul 14 10:51 freebsd
drwxr-xr-x  23 freshports  freshports  26 Jul 14 10:44 freebsd-doc
drwxr-xr-x  69 freshports  freshports  84 Nov 12 19:18 freebsd-ports
drwxr-xr-x  69 freshports  freshports  84 Nov 12 18:18 freebsd-ports-quarterly

Here, you can see how the given directories relate back to the FreeBSD repo.

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports] $ git config --get remote.origin.url
https://github.com/freebsd/freebsd-ports.git

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports] $ cd ../freebsd-ports-quarterly

[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports-quarterly] $ git config --get remote.origin.url
https://github.com/freebsd/freebsd-ports.git
[dan@devgit-ingress01:/var/db/freshports/ports-jail/var/db/repos/freebsd-ports-quarterly] $ 

How to fix it

I am sure that a mapping from freebsd-ports-quarterly to ports already exists in the code. I just have to find it and use it here.

Edit: I was wrong, it did not exist.

The key discovery so far:

  1. commits to ports on head have the correct link.
  2. commits to ports on the quarterly branch are not getting the repo correct.

This is promising and gives me the direction to solve it.

Just what is wrong?

Going back to the SQL:

SELECT id FROM repo WHERE name = ? and repository = ?

Those values come from: $this->{repo}, $this->{repository}

Substituting the values we have:

SELECT id FROM repo WHERE name = 'ports-quarterly' and repository = 'git'

But we want:

SELECT id FROM repo WHERE name = 'ports' and repository = 'git'

Running that query we get:

freshports.devgit=# SELECT * FROM repo WHERE name = 'ports' and repository = 'git';
 id | name  |      description       | repo_hostname |      path_to_repo      | repository 
----+-------+------------------------+---------------+------------------------+------------
  6 | ports | The FreeBSD Ports tree | github.com    | /freebsd/freebsd-ports | git
(1 row)

freshports.devgit=# 

Tracking it down in the code

Let’s look at how $this->{repo} is set.

From xml_munge_git.pm:

        $commit_log->{repo}             = $Updates{repository};

OK, that’s the ideal location to massage this data.

$Updates{repository} is the incoming value from the XML file for this commit. The data in question is:

<COMMIT Hash="c2b0677c4db6a956a8ad5ed6dfa8f066d9ea72e1" HashShort="c2b0677" Subject="Add www/screego" EncodingLoses="false" Repository="ports-quarterly"/>

That value (‘ports-quarterly’) is defined as a constant:

$FreshPorts::Constants::Repo_Label_Ports_Quarterly = 'ports-quarterly';

I had to create the new relationship. I also decided to rename some constants to better reflect their purpose. The goal is to shorten comprehension time when I next have to understand this.

%FreshPorts::Constants::RepoLabelsToGitRepoNames = (
   $FreshPorts::Constants::Repo_XML_Label_Doc             => $FreshPorts::Constants::Repo_Doc,
   $FreshPorts::Constants::Repo_XML_Label_Ports           => $FreshPorts::Constants::Repo_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly => $FreshPorts::Constants::Repo_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Src             => $FreshPorts::Constants::Repo_Src,
 );

Along the way, as you can see, I renamed a number of constants to make more sense when reading them.

The solution

Here is what I have now:


# FreshPorts database repo names
# These are the names of the FreeBSD repos found within the FreshPorts database
# These are the valid values in the repo.name field
#
$FreshPorts::Constants::Repo_DB_Doc                       = 'doc';
$FreshPorts::Constants::Repo_DB_Ports                     = 'ports';
$FreshPorts::Constants::Repo_DB_Src                       = 'src';


#
# These are the values to be used in the Repository field of the incoming XML files
# They reflect the different working copies of repos we are processing.
# It also helps us know what we are working on.
#
$FreshPorts::Constants::Repo_XML_Label_Doc                = 'doc';
$FreshPorts::Constants::Repo_XML_Label_Ports              = 'ports';
$FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly    = 'ports-quarterly';
$FreshPorts::Constants::Repo_XML_Label_Src                = 'src';

#
# These names relate to the directory in which we find that repo on disk.
# They were taken from the repository names found at https://github.com/freebsd/
# in July 2020. They do not need to be kept up to date. They just have to reflect
# the directories used on disk.
# Interesting fact: we don't need this. We do not need to access the repo for
# doc and src commits. We have doc and src listed to be complete.
# $ ls ~freshports/ports-jail/var/db/repos/
# PORTS-2020Q2            PORTS-2020Q3            freebsd                 freebsd-ports
# PORTS-2020Q2-git        PORTS-head              freebsd-doc             freebsd-ports-quarterly
#
$FreshPorts::Constants::Repo_Dir_Name_Doc                 = 'freebsd-doc';
$FreshPorts::Constants::Repo_Dir_Name_Ports               = 'freebsd-ports';
$FreshPorts::Constants::Repo_Dir_Name_Ports_Quarterly     = 'freebsd-ports-quarterly';
$FreshPorts::Constants::Repo_Dir_Name_Src                 = 'freebsd';
#
# How to translate the label (doc) to the repo directory (freebsd-doc)
# Well, we don't have to do this often, or at all, because we only access
# the repo for port commits, nothing else.
# This is how we relate an incoming XML file to a particular working copy of the repo.
#
%FreshPorts::Constants::GitRepos = (
   $FreshPorts::Constants::Repo_XML_Label_Doc             => $FreshPorts::Constants::Repo_Dir_Name_Doc,
   $FreshPorts::Constants::Repo_XML_Label_Ports           => $FreshPorts::Constants::Repo_Dir_Name_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly => $FreshPorts::Constants::Repo_Dir_Name_Ports_Quarterly,
   $FreshPorts::Constants::Repo_XML_Label_Src             => $FreshPorts::Constants::Repo_Dir_Name_Src,
);

#
# With the GitRepos, the repo name on disk and the label we assign for XML both do not match the
# repo we want to use.
#
# freebsd-ports and freebsd-ports-quarterly both map to the ports tree.
#
# That relationship (XML label to name in the FreshPorts repo table) is mapped here.
# The repo table knows only: doc ports src
#
# On the left, we have the incoming values in the XML file.
# On the right, we have the name used in the FreshPorts repo table (doc, ports, src)
#
%FreshPorts::Constants::RepoLabelsToGitRepoNames = (
   $FreshPorts::Constants::Repo_XML_Label_Doc             => $FreshPorts::Constants::Repo_DB_Doc,
   $FreshPorts::Constants::Repo_XML_Label_Ports           => $FreshPorts::Constants::Repo_DB_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Ports_Quarterly => $FreshPorts::Constants::Repo_DB_Ports,
   $FreshPorts::Constants::Repo_XML_Label_Src             => $FreshPorts::Constants::Repo_DB_Src,
 );
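
Since every label has to appear in both hashes, a quick consistency check can catch a label that was added to one and forgotten in the other. This is only a sketch: the XML label strings on the left are hypothetical stand-ins (the real values are set elsewhere in the configuration), and missing_labels is a helper invented for this example. The directory names and the doc/ports/src names are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical XML labels on the left; directory names as in the config above.
my %GitRepos = (
    'doc'             => 'freebsd-doc',
    'ports'           => 'freebsd-ports',
    'ports-quarterly' => 'freebsd-ports-quarterly',
    'src'             => 'freebsd',
);

# Same hypothetical labels; on the right, the names the repo table knows.
# Note that two labels map to the single ports entry.
my %RepoLabelsToGitRepoNames = (
    'doc'             => 'doc',
    'ports'           => 'ports',
    'ports-quarterly' => 'ports',
    'src'             => 'src',
);

# Return, in sorted order, the labels found in the first hash
# but missing from the second.
sub missing_labels {
    my ($dirs, $names) = @_;
    my @missing = grep { !exists $names->{$_} } sort keys %$dirs;
    return @missing;
}

my @missing = missing_labels(\%GitRepos, \%RepoLabelsToGitRepoNames);
print(@missing ? "missing: @missing\n" : "all labels mapped\n");
```

With the stand-in values above, nothing is missing; drop a key from the second hash and it shows up in the output.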

Back to the original code: this is the change I need to make.

[dan@devgit-ingress01:~/modules] $ svn di xml_munge_git.pm
Index: xml_munge_git.pm
===================================================================
--- xml_munge_git.pm	(revision 5484)
+++ xml_munge_git.pm	(working copy)
@@ -550,6 +550,26 @@
 	return $myRepoPrefix;
 }
 
+sub ConvertRepoLabelToGitRepoName($) {
+#
+# Given the repo name label from XML, obtain the FreeBSD repo name.
+#
+
+	my $Repo_XML_Label = shift;
+	
+	if (!defined($Repo_XML_Label)) {
+		die('no value set for incoming Repo_XML_Label');
+	}
+	
+	my $myRepoPrefixGitRepoName = $FreshPorts::Constants::RepoLabelsToGitRepoNames{$Repo_XML_Label};
+
+	if (!defined($myRepoPrefixGitRepoName)) {
+		die("'$Repo_XML_Label' was not found in \$FreshPorts::Constants::RepoLabelsToGitRepoNames\n");
+	}
+	
+	return $myRepoPrefixGitRepoName;
+}
+
 sub handle_file_end {
 	# for svn we have:
 	#      <FILE Action="Modify" Revision="512343" Path="head/net/tightvnc/Makefile"></FILE>
@@ -927,7 +947,7 @@
 	$commit_log->{description}	= $description;
 	$commit_log->{system_id}	= $SystemID;
 	$commit_log->{commit_hash_short}= $Updates{commit_hash_short};
-	$commit_log->{repo}	 	= $Updates{repository};
+	$commit_log->{repo}	 	= ConvertRepoLabelToGitRepoName($Updates{repository});
 	$commit_log->{revision} 	= $revision;
 
 	#
[dan@devgit-ingress01:~/modules] $ 
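
To see how the new function behaves, here is a small standalone sketch of the same lookup-or-die logic. It assumes hypothetical XML labels and uses a local hash and a local convert_label function instead of the real FreshPorts::Constants values, so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical XML labels on the left; on the right, the repo table
# names (doc, ports, src).
my %RepoLabelsToGitRepoNames = (
    'doc'             => 'doc',
    'ports'           => 'ports',
    'ports-quarterly' => 'ports',
    'src'             => 'src',
);

# Same lookup-or-die logic as the diff above, against the local hash.
sub convert_label {
    my $label = shift;
    die "no value set for incoming label\n" unless defined $label;

    my $name = $RepoLabelsToGitRepoNames{$label};
    die "'$label' was not found in the label map\n" unless defined $name;

    return $name;
}

# Two known labels and one unknown label; eval traps the die.
for my $label ('ports', 'ports-quarterly', 'bogus') {
    my $name = eval { convert_label($label) };
    if (defined $name) {
        print "$label -> $name\n";
    }
    else {
        print "$label -> error: $@";
    }
}
```

Both ports labels resolve to the same repo name, while an unknown label stops processing instead of quietly storing a commit with no repo.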

What commits does this affect?

This affects all commits on branches. How many are from this year, and therefore git-related?

freshports.devgit=# select count(*) from commit_log where repo_id is null and commit_date > '2020-01-01';
 count 
-------
  9103

OK. I’m not going to manually rerun all those commits. Let’s see how many do not touch any port:

freshports.devgit=# select count(*) from commit_log where repo_id is null and commit_date > '2020-01-01' and not exists (select commit_log_id from commit_log_ports where commit_log_id = commit_log.id);
 count 
-------
    81

Hmm, I wonder what those 81 are related to.

select CL.id, CL.message_id, element_pathname(CLE.element_id)
from commit_log CL
join commit_log_elements CLE on CL.id = CLE.commit_log_id
 where repo_id is null and commit_date > '2020-01-01' and not exists (select commit_log_id from commit_log_ports where commit_log_id = CL.id)
;
   id   |                message_id                |                 element_pathname                  
--------+------------------------------------------+---------------------------------------------------
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/plist_sub_sed_sort.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/ports_env.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/qa.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/rust-compat11-canary.sh
 823381 | 0740d2a6d18f57f33f509fe1ce175fd58d984646 | /ports/head/Mk/Scripts/smart_makepatch.sh
 823431 | 3bc66b9e83aeaceb8a9b6d630a5d6b449e21944b | /ports/head/java/Makefile
 823431 | 3bc66b9e83aeaceb8a9b6d630a5d6b449e21944b | /ports/head/sysutils/Makefile
 823537 | e75cc8b0d915905850369f891c0aaf00ce8a602f | /ports/head/MOVED
 823929 | 23d0237edb81d13cdf50facb6039ee41814489a9 | /ports/head/Mk/Uses/python.mk
 824028 | 6015a195a90f83435221950ae46598b354d049e3 | /ports/head/CHANGES
 824094 | da4be7ff1b25955df5d2ebb7b960c5decb874f06 | /ports/head/MOVED
...

I’m not showing all the output. This is enough to demonstrate that not all port commits affect a port. I sometimes forget that.

Yes, I can set all those commits up as ports tree commits.

I went back and checked the pre-2020 commits which had a null repo_id column. They dated back to 2002 (/CVSROOT/module) and some were as recent as 2012-09-09 (/projects/tinderbox/tinderbox.pl).

Let’s fix those up now:

freshports.devgit=# begin;
BEGIN
freshports.devgit=# update commit_log set repo_id = 6 where repo_id is null and commit_date > '2020-01-01' ;
UPDATE 9103
freshports.devgit=# select CL.id, CL.message_id, element_pathname(CLE.element_id)
from commit_log CL
join commit_log_elements CLE on CL.id = CLE.commit_log_id
 where repo_id is null and commit_date > '2020-01-01' and not exists (select commit_log_id from commit_log_ports where commit_log_id = CL.id)
;
 id | message_id | element_pathname 
----+------------+------------------
(0 rows)

freshports.devgit=# commit;
COMMIT
freshports.devgit=# 

All set. Let’s go. Commit time.

Commit at https://github.com/FreshPorts/freshports/commit/601831a2bbe87bf8467ac667bea71d75b68babd7

Sep 05 2020
 

I have no write-up of the jails used by FreshPorts. The following originated in an email I sent this morning.

FreshPorts runs on a FreeBSD host, hosting four jails.

  1. db – the PostgreSQL database server – the source for all content. Connections are via TCP/IP.
  2. mx – Postfix takes incoming emails from the subversion mailing list via FreeBSD MX and passes them to the ingress jails (dev, test, stage, prod – those different environments are not described here).
  3. ingress – Mostly perl, python, shell. Receives email from mx, converts contents into XML, loads XML into the database. All backend reports, notifications, etc. are done here. Some data is generated here for the webserver. See note below. Talks to db.
  4. web – nginx, php, shell, python. Runs the website – talks to db – constructs HTML, caches most of it on local disk. Uses cache when it exists – runs the FreshPorts fp-listen daemon which connects to db and waits for cache-clearing notices.

NOTE: the ingress and web jails share data: ingress creates some HTML for web and supplies it via a nullfs mounted filesystem.

I used jails because:

  • I could
  • kept logical things together
  • forced better design
  • I like jails
  • meant I could put them on different servers if I wanted to

Aug 26 2020
 

The FreshPorts server died on Friday night. I drove up on Sunday morning to retrieve it. I have not yet investigated the breakage. It might be the M/B.

It was suggested on IRC that replacing, rather than repairing, might be a better approach. The existing X8DTU m/b has some unresolved IPMI vulnerability issues. A newer single CPU would perform better than the pair of E5620 CPUs.

I’ve also considered going virtual, if someone would donate that.

Supermicro chassis

Supermicro Model 815-5

The dead system had 196GB of RAM, but I think 128GB would be fine. Keeping the database and the main cache files all in RAM (via ZFS ARC) helps throughput.

If you want to see graphs of the dead system, this recent tweet has them.

Aug 25 2020
 

This is more for my own sanity for the next time I deploy a new FreshPorts host. There is no useful information here for anyone else.

Base this upon the existing host:

svn cp host_vars/foo.example.org host_vars/NEWHOST
svn cp roles/postgresql-server/templates/hosts/foo.example.org roles/postgresql-server/templates/hosts/NEWHOST

Edit these files:

  • host_vars/NEWHOST
  • roles/postgresql-server/templates/hosts/NEWHOST/pg_hba.conf.j2
  • roles/postgresql-server/templates/hosts/NEWHOST/postgresql.conf.j2

Create ssl certs for:

  • NEWHOST.freshports.org
  • NEWHOST.freshsource.org
  • NEWHOST