Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full-time systems administration and now work for a very well-known company which has been a big force in the security industry.

Jan 07 2021
 

This post documents the git-specific code which I need to package and deploy.

/usr/local/libexec/freshports

This is installed by freshports-scripts-git, and I have the directory symlinked to ~/scripts on devgit.freshports.org for convenience.

In this directory, we have the following symlinks to other directories:

[dan@devgit-ingress01:~/scripts] $ find . -type l | xargs ls -l
lrwxr-xr-x  1 dan   dan    14 Dec  6 23:21 ./.#freebsd-cvs.sh -> dan@here.92757
lrwxr-xr-x  1 root  dan    64 Jul  4  2020 ./check_git.sh -> /usr/home/dan/src/git_proc_commit/git-to-freshports/check_git.sh
lrwxr-xr-x  1 root  wheel  35 Nov 26 04:20 ./config.sh -> /usr/local/etc/freshports/config.sh
lrwxr-xr-x  1 dan   dan    64 Jul  3  2020 ./git-delta.sh -> /usr/home/dan/src/git_proc_commit/git-to-freshports/git-delta.sh
lrwxr-xr-x  1 dan   dan    75 Aug  6 21:16 ./git-range-of-commits.sh -> /usr/home/dan/src/git_proc_commit/git-to-freshports/git-range-of-commits.sh
lrwxr-xr-x  1 root  dan    72 Jul  5  2020 ./git-single-commit.sh -> /usr/home/dan/src/git_proc_commit/git-to-freshports/git-single-commit.sh
lrwxr-xr-x  1 root  dan    76 Jul  5  2020 ./git-to-freshports-xml.py -> /usr/home/dan/src/git_proc_commit/git-to-freshports/git-to-freshports-xml.py
[dan@devgit-ingress01:~/scripts] $ 

Ignoring lines 2 and 4 of that output (the editor lock file .#freebsd-cvs.sh and config.sh), which are not relevant, we have this list of files, all of which reside in /usr/home/dan/src/git_proc_commit/git-to-freshports/:

  1. check_git.sh
  2. git-delta.sh
  3. git-range-of-commits.sh
  4. git-single-commit.sh
  5. git-to-freshports-xml.py
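To produce that list without eyeballing ls output, here is a minimal Python sketch. It assumes you only care about links pointing into the git-to-freshports checkout; the function name symlinks_into is mine, not part of FreshPorts.

```python
import os


def symlinks_into(directory, source_prefix):
    """Return {link name: target} for the symlinks in `directory` whose
    target starts with `source_prefix`.

    Roughly `find . -type l | xargs ls -l`, filtered to one checkout.
    Dangling links (such as editor lock files) are still symlinks, so they
    are examined too and simply filtered out by the prefix test.
    """
    result = {}
    for entry in os.scandir(directory):
        if entry.is_symlink():
            target = os.readlink(entry.path)
            if target.startswith(source_prefix):
                result[entry.name] = target
    return dict(sorted(result.items()))
```

Called as symlinks_into(os.path.expanduser("~/scripts"), "/usr/home/dan/src/git_proc_commit/git-to-freshports/"), it should return the five scripts above and skip config.sh and the .#freebsd-cvs.sh lock file.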

What repo is this?

[dan@devgit-ingress01:~/src/git_proc_commit/git-to-freshports] $ git remote -v
origin	git@github.com:FreshPorts/git_proc_commit.git (fetch)
origin	git@git.langille.org:FreshPorts/git_proc_commit.git (push)
origin	git@github.com:FreshPorts/git_proc_commit.git (push)

OK, we do that.

/usr/local/lib/perl5/site_perl/FreshPorts

This directory is populated by p5-freshports-modules-git, and I have the directory symlinked to ~/modules on devgit.freshports.org for convenience.

[dan@devgit-ingress01:/usr/local/lib/perl5/site_perl/FreshPorts] $ find . -type l | xargs ls -l 
lrwxr-xr-x  1 root  freshports  35 Jul 18 17:12 ./config.pm -> /usr/local/etc/freshports/config.pm

We have nothing installed here.

That makes sense: it’s all on the ingress side, collecting commits and creating XML.

This makes things easier: just one package, installed into that single directory.

EDIT: 2021-01-08: The package has been created, and this is what it installs:

[dan@empty:/usr/local/libexec/freshports] $ ls -l
total 57
-r-xr-xr-x  1 root  wheel   505 Jan  9 02:03 check_git.sh
-r-xr-xr-x  1 root  wheel  2947 Jan  9 02:03 git-delta.sh
-r-xr-xr-x  1 root  wheel  1663 Jan  9 02:03 git-range-of-commits.sh
-r-xr-xr-x  1 root  wheel  1589 Jan  9 02:03 git-single-commit.sh
-r-xr-xr-x  1 root  wheel  9359 Jan  9 02:03 git-to-freshports-xml.py
[dan@empty:/usr/local/libexec/freshports] $ pkg info -l freshports-git-proc-commit
freshports-git-proc-commit-0.0.1:
	/usr/local/libexec/freshports/check_git.sh
	/usr/local/libexec/freshports/git-delta.sh
	/usr/local/libexec/freshports/git-range-of-commits.sh
	/usr/local/libexec/freshports/git-single-commit.sh
	/usr/local/libexec/freshports/git-to-freshports-xml.py
[dan@empty:/usr/local/libexec/freshports] $ 
[dan@empty:~] $ pkg info -d freshports-git-proc-commit
freshports-git-proc-commit-0.0.1_1:
	python37-3.7.9_1
	py37-pygit2-1.3.0
	py37-lxml-4.6.2
	git-2.30.0
[dan@empty:~] $ 
Jan 01 2021
 

This post documents the database upgrade process.

Issues encountered during initial attempts

There are permission issues on the database. Not everything is owned by the postgres superuser; in fact, many objects are owned by dan. This database is 20+ years old.

To resolve the ownership issue easily, I take the database dump (named freshports.org.dump) from production (the hostname is supernews) and load it into a development database using the --no-owner option of pg_restore. More on that later. The --no-owner option ensures that the restore process does not “set ownership of objects to match the original database”.

The permission issue also extends to functions. If I look at the production database, I see this:

freshports.org=# \df+ armor
                                                                          List of functions
 Schema | Name  | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description 
--------+-------+------------------+---------------------+------+------------+----------+-------+----------+-------------------+----------+-------------+-------------
 public | armor | text             | bytea               | func | immutable  | unsafe   | dan   | invoker  |                   | c        | pg_armor    | 
(1 row)

freshports.org=# 

That permission issue is also fixed by the --no-owner option. After importing into the dev database, that function looks like this:

freshports.prod.test.owners=# \df+ armor
                                                                            List of functions
 Schema | Name  | Result data type | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language | Source code | Description 
--------+-------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+-------------
 public | armor | text             | bytea               | func | immutable  | unsafe   | postgres | invoker  |                   | c        | pg_armor    | 
(1 row)

freshports.prod.test.owners=# 

The current procedure

This section outlines the steps to convert the existing production database and copy it over to Amazon’s RDS.

Dump production

On the production server:

pg_dump -Fc freshports.org -f freshports.org.dump

Load into dev

Load that into the development server:

sudo su -l postgres
createdb -O postgres -T template0 -E SQL_ASCII freshports.prod
pg_restore -j 20 --no-owner -d freshports.prod freshports.org.dump 

Doing this as the postgres user means every restored object ends up owned by postgres.

Estimated time: 25 minutes.
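For anyone scripting this step, here is a minimal Python sketch which assembles the same two commands as argument lists. The function name is hypothetical; the flags are exactly those used above.

```python
def load_into_dev_commands(dump_file, dbname, jobs=20):
    """Build the createdb and pg_restore argument lists used above.

    --no-owner makes every restored object belong to the user running
    pg_restore (here: postgres) instead of the original owners.
    """
    createdb = ["createdb", "-O", "postgres", "-T", "template0",
                "-E", "SQL_ASCII", dbname]
    restore = ["pg_restore", "-j", str(jobs), "--no-owner",
               "-d", dbname, dump_file]
    return [createdb, restore]
```

Running each list with subprocess.run(cmd, check=True), as the postgres user, stops at the first failure instead of restoring into a database which was never created.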

Adjust the database schema

Once we had the database loaded into the dev environment, I ran this script to update the schema to be git-ready.

This is taken from the SQL for git issue on the git_proc_commit project.

-- repo table needs to know which repository tool we are using

ALTER TABLE public.repo
    ADD COLUMN repository text NOT NULL DEFAULT 'subversion';

COMMENT ON COLUMN public.repo.repository
    IS 'subversion? git? cvs?';


-- Let's rename the svn_hostname column:
ALTER TABLE public.repo
    RENAME svn_hostname TO repo_hostname;


-- We don't need these unique indexes. We might do something with constraints later.

drop index repo_description;
drop index repo_name;
drop index repo_path_to_repo;

-- Add in the git data:

insert into repo (name, description, repo_hostname, path_to_repo, repository) values ('ports', 'The FreeBSD Ports tree', 'github.com', '/freebsd/freebsd-ports', 'git');
insert into repo (name, description, repo_hostname, path_to_repo, repository) values ('doc', 'The FreeBSD doc tree', 'github.com', '/freebsd/doc', 'git');
insert into repo (name, description, repo_hostname, path_to_repo, repository) values ('src', 'The FreeBSD src tree', 'github.com', '/freebsd/freebsd', 'git');


-- Also, the short hash.
ALTER TABLE public.commit_log
    ADD COLUMN commit_hash_short text;

COMMENT ON COLUMN public.commit_log.commit_hash_short
    IS 'This is the short version of the git hash stored in
 svn_revision.

If null/empty, it is not a git hash.';

-- we need an index on the short stuff

-- Index: commit_log_commit_hash_short

-- DROP INDEX public.commit_log_commit_hash_short;

CREATE INDEX commit_log_commit_hash_short
    ON public.commit_log USING btree
    (commit_hash_short COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;
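The column comment says the short hash comes from the full git hash stored in svn_revision, and that null means “not a git hash”. A minimal Python sketch of that rule, assuming a 7-character abbreviation (as in the 1cabbda example elsewhere in these posts) and that svn revisions are plain numbers; the helper name is hypothetical. git itself may choose longer abbreviations in large repositories, so the length is a parameter.

```python
import re

# A full git (SHA-1) commit hash: 40 lowercase hex characters.
FULL_HASH = re.compile(r"[0-9a-f]{40}")


def commit_hash_short(svn_revision, length=7):
    """Hypothetical helper: derive commit_log.commit_hash_short from the
    value stored in svn_revision.

    Returns None for svn revision numbers, matching the column comment:
    null means 'not a git hash'.
    """
    if svn_revision and FULL_HASH.fullmatch(svn_revision):
        return svn_revision[:length]
    return None
```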

Permissions

This fixes up a lingering permissions issue – the permissions in question were too generous.

Verify this:

select grantee, privilege_type from information_schema.role_table_grants where table_name='pg_am';
select grantee, privilege_type from information_schema.role_table_grants where table_name='pg_authid';

If rsyncer is listed, revoke those privileges:

revoke all PRIVILEGES ON all tables in schema pg_catalog from rsyncer;

Extensions made location

These steps are needed because of these errors:

pg_restore: from TOC entry 351; 1255 81083 FUNCTION armor(bytea) dan
pg_restore: error: could not execute query: ERROR: permission denied for language c

Why is that? This is an old database, with the permission issues mentioned above.

This was the recommended fix; the --no-owner option alone is not sufficient to fix this problem.

begin;
create extension pgcrypto      from unpackaged;
create extension fuzzystrmatch from unpackaged;
create extension plperl        from unpackaged;

If everything succeeds, commit the transaction.

datatypes, relational integrity, and functions

With the table changes, we have updates to datatypes, triggers, and functions (stored procedures). These steps update those:

psql freshports.prod
begin;
\i datatype.txt
\i ri.txt
\i sp.txt
commit;

NOTICEs while running datatype.txt are expected. There should be no errors.

That is the last step.

Dump the fixed schema

Now we dump this new schema and all the data:

pg_dump -Fc freshports.prod > freshports.prod.new-schema.dump

Estimated time: 8 minutes

Copy to AWS

The dump was copied to AWS and loaded into the RDS instance.

scp freshports.prod.new-schema.dump ec2-user@aws:

Load fixed schema

On AWS, create and load the database:

createdb -h pg01.foo.us-east-1.rds.amazonaws.com -U postgres -E SQL_ASCII -T template0 --locale=C freshports.fixed
time pg_restore -h pg01.cqor9jd5vvww.us-east-1.rds.amazonaws.com -U postgres -d freshports.fixed freshports.prod.new-schema.dump

Estimated time: 35 minutes

These errors during loading are expected and do not affect the outcome.

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 465; 1255 847666 FUNCTION plpgsql_call_handler() postgres
pg_restore: error: could not execute query: ERROR:  permission denied for language c
Command was: CREATE FUNCTION public.plpgsql_call_handler() RETURNS language_handler
    LANGUAGE c
    AS '$libdir/plpgsql', 'plpgsql_call_handler';


pg_restore: error: could not execute query: ERROR:  function public.plpgsql_call_handler() does not exist
Command was: ALTER FUNCTION public.plpgsql_call_handler() OWNER TO postgres;

pg_restore: warning: errors ignored on restore: 2

Using the new database

Now that the new database is on AWS, you can modify the configuration files to start using it.

The configuration files which need to be updated are covered in Moving devgit.freshports.org from GitHub to git.FreeBSD.org.

Dec 21 2020
 

Just like I moved devgit.freshports.org from github to git.freebsd.org for the doc repo on Thursday, today (Monday), I’m doing the same thing for the src repo.

The jail uses storage on an nvd-based zpool. First, create a new filesystem:

[dan@slocum:~] $ sudo zfs create nvd/freshports/devgit/ingress/repos/src

You can see it appear here in the jail, but with the wrong ownership (root:wheel):

[dan@devgit-ingress01:/var/db/ingress/repos] $ ls -l
total 22
drwxr-xr-x  23 ingress  ingress  27 Dec 17 23:55 doc
drwxr-xr-x  26 ingress  ingress  45 Dec 20 03:03 freebsd
drwxr-xr-x  23 ingress  ingress  26 Dec  8 07:12 freebsd-doc
drwxr-xr-x  69 ingress  ingress  84 Dec 19 09:18 freebsd-ports
drwxr-xr-x  69 ingress  ingress  84 Nov 22 19:28 freebsd-ports-quarterly
-rw-r--r--   1 ingress  ingress  41 Dec 21 18:54 latest.doc
-rw-r--r--   1 ingress  ingress  41 Dec 21 18:54 latest.freebsd
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:12 latest.freebsd-doc
-rw-r--r--   1 ingress  ingress  41 Dec 21 18:54 latest.freebsd-ports
-rw-r--r--   1 ingress  ingress  41 Dec 21 18:54 latest.freebsd-ports-quarterly
drwxr-xr-x   2 root     wheel     2 Dec 21 18:55 src
[dan@devgit-ingress01:/var/db/ingress/repos] $ 

So I fix that:

[dan@devgit-ingress01:/var/db/ingress/repos] $ sudo chown ingress:ingress src

Then clone the repo:

[dan@devgit-ingress01:/var/db/ingress/repos] $ sudo su -l ingress
$ bash
[ingress@devgit-ingress01 ~]$ cd repos
[ingress@devgit-ingress01 ~/repos]$ git clone https://git.FreeBSD.org/src.git
Cloning into 'src'...
remote: Enumerating objects: 3798472, done.
remote: Counting objects: 100% (3798472/3798472), done.
remote: Compressing objects: 100% (744779/744779), done.
remote: Total 3798472 (delta 3016406), reused 3757055 (delta 2987426), pack-reused 0
Receiving objects: 100% (3798472/3798472), 1.10 GiB | 11.63 MiB/s, done.
Resolving deltas: 100% (3016406/3016406), done.
Updating files: 100% (81314/81314), done.
[ingress@devgit-ingress01 ~/repos]$ ls -l
total 24
drwxr-xr-x  23 ingress  ingress  27 Dec 17 23:55 doc
drwxr-xr-x  26 ingress  ingress  45 Dec 20 03:03 freebsd
drwxr-xr-x  23 ingress  ingress  26 Dec  8 07:12 freebsd-doc
drwxr-xr-x  69 ingress  ingress  84 Dec 19 09:18 freebsd-ports
drwxr-xr-x  69 ingress  ingress  84 Nov 22 19:28 freebsd-ports-quarterly
-rw-r--r--   1 ingress  ingress  41 Dec 21 19:03 latest.doc
-rw-r--r--   1 ingress  ingress  41 Dec 21 19:03 latest.freebsd
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:12 latest.freebsd-doc
-rw-r--r--   1 ingress  ingress  41 Dec 21 19:03 latest.freebsd-ports
-rw-r--r--   1 ingress  ingress  41 Dec 21 19:03 latest.freebsd-ports-quarterly
drwxr-xr-x  26 ingress  ingress  44 Dec 21 19:00 src
[ingress@devgit-ingress01 ~/repos]$ 

It takes up about 2.4G of space:

[ingress@devgit-ingress01 ~/repos]$ du -ch -d 0 src
2.4G	src
2.4G	total

The next step: set the commit marker.

That would be commit 3cc0c0d66a065554459bd2f9b4f80cc07426464a.

Creating the commit marker:

[ingress@devgit-ingress01 ~/repos]$ echo 3cc0c0d66a065554459bd2f9b4f80cc07426464a > latest.src
[ingress@devgit-ingress01 ~/repos]$ cat latest.src
3cc0c0d66a065554459bd2f9b4f80cc07426464a
[ingress@devgit-ingress01 ~/repos]$ 
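The marker is just the full 40-character hash plus a newline, which is why the latest.* files show up as 41 bytes in the listings. A minimal Python sketch for writing and reading it with a sanity check, assuming full SHA-1 hashes; write_marker and read_marker are hypothetical helpers, not FreshPorts code.

```python
import re
from pathlib import Path

HASH_RE = re.compile(r"[0-9a-f]{40}")


def write_marker(repos_dir, repo, commit):
    """Write latest.<repo> containing the last processed commit hash,
    refusing anything which is not a full 40-character hash."""
    if not HASH_RE.fullmatch(commit):
        raise ValueError(f"not a full commit hash: {commit!r}")
    path = Path(repos_dir) / f"latest.{repo}"
    path.write_text(commit + "\n")  # same content as: echo HASH > latest.repo
    return path


def read_marker(repos_dir, repo):
    """Read latest.<repo> back, stripping the trailing newline."""
    commit = (Path(repos_dir) / f"latest.{repo}").read_text().strip()
    if not HASH_RE.fullmatch(commit):
        raise ValueError(f"latest.{repo} does not hold a full hash")
    return commit
```

The validation is the point: a truncated or empty marker would otherwise feed a bad range to git rev-list.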

A plan for the future involves setting this via a tag in the repo.

The configuration file change is within /usr/local/etc/freshports/config.sh:

convert_repo_label_to_directory(){
  repo=$1
  
  case $repo in
     src) dir='src';;
     src-stable-12) dir='freebsd-stable-12';;
     doc) dir='doc';;
     ports) dir='freebsd-ports';;
     ports-quarterly) dir='freebsd-ports-quarterly';;
     *) dir='';;
  esac

  echo $dir
}

The change is on line 5: the value was freebsd; it is now src.

Those values correspond to the GitHub repo name (https://github.com/freebsd/freebsd/) and the FreeBSD repo name (https://cgit.freebsd.org/src/), respectively.
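If a Python script ever needs the same mapping, a dictionary is the natural equivalent of that case statement. This is a sketch only; FreshPorts keeps the mapping in config.sh.

```python
# Python equivalent of convert_repo_label_to_directory() from config.sh.
REPO_DIRECTORIES = {
    "src": "src",
    "src-stable-12": "freebsd-stable-12",
    "doc": "doc",
    "ports": "freebsd-ports",
    "ports-quarterly": "freebsd-ports-quarterly",
}


def convert_repo_label_to_directory(repo):
    """Unknown labels map to '' just like the shell *) case."""
    return REPO_DIRECTORIES.get(repo, "")
```

Keeping the pairs in one table makes a change like freebsd to src a one-line edit, just as it was in the shell version.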

For the truly bored amongst you, here are the logs for the first processing of the src repo:

2020.12.21 19:18:12 git-delta.sh Now processing repo: src
2020.12.21 19:18:12 git-delta.sh REPODIR='/var/db/ingress/repos/src' exists
2020.12.21 19:18:12 git-delta.sh LATEST_FILE='/var/db/ingress/repos/latest.src' exists
2020.12.21 19:18:12 git-delta.sh Repodir is /var/db/ingress/repos/src
2020.12.21 19:18:12 git-delta.sh Running: /usr/local/bin/git pull:
warning: Pulling without specifying how to reconcile divergent branches is
discouraged. You can squelch this message by running one of the following
commands sometime before your next pull:

  git config pull.rebase false  # merge (the default strategy)
  git config pull.rebase true   # rebase
  git config pull.ff only       # fast-forward only

You can replace "git config" with "git config --global" to set a default
preference for all repositories. You can also pass --rebase, --no-rebase,
or --ff-only on the command line to override the configured default per
invocation.

Already up to date.
2020.12.21 19:18:12 git-delta.sh Done.
2020.12.21 19:18:12 git-delta.sh STARTPOINT = 3cc0c0d66a065554459bd2f9b4f80cc07426464a
2020.12.21 19:18:12 git-delta.sh Running: /usr/local/bin/git rev-list 3cc0c0d66a065554459bd2f9b4f80cc07426464a..HEAD
2020.12.21 19:18:12 git-delta.sh Done.
2020.12.21 19:18:12 git-delta.sh The commits found are:
2020.12.21 19:18:12 git-delta.sh /usr/local/libexec/freshports/git-to-freshports-xml.py --repo src --path /var/db/ingress/repos/src --commit 3cc0c0d66a065554459bd2f9b4f80cc07426464a --spooling /var/db/ingress/message-queues/spooling --output /var/db/ingress/message-queues/incoming
2020.12.21 19:18:12 git-delta.sh Ending

I did run git config pull.rebase false later, to silence that warning.
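The log traces the steps git-delta.sh takes: pull, read STARTPOINT from latest.src, list new commits with git rev-list STARTPOINT..HEAD, then invoke git-to-freshports-xml.py. Here is a rough Python sketch of the last two steps; it is not the real script. The --reverse flag is my assumption: plain rev-list lists newest first, and FreshPorts processes commits oldest first.

```python
import subprocess

XML_SCRIPT = "/usr/local/libexec/freshports/git-to-freshports-xml.py"


def xml_command(repo, repo_dir, startpoint,
                spooling="/var/db/ingress/message-queues/spooling",
                output="/var/db/ingress/message-queues/incoming"):
    """Argument list matching the invocation shown in the git-delta.sh log."""
    return [XML_SCRIPT, "--repo", repo, "--path", repo_dir,
            "--commit", startpoint, "--spooling", spooling,
            "--output", output]


def new_commits(repo_dir, startpoint):
    """Commits after startpoint, oldest first (--reverse is an assumption)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--reverse", f"{startpoint}..HEAD"],
        check=True, capture_output=True, text=True).stdout
    return out.split()
```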

Dec 19 2020
 

My plan is to update the production website and database in place. To test this process, a copy of the production database has been copied to the pg02.int.unixathome.org PostgreSQL database server and is now available as freshports.dgnew.

The first test was the database update process. I think I have that settled now.

Pointing the devgit.freshports.org website at the database found a few missing pieces in the stored procedures.

Next, I want to try the ingress side of the website; the processing of commits.

What is involved in that process?

Configuration file changes and, to ensure integrity, stopping all processing first.

Stopping the processing

[dan@devgit-ingress01:~] $ sudo service ingress stop
Stopping ingress.
[dan@devgit-ingress01:~] $ sudo service ingress_svn stop
Stopping ingress_svn.
[dan@devgit-ingress01:~] $ sudo service freshports stop
Stopping freshports.
[dan@devgit-ingress01:~] $ 

That stops, respectively:

  1. the processing of XML files
  2. the processing of incoming svn commit emails and generation of XML files
  3. the detection of new git commits and creation of XML files

Configuration files

There are two sets of configuration files:

  • ingress – the jail used for ingesting commits into the database
  • website – the jail used for hosting the website

Usually, these run in separate jails, but they don’t have to. At present, production runs on a single-CPU, non-ZFS system with hardware RAID and 8GB of RAM. That host does not run jails.

website

The website configuration files are:

[dan@devgit-nginx01:/usr/local/etc/freshports] $ sudo grep freshports.devgit *
config.pm:$FreshPorts::Config::dbname			= 'freshports.devgit';
config.sh:DB=freshports.devgit
database.php:	$db = pg_connect("dbname=freshports.devgit host=pg02.int.unixathome.org user=www_dev_git password=[redacted] sslmode=require");
fp-listen.ini:DBNAME		= 'freshports.devgit'
[dan@devgit-nginx01:/usr/local/etc/freshports] $ 

Changing databases for the website also necessitates clearing the cache.

Yes, the database name is specified in four different places: shell scripts, Python, PHP, and Perl all need the configuration.
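With four copies of one setting, it is easy to change three and miss the fourth. A minimal Python sketch of a consistency check, assuming the line formats shown in the grep output above; the patterns and helper names are mine.

```python
import re

# One pattern per configuration language, based on the grep output above.
PATTERNS = {
    "config.pm": r"\$FreshPorts::Config::dbname\s*=\s*'([^']+)'",
    "config.sh": r"^DB=(\S+)",
    "database.php": r"dbname=(\S+)",
    "fp-listen.ini": r"^DBNAME\s*=\s*'([^']+)'",
}


def database_names(files):
    """files maps file name -> contents; returns file name -> dbname (or None)."""
    found = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, files.get(name, ""), re.MULTILINE)
        found[name] = m.group(1) if m else None
    return found


def consistent(files):
    """True only if all four files name the same, non-missing database."""
    names = set(database_names(files).values())
    return len(names) == 1 and None not in names
```

Feed it the contents of the four files in /usr/local/etc/freshports after editing; False means at least one file still points at the old database.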

After making those changes, this clears the cache:

$ sudo zfs rollback system/data/freshports-cache/devgit-nginx01/ports@empty
$ sudo zfs rollback system/data/freshports-cache/devgit-nginx01/packages@empty

The website is now on the conversion test database (freshports.dgnew, as in dev git new).

ingress

The ingress configuration files are:

[dan@devgit-ingress01:/usr/local/etc/freshports] $ sudo grep freshports.devgit *
config.ini:DBNAME             = 'freshports.devgit'
config.pm:$FreshPorts::Config::dbname			= 'freshports.devgit';

The .ini file is used for processing new package building information.

The .pm file is used by the commit processing.

I will update both, let the commit processing resume, and see what happens.

*some time later*

I had some database permission issues because of missing pg_hba.conf settings, but once those were fixed, the commits started coming in:

Commits coming into the converted database


Now I wait and monitor the incoming commits to be sure the ongoing processing is smooth.

Dec 17 2020
 

The doc repo has moved from svn to git. This changeover occurred on 2020-12-09.

The last svn commit was: 54737

The first git commit was: 3be01a475855e7511ad755b2defd2e0da5d58bbe

To date, devgit.freshports.org has been using https://github.com/freebsd/freebsd-doc/ for processing commits.

Today’s work will convert from that GitHub repo to https://cgit.freebsd.org/doc/ (actually, https://git.freebsd.org/).

What changes are required

The following changes are required:

  1. A new working copy of the git.FreeBSD.org/doc repo
  2. A marker pointing to the last commit processed
  3. Configuration file changes to point to that repo

EDIT: 2021-01-01 – NOTE: all the configuration files are also maintained via Ansible, so the manual steps below are not necessarily required. I think I should next document the Ansible changes needed when changing database names or servers.

The rest of the post documents those changes.

A new working copy of the git.FreeBSD.org/doc repo

Let’s clone that repo:

$ cd /var/db/ingress/repos
[ingress@devgit-ingress01 ~]$ cd repos
[ingress@devgit-ingress01 ~/repos]$ ls -l
total 18
drwxr-xr-x  26 ingress  ingress  45 Dec 17 03:03 freebsd
drwxr-xr-x  23 ingress  ingress  26 Dec  8 07:12 freebsd-doc
drwxr-xr-x  69 ingress  ingress  84 Dec 17 02:18 freebsd-ports
drwxr-xr-x  69 ingress  ingress  84 Nov 22 19:28 freebsd-ports-quarterly
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd-doc
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd-ports
-rw-r--r--   1 ingress  ingress  41 Dec 17 23:51 latest.freebsd-ports-quarterly
[ingress@devgit-ingress01 ~/repos]$ git clone https://git.FreeBSD.org/doc.git
Cloning into 'doc'...
remote: Enumerating objects: 449334, done.
remote: Counting objects: 100% (449334/449334), done.
remote: Compressing objects: 100% (120607/120607), done.
remote: Total 449334 (delta 314272), reused 448476 (delta 313566), pack-reused 0
Receiving objects: 100% (449334/449334), 245.50 MiB | 11.60 MiB/s, done.
Resolving deltas: 100% (314272/314272), done.
Updating files: 100% (11346/11346), done.
[ingress@devgit-ingress01 ~/repos]$ 

Done.

A marker pointing to the last commit processed

In the directory listing above, you can see files whose names start with latest., one for each repo directory. We need to create a new file, named latest.doc, which contains the hash of the last commit processed by FreshPorts.

To find that value, I did this:

  1. browse to https://cgit.freebsd.org/doc/
  2. click on main
  3. scroll down to “Mark the repository as being converted to Git.”
  4. Take that hash value

You won’t be able to reproduce that once enough commits have been made and that particular commit scrolls off the page. You will, however, always be able to find the first git commit (3be01a475855e7511ad755b2defd2e0da5d58bbe) and see its parent listed as 89d0233560e4ba181d73143fc25248b407120e09.
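If the commit has scrolled off the cgit page, the working copy can answer the question: git rev-parse 3be01a475855e7511ad755b2defd2e0da5d58bbe^ prints the first parent, and git cat-file -p 3be01a475855e7511ad755b2defd2e0da5d58bbe shows the raw commit, including its parent lines. A minimal Python sketch which parses that raw output; parents() is a hypothetical helper.

```python
def parents(cat_file_output):
    """Parent hashes from `git cat-file -p <commit>` output.

    Header lines (tree, parent, author, committer, ...) come first; the
    first blank line ends them, so the commit message is never parsed.
    A merge commit has more than one parent line.
    """
    result = []
    for line in cat_file_output.splitlines():
        if not line:
            break
        key, _, value = line.partition(" ")
        if key == "parent":
            result.append(value)
    return result
```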

Let’s put that into a file:

[ingress@devgit-ingress01 ~/repos]$ echo 89d0233560e4ba181d73143fc25248b407120e09 > latest.doc

What do we have now:

[ingress@devgit-ingress01 ~/repos]$ ls -l
total 21
drwxr-xr-x  23 ingress  ingress  27 Dec 17 23:55 doc
drwxr-xr-x  26 ingress  ingress  45 Dec 17 03:03 freebsd
drwxr-xr-x  23 ingress  ingress  26 Dec  8 07:12 freebsd-doc
drwxr-xr-x  69 ingress  ingress  84 Dec 17 02:18 freebsd-ports
drwxr-xr-x  69 ingress  ingress  84 Nov 22 19:28 freebsd-ports-quarterly
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.doc
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd-doc
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd-ports
-rw-r--r--   1 ingress  ingress  41 Dec 18 00:06 latest.freebsd-ports-quarterly
[ingress@devgit-ingress01 ~/repos]$ 

Next, configuration file changes.

Configuration file changes to point to that repo

What files need changing? Just one I think.

[dan@devgit-ingress01:~] $ cd /usr/local/etc/freshports
[dan@devgit-ingress01:/usr/local/etc/freshports] $ sudo grep freebsd-doc *
config.sh:     doc) dir='freebsd-doc';;

When I make this change, I want to disable the git processing so as not to change a configuration setting in the middle of processing.

$ sudo sysrc -f /etc/periodic.conf fp_check_for_git_commits_enable="NO"
fp_check_for_git_commits_enable: YES -> NO

Let’s verify nothing is running git processing:

 ps auwwx | grep ingress
ingress     60714  0.0  0.0 10676  2188  -  SCJ  00:14   0:00.00 sleep 3
ingress_svn 60727  0.0  0.0 10676  2188  -  SCJ  00:14   0:00.00 sleep 3
ingress_svn 98912  0.0  0.0 11004  2424  -  IsJ  Sat23   0:00.15 daemon: ingress_svn[98913] (daemon)
ingress_svn 98913  0.0  0.0 11868  2968  -  SJ   Sat23   0:32.78 /bin/sh /usr/local/libexec/freshports-service/ingress_svn.sh
ingress     98921  0.0  0.0 11004  2424  -  IsJ  Sat23   0:00.44 daemon: ingress[98923] (daemon)
ingress     98923  0.0  0.0 11844  2952  -  SJ   Sat23   0:21.70 /bin/sh /usr/local/libexec/freshports-service/ingress.sh
dan         60729  0.0  0.0 11384  2764  2  S+J  00:14   0:00.00 grep ingress

That’s normal. ingress_svn is the daemon checking for incoming svn commits.

You might expect ingress to be the daemon which looks for incoming git commits, but it is actually the daemon which processes the XML files. The periodic.conf setting we adjusted controls the creation of those XML files from git commits. That is what we are pausing now.

Next, I updated config.sh. This is what we have now:

[dan@devgit-ingress01:/usr/local/etc/freshports] $ sudo grep doc *
config.pm:# Values as found at https://www.postgresql.org/docs/current/static/libpq-ssl.html
config.pm:$FreshPorts::Config::Repo_DOC             = 'doc';
config.pm:$FreshPorts::Config::DB_Root_Prefix_DOC             = '/doc';
config.sh:# see https://www.postgresql.org/docs/12/libpq-envars.html
config.sh:     doc) dir='doc';;

That last line is the updated value. The others refer to repo names and internal pathnames within the FreshPorts database.

What’s next?

Turn on git commit processing and watch the logs.

$ sudo sysrc -f /etc/periodic.conf fp_check_for_git_commits_enable="YES"
fp_check_for_git_commits_enable: NO -> YES
$ 

Now I wait until 7:24, because the script runs every 3 minutes.

BOOM! The latest commit is now in: 482d8311b8a1e25a66ee49af4bc7efadd8be22aa

Nov 29 2020
 

Some blog posts help me think my way to a solution. This is one of those posts.

Today I realized the code needs to handle both git and svn. I thought I would have one cut-over date after which all commits would go through git. I see now that this isn’t the way to go. The code has to be ready to import both git and svn commits. But not from the same tree. We don’t want duplicates.

We have three repos:

  1. doc
  2. ports
  3. src

So what next?

Today, I’m going to take an XML file from dev and see if devgit can import it. I fully expect errors.

Use of uninitialized value $Updates{"commit_hash"} in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 622.
Use of uninitialized value $Updates{"FileRevision"} in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 623.
Use of uninitialized value $FileRevision in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 624.
no value set for incoming RepoName at /usr/local/lib/perl5/site_perl/FreshPorts/xml_munge_git.pm line 555.

And we have them. Yes, there is no commit_hash in the subversion XML.

The git ingress code is different: it handles a commit_hash in both long (1cabbda44f7f82543402b6a988976020afda2c46) and short (1cabbda) forms.

There is no code in the git branch to handle importing svn commits.

What about keeping the two websites separate?

Let’s consider this scenario.

doc starts using git first. Then src, then ports.

Let git.freshports.org process the git commits. Let www.freshports.org process the svn commits.

When everything is transitioned to git, promote git.freshports.org to www.freshports.org

No, that won’t work. The databases are separate. The git website won’t have all the commits which were processed by the svn website….

It has to be one database

We know the git database can handle svn commits, because they are already present. We know the website can already display svn commits, because it is.

Yes, it’s just the ingress side which has to be updated now. Or rather, svn-specific code merged into the git branch.

This might be feasible if we keep svn and git processing completely separate and don’t try to make one do both.

Avoiding concurrency issues

FreshPorts has always processed one commit at a time, in the order they are received. For cvs and svn, ‘order’ was defined by when the commit mailing list email was received. Commits received out of order will produce interesting results, such as a port version decreasing. There is no simple solution to that issue as far as I know.

For git, commits are again processed in order: the order in which they appear in the tree, not commit-date order.

But with two input streams, I’d rather avoid having two commits being processed at the same time. It is unlikely that any concurrency issues would arise, but I’d rather just avoid that.

That means two separate message queues and processing, but only one consumer of those two queues.

The svn outline

For svn, commits are processed like this:

  1. email arrives
  2. raw email is dumped into ~ingress/message-queues/incoming/2020.11.29.17.43.12.53448.txt
  3. the above is all handled by ~ingress/.mailfilter and these configuration settings in /usr/local/etc/postfix/main.cf
    mailbox_command = /usr/local/bin/maildrop -d ${USER}
    setgid_group = maildrop
    
  4. fp-daemon.sh sees 2020.11.29.17.43.12.53448.txt
  5. XML is created and dumped into 2020.11.29.17.43.12.53448.txt.xml but in ~freshports/message-queues/recent/
  6. XML is processed and loaded into the database.

The git outline

For git processing, there is no incoming email. Instead, we poll the local working copy of the git repo after a git fetch.

  1. The FreeBSD periodic system invokes /usr/local/etc/periodic/everythreeminutes/215.fp_check_git_for_commits
  2. If a new commit is found, it is extracted from the repo and a new file is created: ~ingress/message-queues/incoming/2020.10.01.19.50.02.000000.4796a64ade4267608e861f717e443c0290b73b70.xml – yes, that is a timestamp and a commit hash in that filename.
  3. The freshports daemon notices a new file in the incoming directory.
  4. XML is processed and loaded into the database.
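The filename in step 2 packs a timestamp (down to six fractional digits) and the full commit hash. Here is a sketch of building such a name in Python, assuming the six digits are microseconds; incoming_xml_name is a hypothetical helper, not the code FreshPorts uses.

```python
from datetime import datetime


def incoming_xml_name(commit, when):
    """Build a name like 2020.10.01.19.50.02.000000.<hash>.xml:
    timestamp down to microseconds, then the full commit hash."""
    return f"{when.strftime('%Y.%m.%d.%H.%M.%S.%f')}.{commit}.xml"
```

Putting the timestamp first means a plain sort of the directory listing is also a sort by creation time.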

Joining the two outlines

The solution I see is to modify both outlines so they stop at creating the XML file, each in a different directory.

The freshports daemon then scans both directories and processes them accordingly.

The fp-daemon code (or more specifically, the code it invokes) will be modified so it only creates the XML and does not process it.
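Here is a rough Python sketch of that single consumer. It assumes the svn and git XML files land in two separate directories (no directory names are decided above, so they are parameters) and that ordering across the two queues by the timestamp at the front of each filename is good enough. Dispatching each file to the right loader is left out.

```python
import os


def timestamp_key(filename):
    """Sort key: the leading YYYY.MM.DD.HH.MM.SS fields shared by both
    naming schemes (svn: ...53448.txt.xml, git: ...000000.<hash>.xml).
    Assumes every .xml file in the queues follows one of those schemes."""
    return tuple(int(part) for part in filename.split(".")[:6])


def next_batch(svn_dir, git_dir):
    """Merge the XML files waiting in both queue directories into one list,
    oldest first, so a single consumer processes one commit at a time."""
    files = []
    for directory in (svn_dir, git_dir):
        for name in os.listdir(directory):
            if name.endswith(".xml"):
                files.append((timestamp_key(name), os.path.join(directory, name)))
    return [path for _, path in sorted(files)]
```

Having one consumer walk a merged list is what avoids two commits being loaded at the same time, while each producer stays ignorant of the other.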

Nov 27 2020
 

As a sanity check, I run several diffs comparing devgit.freshports.org with dev.freshports.org, and sometimes they report a false positive.

Case in point, a recent commit to Code_Aster:

  • https://svnweb.freebsd.org/ports?view=revision&revision=556349
  • https://github.com/freebsd/freebsd-ports/commit/d23fb94b8640d1c9d38c3cafc69c89ed4fe11939

In the svnweb link, you will see:

head/science/tfel-edf/
(Copied from head/science/tfel, r555690)

It is that repo copy (i.e. svn copy) from science/tfel to science/tfel-edf which gives rise to a difference in the list of files when comparing the two commits.

  • git lists science/tfel-edf; svn does not.
  • git lists science/tfel-edf/files/patch-cmake_modules_tfel.cmake; svn does not.
  • svn lists science/tfel-edf/distinfo; git does not.

The directory inclusion/omission is a direct result of how the two tools handle a copy.

patch-cmake_modules_tfel.cmake was not modified after the copy – that is why svn does not list it.

science/tfel-edf/distinfo is not in the repo, which is why svn lists it as deleted.

When a diff shows that the two websites do not match, I feel obliged to investigate in case the code needs to be updated. This post serves as a reminder to myself that sometimes missing files are OK.
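The comparison behind those bullet points boils down to two set differences. A Python sketch, assuming each side's file list is already available as repository-relative paths (how the real diff scripts gather them is not shown here):

```python
from typing import Dict, Iterable, Set


def compare_file_lists(git_files: Iterable[str], svn_files: Iterable[str]) -> Dict[str, Set[str]]:
    """Report which paths appear in only one of the two commit listings."""
    git_set, svn_set = set(git_files), set(svn_files)
    return {
        "git_only": git_set - svn_set,  # e.g. directories git reports for a copy
        "svn_only": svn_set - git_set,  # e.g. files svn reports but git does not
    }
```

Anything reported still needs a human look; as with science/tfel-edf, a repo copy can explain the difference.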

Nov 23 2020

In the last post, I found that many commits were to the master branch when they should have been on the quarterly branch. Now I think I see why.

See this XML:

<OS Repo="ports-quarterly" Id="FreeBSD" Branch="master"/>

If the commit is on a quarterly branch, the Branch attribute should name that branch. Case in point: 2020Q4.

I went to https://lists.freebsd.org/pipermail/svn-ports-branches/2020-November/thread.html to look for known quarterly commits.

Hmm, first, let’s find known commits at https://github.com/freebsd/freebsd-ports/tree/branches/2020Q4 – when I looked, the latest commit was https://github.com/freebsd/freebsd-ports/commit/46433baae934d92698422495b72f811839caa1a9

MFH: r555565
security/wolfssl: fix build on big-endian

Merge upstream patch to fix build on big-endian architectures.

Also unmark mips and mips64 as broken, now builds fine.

Approved by:	portmgr (fix build blanket)

The commit just before that is: https://github.com/freebsd/freebsd-ports/commit/e79616836f4e962d370f4364760d85a5e8460a65

How did I find that? I looked at https://github.com/freebsd/freebsd-ports/commits/branches/2020Q4

When FreshPorts processes a commit, it needs a working copy of the repo as it looked at that commit.

I am trying to figure out how to do that when the commit is on a branch.

To get a copy of the branch, I do:

$ git checkout branches/2020Q4
$ git branch
  branches/2020Q3
* branches/2020Q4
  master

Next, I want the tree as it existed at commit 46433baae934d92698422495b72f811839caa1a9

i.e. https://github.com/freebsd/freebsd-ports/commit/46433baae934d92698422495b72f811839caa1a9

My first attempt is

$ git checkout 46433baae934d92698422495b72f811839caa1a9
Note: switching to '46433baae934d92698422495b72f811839caa1a9'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 46433baae934 MFH: r555565

That MFH: r555565 message indicates that I am at the right commit.

Now, if I process that commit:

echo  /usr/local/libexec/freshports/git-to-freshports-xml.py --repo ports-quarterly --path \
/usr/home/dan/src/freebsd/freebsd-ports-quarterly  --single-commit \
46433baae934d92698422495b72f811839caa1a9 --spooling /var/db/ingress/message-queues/spooling --output \
/tmp -v | sudo su -fm ingress

The important point from the above: /usr/home/dan/src/freebsd/freebsd-ports-quarterly

This is a checkout of the quarterly ports branch.

In the file I get:

$ cat /tmp/2020.11.17.16.07.01.000000.46433baae934d92698422495b72f811839caa1a9.xml
<?xml version='1.0' encoding='UTF-8'?>
<UPDATES Version="1.4.0.0">
  <UPDATE>
    <DATE Year="2020" Month="11" Day="17"/>
    <TIME Timezone="UTC" Hour="16" Minute="7" Second="1"/>
    <OS Repo="ports-quarterly" Id="FreeBSD" Branch="branches/2020Q4"/>
    <LOG>MFH: r555565

security/wolfssl: fix build on big-endian

Merge upstream patch to fix build on big-endian architectures.

Also unmark mips and mips64 as broken, now builds fine.

Approved by:	portmgr (fix build blanket)</LOG>
    <PEOPLE>
      <UPDATER Handle="pkubaj &lt;pkubaj@FreeBSD.org&gt;"/>
    </PEOPLE>
    <COMMIT Hash="46433baae934d92698422495b72f811839caa1a9" HashShort="46433ba" Subject="MFH: r555565" EncodingLoses="false" Repository="ports-quarterly"/>
    <FILES>
      <FILE Action="Modify" Path="security/wolfssl/Makefile"/>
      <FILE Action="Modify" Path="security/wolfssl/distinfo"/>
    </FILES>
  </UPDATE>
</UPDATES>
$ 

OK, that has the expected branch value for Branch, but those files are wrong. Looking on dev, I find this for the same commit:

      <FILE Action="Modify" Path="branches/2020Q4/security/wolfssl/Makefile" Revision="555566"></FILE>
      <FILE Revision="555566" Path="branches/2020Q4/security/wolfssl/distinfo" Action="Modify"></FILE>
      <FILE Revision="555566" Action="Modify" Path="branches/2020Q4/"></FILE>

Notice they all start with branches/2020Q4.

A prefix is missing. We could handle this on the XML processing side.

Let’s see.

We have this in the XML: security/wolfssl/Makefile

We have this in debugging: File = [Modify : /ports/2020Q4/security/wolfssl/Makefile : 46433baae934d92698422495b72f811839caa1a9]

That output is produced by this code in xml_munge_git.pm:

# This is where we add in the repo name to the path
my $filename     = $DB_Root_Prefix . '/' . $Updates{branch} . '/' . $FilePath;
my $revisionname = $FileRevision;
my $commit_log_element;
        

print "File = [$FileAction : $filename";
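The same concatenation in Python, as a sketch. The value /ports for $DB_Root_Prefix is inferred from the debugging output above, not taken from the configuration. With the branch already stripped to 2020Q4, you get the path seen in the debugging output; the element_pathname rows on dev keep the branches/ prefix.

```python
def element_pathname(db_root_prefix: str, branch: str, file_path: str) -> str:
    """Python rendering of the Perl: $DB_Root_Prefix . '/' . $branch . '/' . $FilePath"""
    return db_root_prefix + "/" + branch + "/" + file_path


# With the stripped branch name, the result does not match what the database stores:
print(element_pathname("/ports", "2020Q4", "security/wolfssl/Makefile"))
# With the unstripped name, it matches the dev pathnames:
print(element_pathname("/ports", "branches/2020Q4", "security/wolfssl/Makefile"))
```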

It took me about an hour to figure this out, and then I made these code changes:

Index: xml_munge_git.pm
===================================================================
--- xml_munge_git.pm	(revision 5489)
+++ xml_munge_git.pm	(working copy)
@@ -214,8 +214,11 @@
 	$p->register(">UPDATES>UPDATE>OS:Id",                  "attr"  => \$Updates{os});
 
 	#
-	# for git, let's put branch in branch-git
-	# will will populate $Updates{branch} with the converted value. e.g. master -> head
+	# EDIT 2020-11-17 - Updates{branch_git} is the branch name supplied by git.  e.g. master, branches/20202Q4
+	# EDIT 2020-11-17 - removing all references to $Updates{branch} 
+	#
+	# for git, let's put branch in branch_git
+	# will will populate $Updates{} with the converted value. e.g. master -> head
 	# and branches/2020Q3 -> 2020Q3
 	#
 	$p->register(">UPDATES>UPDATE>OS:Branch",              "attr"  => \$Updates{branch_git});
@@ -272,7 +275,7 @@
 
 # XXX delete
 #	my $branch_name = FreshPorts::Branches::stripBranchesToGetBranchName($BranchName);
-#	$Update{branch_name} = $branch_name;
+#	$Updates{branch_name} = $branch_name;
 
 	print "OS is '$Updates{os}' : branch = '$Updates{branch_git}' for git\n";
 	
@@ -279,16 +282,42 @@
 	# When we moved from subversion to git, we needed to convert branch from
 	# master to head, because everything we need here is based on head.
 	#
-	# $Updates{branch}     : for database related actions (finding a port) e.g. head or 2020Q3
-	# $Updates{branch_git} : for repository related actions (git checkout)
+	# $Updates{branch_for_files} : for database related actions (finding a port) e.g. head or 2020Q3
+	# $Updates{branch_git}       : for repository related actions (git checkout)
+	#
+	# In the system_branch.branch_name column, we have values such as 2020Q4 and
+	# the prefix 'branches' is not included.
+	# 
+	# But for files, the prefix is included:
+	#  freshports.dev=# select * from element_pathname where pathname like '/ports/branches/2019Q3/%' limit 5;
+	#   element_id |                    pathname                     
+	#  ------------+-------------------------------------------------
+	#       960349 | /ports/branches/2019Q3/MOVED
+	#       954142 | /ports/branches/2019Q3/Mk
+	#       954143 | /ports/branches/2019Q3/Mk/Scripts
+	#       954144 | /ports/branches/2019Q3/Mk/Scripts/do-depends.sh
+	#       956066 | /ports/branches/2019Q3/Mk/Uses
+	# (5 rows)
+        # freshports.dev=# 
+        #
+        #
+        # So we have the following values:
+        #
+        # $Updates{branch_git}           - value supplied in XML
+        # $Updates{branch_database_name} - for use in system_branch.branch_name
+        # $Updates{branch_for_files}     - for use in filenames
+        #
 
-	$Updates{branch} = ConvertGitBranch($Updates{branch_git});
+        # this converts master to head, and leaves everything else unchanged
+        #
+	$Updates{branch_for_files} = ConvertGitBranchNameToFreshPortsName($Updates{branch_git});
 	
-	print "after converting '\$Updates{branch_git}' we have '$Updates{branch_git}'\n";
+	print "after converting '\$Updates{branch_git}' we have '$Updates{branch_for_files}'\n";
 	print "next we need to strip any leading 'branches/' prefix\n";
-	$Updates{branch} = FreshPorts::Branches::stripBranchesToGetBranchName($Updates{branch});
-	print "OS is '$Updates{os}' : branch = '$Updates{branch}'\n";
+	$Updates{branch_database_name} = FreshPorts::Branches::stripBranchesToGetBranchName($Updates{branch_for_files});
 	print "OS is '$Updates{os}' : branch = '$Updates{branch_git}' for git\n";
+	print "OS is '$Updates{os}' : branch = '$Updates{branch_for_files}' for git\n";
+	print "OS is '$Updates{os}' : branch = '$Updates{branch_database_name}' for database names\n";
 
 	# We know what branch this message is updating. Let's grab the IDs we will need.
 	$SystemID = SystemIDGet($Updates{os}, $self->{dbh});
@@ -297,12 +326,12 @@
 		FreshPorts::Utilities::ReportError('warning', "No SystemID found for OS = '$Updates{os}'", 1)
 	}
 
-	if ($Updates{branch} ne '') {
+	if ($Updates{branch_database_name} ne '') {
 		# we invoke GetBranchFromPathName to convert branches/2020Q3 to 2020Q3
-		$SystemBranchID = SystemBranchIDGetOrCreate($SystemID, $Updates{branch}, $self->{dbh});
+		$SystemBranchID = SystemBranchIDGetOrCreate($SystemID, $Updates{branch_database_name}, $self->{dbh});
 		if (!defined($SystemBranchID)) {
 			$! = 4;
-			FreshPorts::Utilities::ReportError('warning', "No SystemBranchID found for OS = '$Updates{branch}'", 1);
+			FreshPorts::Utilities::ReportError('warning', "No SystemBranchID found for OS = '$Updates{branch_database_name}'", 1);
 		} else {
 			$Updates{branch_id} = $SystemBranchID;
 		}
@@ -312,7 +341,7 @@
 		die   "Branch was empty.  Probably imported sources.  Ignoring message $inputfile\n";
 	}
    
-	print "OS is '$Updates{os}' ($SystemID) : branch = $Updates{branch} ($SystemBranchID)\n";
+	print "OS is '$Updates{os}' ($SystemID) : branch = $Updates{branch_database_name} ($SystemBranchID)\n";
 }
 
 
@@ -338,7 +367,7 @@
 		FreshPorts::Utilities::ReportError('Err', "No files found in commit '$Updates{commit_hash}'.  Has someone done a cvs import instead of addport?", 0)
 	}
 
-	%CommitLogPorts = FreshPorts::VerifyPort::SaveChangesToPortsTree($Updates{branch}, commit_log_id(), \@Files, $self->{dbh});
+	%CommitLogPorts = FreshPorts::VerifyPort::SaveChangesToPortsTree($Updates{branch_database_name}, commit_log_id(), \@Files, $self->{dbh});
 
 	#
 	# commit what we have now, and that starts a new transaction.
@@ -429,7 +458,9 @@
 
 	# we don't clear these values until the end of the update
 	undef $Updates{os};
-	undef $Updates{branch};
+	undef $Updates{branch_git};
+	undef $Updates{branch_database_name};
+	undef $Updates{branch_for_files};
 	undef $Updates{committerAll};
 	undef $Updates{dateyear};
 	undef $Updates{datemonth};
@@ -472,7 +503,7 @@
 	return $ValidFileActions{$FileAction};
 }
 
-sub ConvertGitBranch($) {
+sub ConvertGitBranchNameToFreshPortsName($) {
 	my $GitBranch = shift;
 	
 	#
@@ -610,7 +641,7 @@
 	my $element;
 	my $element_id;
 	# This is where we add in the repo name to the path
-	my $filename     = $DB_Root_Prefix . '/' . $Updates{branch} . '/' . $FilePath;
+	my $filename     = $DB_Root_Prefix . '/' . $Updates{branch_for_files} . '/' . $FilePath;
 	my $revisionname = $FileRevision;
 	my $commit_log_element;
 	
@@ -808,11 +839,12 @@
 	# The criteria for that is the subject must start with
 	# "cvs commit: ports/".
 
-	print "OS             = [$Updates{os}]\n";
-	print "Branch git     = [$Updates{branch_git}]\n";
-	print "Branch         = [$Updates{branch}]\n";
-	print "Committer      = [$Updates{committerAll}]\n";
-	print "Date           = [" . sprintf "%04u/%02u/%02u %02u:%02u:%02u %s", $Updates{dateyear}, $Updates{datemonth}, $Updates{dateday}, $Updates{timehour}, $Updates{timeminute}, $Updates{timesecond}, $Updates{timezone} . "]\n";
+	print "OS                   = [$Updates{os}]\n";
+	print "Branch git           = [$Updates{branch_git}]\n";
+	print "branch_database_name = [$Updates{branch_database_name}]\n";
+	print "branch_for_files     = [$Updates{branch_for_files}]\n";
+	print "Committer            = [$Updates{committerAll}]\n";
+	print "Date                 = [" . sprintf "%04u/%02u/%02u %02u:%02u:%02u %s", $Updates{dateyear}, $Updates{datemonth}, $Updates{dateday}, $Updates{timehour}, $Updates{timeminute}, $Updates{timesecond}, $Updates{timezone} . "]\n";
 	if (defined($Updates{repository})) {
 		print "Repository     = [$Updates{repository}]\n";
 	} else {
@@ -835,7 +867,7 @@
 
 	# First thing we must do, is tell the database what Branch to use...
 	# XXX why does this not use branches::SetBranchInDB() ?
-	my $sql = 'select freshports_branch_set(' . $self->{dbh}->quote($Updates{branch}) . ')';
+	my $sql = 'select freshports_branch_set(' . $self->{dbh}->quote($Updates{branch_database_name}) . ')';
 	my $sth = $self->{dbh}->prepare($sql);
 	if (!$sth->execute())  {
 		FreshPorts::Utilities::ReportError('warning', "Could not set branch", 1);
[dan@devgit-ingress01:~/modules] $ 
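The three branch values described in the new comments can be summarized with a short Python sketch. The Python function names are hypothetical stand-ins for ConvertGitBranchNameToFreshPortsName and FreshPorts::Branches::stripBranchesToGetBranchName, whose Perl bodies are not shown; the behaviour is taken from the comments in the diff: master becomes head, everything else is unchanged, then any leading branches/ is stripped for the database name.

```python
from typing import NamedTuple


class BranchNames(NamedTuple):
    git: str            # value supplied in the XML, e.g. master or branches/2020Q4
    for_files: str      # used to build pathnames, e.g. head or branches/2020Q4
    database_name: str  # used for system_branch.branch_name, e.g. head or 2020Q4


def convert_git_branch_name(branch_git: str) -> str:
    """master becomes head; everything else is left unchanged."""
    return "head" if branch_git == "master" else branch_git


def strip_branches_prefix(branch: str) -> str:
    """Drop a leading 'branches/' prefix, if present."""
    prefix = "branches/"
    return branch[len(prefix):] if branch.startswith(prefix) else branch


def branch_names(branch_git: str) -> BranchNames:
    for_files = convert_git_branch_name(branch_git)
    return BranchNames(branch_git, for_files, strip_branches_prefix(for_files))
```

Keeping all three values side by side, rather than overwriting one variable, is what the diff is doing in spirit: each consumer picks the form it needs.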

But wait, there’s more!

There were more things wrong too. I was doing a git checkout master even when processing commits on branches, which meant all the branch commits were being processed as if they were on master. Bad. :/

Some of that came out in “git commit processing – how is it done?”, which was published after I started this blog post, but before I finished it.

Nov 22 2020

I need to document this so I can refer to it while debugging.

This follows the chain of scripts which processes a commit.

Periodic

FreshPorts, at present, checks for new commits every three minutes, via this entry in /etc/crontab:

*/3	*	*	*	*	root	periodic everythreeminutes

That will invoke this script:

$ cat /usr/local/etc/periodic/everythreeminutes/215.fp_check_git_for_commits 
#!/bin/sh -
#
# FreshPorts periodic script
#
# Checks to see if there are any new commits waiting
#

# If there is a global system configuration file, suck it in.
#
if [ -r /etc/defaults/periodic.conf ]
then
    . /etc/defaults/periodic.conf
    source_periodic_confs
fi

# assign default values
fp_scripts_dir=${fp_scripts_dir:-/usr/local/libexec/freshports}

case "$fp_check_for_git_commits_enable" in
	[Yy][Ee][Ss])
	logger -p local3.notice -t FreshPorts "into $0"
	echo ""
	cd $fp_scripts_dir && ./helper_scripts/check_for_git_commits.sh || rc=3
	;;
        
    *)  rc=0;;
esac

exit $rc

check_for_git_commits.sh

This is a cheat.

$ cat /usr/local/libexec/freshports/helper_scripts/check_for_git_commits.sh
#!/bin/sh

logger -t check_for_git_commits.sh -p local4.notice "touching ~ingress/signals/check_git ~ingress/signals/job_waiting"
echo touch ~ingress/signals/check_git ~ingress/signals/job_waiting | sudo su -fm ingress
logger -t check_for_git_commits.sh -p local4.notice "done touching, going away now"

It touches two files, check_git and job_waiting, which together signal the ingress daemon that there is work to do.

ingress.sh

$ cat /usr/local/libexec/freshports-service/ingress.sh
#!/bin/sh
#
# $Id: fp-daemon.sh,v 1.17 2006-11-10 14:08:26 dan Exp $
#
# Copyright (c) 2001-2003 DVL Software
#
#
# include our local parameters

. /usr/local/etc/freshports/ingress.sh

CP='/bin/cp'

# we do not use -i because that would fail when we re-run a commit
MV='/bin/mv'

RM='/bin/rm'

PERL='/usr/local/bin/perl'

#
# sanity checking upon startup
#

check_for_jobs() {
	#
	# This flag file is only set by a job run by this script.
	# A race condition should never arise.
	#
	FLAG="${INGRESS_FLAGDIR}/job_waiting"
	if [ -f ${FLAG} ]
	then
		cd ${SCRIPTDIR}
		echo "yes, there is a job waiting"
		echo "running ${PERL} ./job-waiting.pl"
		echo "from directory  ${SCRIPTDIR}"
		ls -l ./job-waiting.pl
		${PERL} ./job-waiting.pl
		RC=$?
		if [ ${RC} -eq 0 ]
		then
			echo "job-waiting.pl finishes normally"
		else
			echo "FATAL job-waiting.pl finished with an error: ${RC}"
		fi
		rm ${FLAG}
	fi
}

echo "starting up!"

if [ ! -d ${SCRIPTDIR} ]
then
	echo "Required directory does not exist: ${SCRIPTDIR}"
	exit
fi

if [ ! -d ${INGRESS_MSGDIR}/incoming ]
then
	echo "Required directory does not exist: ${INGRESS_MSGDIR}/incoming"
	exit
fi

echo incoming: ${INGRESS_MSGDIR}/incoming
echo ready

while :
	do
	cd ${SCRIPTDIR}

	INCOMING=${INGRESS_MSGDIR}/incoming

	if [ -e 'OFFLINE' ]
	then
		echo "system is OFFLINE: ${SCRIPTDIR}/OFFLINE exists"
		break
	else
		check_for_jobs
	fi
	sleep 3
done

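At startup, that script checks that the incoming queue directory exists. More on the incoming queue, perhaps later.

Then, every three seconds, it checks for waiting jobs, unless ${SCRIPTDIR}/OFFLINE exists, in which case it stops.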
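For comparison, here is a rough Python equivalent of the ingress.sh main loop. It is a sketch, not a replacement: run_jobs stands in for running job-waiting.pl, and max_iterations exists only so the loop can be tested. Like the shell version, it removes job_waiting after the jobs finish, so a job_waiting touch that arrives while the jobs are running is cleared along with it (the periodic script touches it again three minutes later).

```python
import time
from pathlib import Path
from typing import Callable, Optional


def poll_for_jobs(flag_dir: Path, run_jobs: Callable[[], int], offline_file: Path,
                  interval: float = 3.0, max_iterations: Optional[int] = None) -> int:
    """Sketch of the ingress.sh loop; returns how many times the jobs ran."""
    flag = flag_dir / "job_waiting"
    runs = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        if offline_file.exists():
            break  # same as the OFFLINE check in the shell loop
        if flag.is_file():
            status = run_jobs()  # stands in for: perl ./job-waiting.pl
            if status != 0:
                print(f"FATAL job runner finished with an error: {status}")
            flag.unlink()  # mirrors: rm ${FLAG}
            runs += 1
        time.sleep(interval)
    return runs
```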

job-waiting.pl

$ cat /usr/local/libexec/freshports/job-waiting.pl
#!/usr/local/bin/perl -w
#
# $Id: job-waiting.pl,v 1.3 2007-01-29 00:17:35 dan Exp $
#
# Copyright (c) 1999-2007 DVL Software
#
# This script is invoked by the fp-freshports.sh script
# usually located in /var/services/freshports
#

use strict;

use DBI;
use FreshPorts::database;
use FreshPorts::cache;
use FreshPorts::commit_log_ports_ignore;
use FreshPorts::system_status;
use FreshPorts::utilities;

# added in for testing
require Sys::Syslog;

FreshPorts::Utilities::InitSyslog();

#die('we are done here - stopped');

Sys::Syslog::syslog('warning', "running job-waiting.pl");


my %Jobs_ingress = (
	$FreshPorts::Config::CheckGit                 => 'check_git.sh',
	);

my %Jobs_freshports = (
	$FreshPorts::Config::MovedFileFlag            => 'process_moved.sh',
	$FreshPorts::Config::NewReposReadyForImport   => 'import_packagesite.py',
	$FreshPorts::Config::NewRepoImported          => 'UpdatePackagesFromRawPackages.py',
	$FreshPorts::Config::UpdatingFileFlag         => 'process_updating.sh',
	$FreshPorts::Config::VuXMLFileFlag            => 'process_vuxml.sh',
	$FreshPorts::Config::WWWENPortsCategoriesFlag => 'process_www_en_ports_categories.sh',
	);

FreshPorts::Utilities::Report('notice', "starting $0");

#	
# This script is invoked by either the freshports or the ingress user
# they have separate lists of jobs to look for. Rather than maintain two
# scripts, there is one.
#
my $username = getpwuid($<);
my %Jobs;

FreshPorts::Utilities::Report('notice', "running $0 as user = '$username'");

if ($username eq 'freshports') {
   %Jobs = %Jobs_freshports;
} elsif ($username eq 'ingress') {
   %Jobs = %Jobs_ingress;
} else {
  FreshPorts::Utilities::Report('notice', "WHO IS THAT USER? I don't know them. Stopping.");
  die($0 . ' must be run only as the ingress or freshports users');
  exit;
}

	
FreshPorts::Utilities::Report('notice', "checking jobs for $username");

my $JobFound;
do {
	$JobFound = 0;
	# one job might create another, so we keep looping until they are all cleared.
	while (my ($flag, $script) = each %Jobs) {
		if (-f $flag) {
			$JobFound =1;
			FreshPorts::Utilities::Report('notice', "$flag exists.  About to run $script");
			`$FreshPorts::Config::scriptpath/$script`;
			FreshPorts::Utilities::Report('notice', "Finished running $script");
		} else {
			FreshPorts::Utilities::Report('notice', "flag '$flag' not set.  no work for $script");
		}
	}
} until (!$JobFound);

In there, we find that for the ingress user the only job is check_git.sh, which runs when the $FreshPorts::Config::CheckGit flag file exists.
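The do/until loop in job-waiting.pl can be sketched in Python like this. Each job is expected to delete its own flag, as check_git.sh does with /bin/rm ${CHECKGITFILE}. The max_passes guard is an addition of this sketch, not something in the Perl; it protects against a job that never removes its flag, which would otherwise loop forever.

```python
from pathlib import Path
from typing import Callable, Dict


def run_waiting_jobs(jobs: Dict[Path, Callable[[], None]], max_passes: int = 100) -> int:
    """Run every job whose flag file exists; repeat until a pass finds nothing.

    Returns the number of passes made, including the final empty one.
    """
    passes = 0
    while True:
        if passes >= max_passes:
            raise RuntimeError("jobs still pending after max_passes; is a flag never removed?")
        passes += 1
        job_found = False
        for flag, job in jobs.items():
            if flag.is_file():
                job_found = True
                job()  # the job itself is responsible for deleting its flag
        if not job_found:
            return passes
```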

check_git.sh

$ cat /usr/local/libexec/freshports/check_git.sh
#!/bin/sh

# This script exists mainly to redirect the output of git-delta.sh to a logfile.
#

if [ ! -f /usr/local/etc/freshports/config.sh ]
then
	echo "/usr/local/etc/freshports/config.sh not found by $0"
	exit 1
fi

. /usr/local/etc/freshports/config.sh

LOGGERTAG=check_git.sh

${LOGGER} -t ${LOGGERTAG} $0 has started

# redirect everything into the file
${SCRIPTDIR}/git-delta.sh "doc ports ports-quarterly src" >> ${GITLOG} 2>&1

/bin/rm ${CHECKGITFILE}

${LOGGER} -t ${LOGGERTAG} $0 has finished

git-delta.sh

$ cat /usr/local/libexec/freshports/git-delta.sh
#!/bin/sh

# process the new commits
# based upon https://github.com/FreshPorts/git_proc_commit/issues/3
# An idea from https://github.com/sarcasticadmin

if [ ! -f /usr/local/etc/freshports/config.sh ]
then
	echo "/usr/local/etc/freshports/config.sh not found by $0"
	exit 1
fi

# this can be a space separated list of repositories to check
# e.g. "doc ports src"
repos=$1

. /usr/local/etc/freshports/config.sh

LOGGERTAG='git-delta.sh'

logfile "has started. Will check these repos: '${repos}'"

# what remote are we using on this repo?
REMOTE='origin'

# where we do dump the XML files which we create?
XML="${INGRESS_MSGDIR}/incoming"

logfile "XML dir is $XML"

for repo in ${repos}
do
   logfile "Now processing repo: ${repo}"

   # convert the repo label to a physical directory on disk
   dir=`convert_repo_label_to_directory ${repo}`

   # empty means error
   if [ -z "${dir}" ]; then
      logfile "FATAL error, repo='${repo}' is unknown: cannot translate it to a directory name"
      continue
   fi

   # where is the repo directory?
   # This is the directory which contains the repos.
   REPODIR="${INGRESS_PORTS_DIR_BASE}/${dir}"
   LATEST_FILE="${INGRESS_PORTS_DIR_BASE}/latest.${dir}"

   if [ -d ${REPODIR} ]; then
      logfile "REPODIR='${REPODIR}' exists"
   else
      logfile "FATAL error, REPODIR='${REPODIR}' is not a directory"
      continue
   fi

   if [ -f ${LATEST_FILE} ]; then
      logfile "LATEST_FILE='${LATEST_FILE}' exists"
   else
      logfile "FATAL error, LATEST_FILE='${LATEST_FILE}' does not exist. We need a starting point."
      continue
   fi

   logfile "Repodir is $REPODIR"
   # on with the work

   cd ${REPODIR}

   # Update local copies of remote branches
#   logfile "Running: ${GIT} fetch $REMOTE:"
#   ${GIT} fetch $REMOTE
#   logfile "Done."

#   logfile "Running: ${GIT} checkout master:"
#   ${GIT} checkout master
#   logfile "Done."

   logfile "Running: ${GIT} pull:"
   ${GIT} pull
   logfile "Done."

   # let's try keeping the latest commit in this file.
   STARTPOINT=`cat ${LATEST_FILE}`

   if [ "${STARTPOINT}x" = 'x' ]
   then
      logfile "STARTPOINT is empty; there must not be any new commits to process"
      logfile "Not proceeding with this repo: '${repo}'"
      continue
   else
      logfile "STARTPOINT = ${STARTPOINT}"
   fi

   # Bring local branch up-to-date with the local remote
#   logfile "Running; ${GIT} rebase $REMOTE/master:"
#   ${GIT} rebase $REMOTE/master
#   logfile "Running; ${GIT} fetch:"
#   ${GIT} fetch
#   logfile "Done."


   # get list of commits, if only to document them here
   logfile "Running: ${GIT} rev-list ${STARTPOINT}..HEAD"
   commits=`${GIT} rev-list ${STARTPOINT}..HEAD`
   logfile "Done."

   if [ -z "${commits}" ]
   then
     logfile "No commits were found"
   else
     logfile "The commits found are:"
     for commit in $commits
     do
        logfile "$commit"
     done
   fi

   logfile "${SCRIPTDIR}/git-to-freshports-xml.py --repo ${repo} --path ${REPODIR} --commit ${STARTPOINT} --spooling ${INGRESS_SPOOLINGDIR} --output ${XML}"
            ${SCRIPTDIR}/git-to-freshports-xml.py --repo ${repo} --path ${REPODIR} --commit ${STARTPOINT} --spooling ${INGRESS_SPOOLINGDIR} --output ${XML}
         
   new_latest=`${GIT}  rev-parse HEAD`
   echo $new_latest > ${LATEST_FILE}

done

logfile "Ending"

git-to-freshports-xml.py creates the XML files which are placed into the incoming queue (at ~ingress/message-queues/incoming).

The files are noticed by the freshports daemon (running as /usr/local/libexec/freshports-service/freshports.sh).
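The heart of git-delta.sh is: read the last processed commit from latest.<dir>, ask git what came after it, then record the new HEAD. A Python sketch of those steps follows, with git injected as a function so the logic can be tested without a repository. Two choices are the sketch's own, not the script's: it asks rev-list for --reverse (oldest first, since rev-list normally lists newest first), and it leaves the caller to run record_head only after the XML step succeeds, whereas git-delta.sh writes the new HEAD unconditionally.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

GitRunner = Callable[[List[str]], str]


def run_git(repo_dir: Path) -> GitRunner:
    """Return a helper that runs git in repo_dir and returns its stdout."""
    def runner(args: List[str]) -> str:
        return subprocess.run(["git", "-C", str(repo_dir), *args],
                              check=True, capture_output=True, text=True).stdout
    return runner


def new_commits(latest_file: Path, git: GitRunner) -> Optional[List[str]]:
    """Commits after the one recorded in latest_file, oldest first.

    Returns None when there is no starting point, like the STARTPOINT check.
    """
    startpoint = latest_file.read_text().strip() if latest_file.is_file() else ""
    if not startpoint:
        return None
    out = git(["rev-list", "--reverse", f"{startpoint}..HEAD"])
    return out.split()


def record_head(latest_file: Path, git: GitRunner) -> str:
    """Save the current HEAD as the next starting point."""
    head = git(["rev-parse", "HEAD"]).strip()
    latest_file.write_text(head + "\n")
    return head
```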

Nov 18 2020

A recent post on the FreeBSD Ports mailing list asked:

Hi,

I noticed a big difference between the number of ports on
freebsd.org/ports/ and on freshports.org. Currently, it’s 33348 vs.
41346.

The freebsd.org’s number equals roughly the number of lines of a current
INDEX, but how does FreshPorts count?

Best,
Moritz

In short, they are both wrong.

The FreeBSD value is based on INDEX, which includes flavors. The counts on the webpages under https://www.freebsd.org/ports/ will list some ports multiple times. See below for examples.

The FreshPorts total is wrong because it includes ports on branches.

The real number of ports is in the 28,800 range.

It is debatable whether py27-atspi and py37-atspi should be listed as separate ports. There are separate packages, yes, but they are both generated from one port: accessibility/py-atspi.
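To see how flavors inflate an INDEX-based count, compare the number of INDEX lines with the number of distinct port directories. This Python sketch assumes the usual INDEX layout (pipe-separated fields, package name first, port directory second); the sample lines in the test are made up, with placeholder versions, to mirror the py-atspi example.

```python
from typing import Iterable, Tuple


def count_index(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of packages, number of distinct port directories)."""
    packages = 0
    origins = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        packages += 1
        origins.add(fields[1])  # assumed: second field is the port's directory
    return packages, len(origins)
```

Two flavored packages built from one port count as two INDEX lines but one port directory.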

The rest of this post has background on how I reached these values.

Where is this FreshPorts count?

In the Statistics box on the right hand side of FreshPorts, you will see:

Statistics box saying Port Count 41418

Let’s see where this value comes from.

FreshPorts count

Everything in the Statistics box is generated by the backend via a periodic job. Let’s grep the code and find out where:

[dan@dev-ingress01:~/scripts] $ grep -r 'Calculated hourly' *
hourly_stats.pl:		print FILE '<BR>Calculated hourly:<BR>';

If I look in there, I find: select Stats_PortCount()

Going to the sp.txt file, I find this stored procedure:

CREATE OR REPLACE FUNCTION Stats_PortCount() returns int8 AS $$
        DECLARE
                PortCount       int8;

        BEGIN
                SELECT count(*)
                  INTO PortCount
                  FROM ports, element
                 WHERE element.status = 'A'
                   AND ports.element_id = element.id;

                return PortCount;
        END
$$ LANGUAGE 'plpgsql';

Let’s run that query:

freshports.org=# select * from Stats_PortCount();
 stats_portcount 
-----------------
           41418
(1 row)

freshports.org=# 

FreshPorts count with branches

I know why this value is so far from the FreeBSD count: branches. Let’s look at this output, where I start pulling back the port names:

freshports.org=# SELECT EP.pathname
freshports.org-#                   FROM ports P , element E, element_pathname EP
freshports.org-#                  WHERE E.status = 'A'
freshports.org-#                    AND P.element_id = E.id
freshports.org-#                    AND E.id = EP.element_id
freshports.org-#                  ORDER BY EP.pathname LIMIT 10;
                     pathname                     
--------------------------------------------------
 /ports/branches/2016Q2/math/blitz++
 /ports/branches/2016Q4/archivers/file-roller
 /ports/branches/2016Q4/archivers/p7zip
 /ports/branches/2016Q4/archivers/p7zip-codec-rar
 /ports/branches/2016Q4/archivers/php56-bz2
 /ports/branches/2016Q4/archivers/php56-phar
 /ports/branches/2016Q4/archivers/php56-zip
 /ports/branches/2016Q4/archivers/php56-zlib
 /ports/branches/2016Q4/archivers/php70-bz2
 /ports/branches/2016Q4/archivers/php70-phar
(10 rows)

freshports.org=# 

FreshPorts count without branches

Let’s try the query and ignore branches.

freshports.org=# SELECT EP.pathname
freshports.org-#   FROM ports P , element E, element_pathname EP
freshports.org-#  WHERE E.status = 'A'
freshports.org-#    AND P.element_id = E.id
freshports.org-#    AND E.id = EP.element_id
freshports.org-#    AND EP.pathname NOT LIKE '/ports/branches/%'
freshports.org-#  ORDER BY EP.pathname desc
freshports.org-#  LIMIT 10;
          pathname           
-----------------------------
 /ports/head/x11/zenity
 /ports/head/x11/yelp
 /ports/head/x11/yeahconsole
 /ports/head/x11/yalias
 /ports/head/x11/yakuake
 /ports/head/x11/yad
 /ports/head/x11/xzoom
 /ports/head/x11/xxkb
 /ports/head/x11/xwud
 /ports/head/x11/xwit
(10 rows)

freshports.org=# 

That looks better.

Let’s get a count now.

freshports.org=# SELECT count(*)
freshports.org-#   FROM ports P , element E, element_pathname EP
freshports.org-#  WHERE E.status = 'A'
freshports.org-#    AND P.element_id = E.id
freshports.org-#    AND E.id = EP.element_id
freshports.org-#    AND EP.pathname NOT LIKE '/ports/branches/%';
 count 
-------
 28759
(1 row)

freshports.org=# 
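The arithmetic: 41,418 - 28,759 = 12,659 active ports sit under /ports/branches/, assuming each active port has exactly one pathname row. A Python sketch of the same filter, plus a per-tree breakdown that would show which branches contribute; the sample paths in the test come from the listings above.

```python
from collections import Counter
from typing import Iterable

BRANCH_PREFIX = "/ports/branches/"


def count_head_ports(pathnames: Iterable[str]) -> int:
    """Same idea as: pathname NOT LIKE '/ports/branches/%'."""
    return sum(1 for p in pathnames if not p.startswith(BRANCH_PREFIX))


def ports_by_tree(pathnames: Iterable[str]) -> Counter:
    """Group by tree: 'head' or a branch name such as '2016Q4'."""
    counts = Counter()
    for p in pathnames:
        parts = p.split("/")
        # /ports/head/x11/zenity                 -> ['', 'ports', 'head', 'x11', 'zenity']
        # /ports/branches/2016Q4/archivers/p7zip -> ['', 'ports', 'branches', '2016Q4', ...]
        tree = parts[3] if parts[2] == "branches" else parts[2]
        counts[tree] += 1
    return counts
```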

Well, that’s not great either.

That can’t be right

Let’s suspect the element_pathname table and remove it from the query. Instead, I will create the pathname based on a function:

freshports.org=# 
freshports.org=# SELECT count(*) FROM (
freshports.org(# SELECT element_pathname(E.id) as pathname
freshports.org(#   FROM ports P , element E
freshports.org(#  WHERE E.status = 'A'
freshports.org(#    AND P.element_id = E.id) AS tmp
freshports.org-# WHERE pathname NOT LIKE '/ports/branches/%';
 count 
-------
 28759
(1 row)

freshports.org=# 

That matches the count via the element_pathname table.

So it’s not that table skewing the results. What is it then?

Looking at category counts

Let’s compare https://www.freebsd.org/ports/categories-alpha.html with FreshPorts.

Let’s start with this query on the ports_active table, which is actually a view of non-deleted ports.

SELECT PA.category, PA.name, EP.pathname
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%' limit 10;
  category  |            name            |                  pathname                  
------------+----------------------------+--------------------------------------------
 textproc   | rubygem-raabro             | /ports/head/textproc/rubygem-raabro
 biology    | pyfasta                    | /ports/head/biology/pyfasta
 math       | symmetrica                 | /ports/head/math/symmetrica
 java       | sigar                      | /ports/head/java/sigar
 databases  | phpmyadmin5                | /ports/head/databases/phpmyadmin5
 devel      | rubygem-rbtrace            | /ports/head/devel/rubygem-rbtrace
 x11        | xfce4-screenshooter-plugin | /ports/head/x11/xfce4-screenshooter-plugin
 science    | hdf5-18                    | /ports/head/science/hdf5-18
 lang       | nhc98                      | /ports/head/lang/nhc98
 multimedia | xanim                      | /ports/head/multimedia/xanim
(10 rows)

Now, let’s get a count by category.

freshports.dev=# SELECT PA.category, count(PA.name)
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
 GROUP BY PA.category
 ORDER BY PA.category;
   category    | count 
---------------+-------
 accessibility |    26
 arabic        |     8
 archivers     |   258
 astro         |   124
 audio         |   877
 base          |     1
 benchmarks    |   100
 biology       |   176
 cad           |   126
 chinese       |   106
 comms         |   213
 converters    |   178
 databases     |  1033
 deskutils     |   261
 devel         |  6875
 dns           |   238
 editors       |   263
 emulators     |   177
 finance       |   113
 french        |    14
 ftp           |    96
 games         |  1133
 german        |    21
 graphics      |  1128
 hebrew        |     7
 hungarian     |     7
 irc           |   114
 japanese      |   280
 java          |   122
 korean        |    39
 lang          |   364
 mail          |   709
 math          |   970
 misc          |   533
 multimedia    |   457
 net           |  1563
 net-im        |   176
 net-mgmt      |   404
 net-p2p       |    94
 news          |    67
 polish        |    14
 ports-mgmt    |    67
 portuguese    |     9
 print         |   256
 russian       |    32
 science       |   340
 security      |  1313
 shells        |    56
 sysutils      |  1538
 textproc      |  1896
 ukrainian     |     9
 vietnamese    |    16
 www           |  2358
 x11           |   534
 x11-clocks    |    42
 x11-drivers   |    44
 x11-fm        |    30
 x11-fonts     |   250
 x11-servers   |    10
 x11-themes    |   145
 x11-toolkits  |   240
 x11-wm        |   119
(62 rows)

freshports.dev=# 

Primary categories vs secondary categories

Remember that some categories are virtual and do not appear on disk. The above counts are only for primary categories, those which do appear on disk. For example, afterstep is not listed above, but you’ll find it in the FreeBSD list. Virtual categories are covered in FreshPorts, but they are not relevant to our search.

Also, a port exists on disk only within its primary category. There may be secondary categories, but the port should not be counted there as well. A port should only be counted once.
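To count each port once, under the category it lives in on disk, one option is to work from the port path in INDEX rather than from category lists. A minimal sketch, assuming an INDEX-style file whose second |-separated field is the port path (as in the cut output later in this post); count_by_primary is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: count each port once, under its on-disk (primary) category.
# Assumes field 2 of each '|'-separated line is a path such as
# /usr/ports/hungarian/aspell.
count_by_primary() {
    index_file="$1"
    # sort -u collapses flavored duplicates so each port directory
    # is counted only once.
    cut -d '|' -f 2 "$index_file" | sort -u |
        # The category is the second-to-last path component.
        awk -F/ '{ print $(NF-1) }' |
        sort | uniq -c
}
```

Running count_by_primary INDEX-12 gives per-category counts that can be set beside the FreshPorts query results above.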

Picking on Hungarian

Let’s pick Hungarian, which has a small number of ports.

freshports.dev=# SELECT PA.category, PA.name, EP.pathname
freshports.dev-#   FROM ports_active PA, element_pathname EP
freshports.dev-#  WHERE PA.element_id = EP.element_id
freshports.dev-#    AND EP.pathname NOT LIKE '/ports/branches/%'
freshports.dev-#    AND PA.category = 'hungarian'
freshports.dev-#  ORDER BY EP.pathname
freshports.dev-# LIMIT 10;
 category  |           name           |                    pathname                    
-----------+--------------------------+------------------------------------------------
 hungarian | aspell                   | /ports/head/hungarian/aspell
 hungarian | hunspell                 | /ports/head/hungarian/hunspell
 hungarian | hyphen                   | /ports/head/hungarian/hyphen
 hungarian | jdictionary-eng-hun      | /ports/head/hungarian/jdictionary-eng-hun
 hungarian | jdictionary-eng-hun-expr | /ports/head/hungarian/jdictionary-eng-hun-expr
 hungarian | libreoffice              | /ports/head/hungarian/libreoffice
 hungarian | mythes                   | /ports/head/hungarian/mythes
(7 rows)

freshports.dev=# 

Let’s compare that with what is on disk:

[dan@pkg01:~/ports/head/hungarian] $ ls -l
total 13
-rw-r--r--  1 dan  dan  332 May  5  2020 Makefile
-rw-r--r--  1 dan  dan   97 Oct 27  2019 Makefile.inc
drwxr-xr-x  2 dan  dan    6 Oct 27  2019 aspell
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 hunspell
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 hyphen
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 jdictionary-eng-hun
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 jdictionary-eng-hun-expr
drwxr-xr-x  2 dan  dan    4 Nov 12 15:09 libreoffice
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 mythes
[dan@pkg01:~/ports/head/hungarian] $ svn info

Don’t trust me. Look at subversion for ports/head/hungarian/

The FreshPorts count is correct. What is FreeBSD talking about then?

Comparing with https://www.freebsd.org/ports/hungarian.html, I see that FreeBSD is including:

Will this account for the differences? I don’t know.

The 33399 count listed at https://www.freebsd.org/ports/ (on 2020-11-18) seems close to the number of entries in INDEX-12 (33406).

The category totals at https://www.freebsd.org/ports/categories-grouped.html include ports listed in their secondary categories. This counts some ports more than once.
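To see how listing ports under secondary categories inflates per-category totals, one can count category listings instead of ports. This sketch assumes that field 7 of each INDEX line is the port’s space-separated CATEGORIES list; that is my understanding of the INDEX layout, not something shown in this post. count_listings and listing_total are hypothetical helper names:

```sh
#!/bin/sh
# Sketch: count how often each category is *listed*, including secondary
# categories. Assumption: field 7 holds CATEGORIES, e.g. "net x11-servers".
count_listings() {
    cut -d '|' -f 2,7 "$1" |
        # Keep one line per port path, dropping flavored duplicates.
        sort -u -t '|' -k 1,1 |
        cut -d '|' -f 2 |
        # One category per line, so a port is counted once per category.
        tr ' ' '\n' | grep -v '^$' |
        sort | uniq -c
}

# Sum of all listings; exceeds the number of distinct ports whenever
# ports carry secondary categories.
listing_total() {
    count_listings "$1" | awk '{ sum += $1 } END { print sum + 0 }'
}
```

Comparing listing_total INDEX-12 with the number of distinct port paths gives a rough measure of the double counting a per-category page would show.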

Looking at INDEX

Let’s look at INDEX-12:

[dan@pkg01:~/ports/head] $ make fetchindex
/usr/bin/env  fetch -am -o /usr/home/dan/ports/head/INDEX-12.bz2 https://www.FreeBSD.org/ports/INDEX-12.bz2
/usr/home/dan/ports/head/INDEX-12.bz2                 2315 kB 1436 kBps    02s
[dan@pkg01:~/ports/head] $ 


[dan@pkg01:~/ports/head] $ wc -l INDEX-12 
   33406 INDEX-12

[dan@pkg01:~/ports/head] $ grep -c jdictionary-ger-hun INDEX-12 
1

OK, it’s only counted once within INDEX.
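grep -c counts every line containing the string anywhere, so a name that also shows up in another line’s fields (a dependency list, for example) would be counted again. Matching only the port path field is stricter. A minimal sketch; count_origin is a hypothetical helper, and the /usr/ports/ prefix matches the INDEX paths shown below:

```sh
#!/bin/sh
# Sketch: count INDEX lines whose port path (field 2) is exactly the
# given origin, ignoring matches in other fields.
count_origin() {
    index_file="$1"
    origin="$2"    # CATEGORY/PORT, e.g. hungarian/aspell
    # -x: whole-line match, -F: fixed string (no regex surprises)
    cut -d '|' -f 2 "$index_file" |
        grep -c -x -F "/usr/ports/$origin"
}
```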

So far, we know why the port counts on the web pages differ.

Let’s pick a category which is not language-related: x11-servers

This is what FreshPorts has:

freshports.org=# SELECT PA.category, PA.name, EP.pathname
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
   AND PA.category = 'x11-servers'
 ORDER BY EP.pathname;
  category   |      name       |                pathname                 
-------------+-----------------+-----------------------------------------
 x11-servers | Xfstt           | /ports/head/x11-servers/Xfstt
 x11-servers | x2vnc           | /ports/head/x11-servers/x2vnc
 x11-servers | x2x             | /ports/head/x11-servers/x2x
 x11-servers | xephyr          | /ports/head/x11-servers/xephyr
 x11-servers | xorg-dmx        | /ports/head/x11-servers/xorg-dmx
 x11-servers | xorg-nestserver | /ports/head/x11-servers/xorg-nestserver
 x11-servers | xorg-server     | /ports/head/x11-servers/xorg-server
 x11-servers | xorg-vfbserver  | /ports/head/x11-servers/xorg-vfbserver
 x11-servers | xwayland        | /ports/head/x11-servers/xwayland
 x11-servers | xwayland-devel  | /ports/head/x11-servers/xwayland-devel
(10 rows)

freshports.org=# 

From disk:

[dan@pkg01:~/ports/head/x11-servers] $ ls -l
total 34
-rw-r--r--  1 dan  dan  375 Feb 14  2020 Makefile
drwxr-xr-x  3 dan  dan    7 Sep 19 01:19 Xfstt
drwxr-xr-x  2 dan  dan    5 Nov  9  2019 x2vnc
drwxr-xr-x  3 dan  dan    6 Nov  9  2019 x2x
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xephyr
drwxr-xr-x  2 dan  dan    5 Feb 25  2020 xorg-dmx
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xorg-nestserver
drwxr-xr-x  3 dan  dan    8 Sep 19 01:19 xorg-server
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xorg-vfbserver
drwxr-xr-x  2 dan  dan    4 Oct 11 13:22 xwayland
drwxr-xr-x  2 dan  dan    5 Nov 16 15:28 xwayland-devel
[dan@pkg01:~/ports/head/x11-servers] $ 

That matches.
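The disk side of this comparison can also be scripted: in a category directory, the ports are the subdirectories that contain a Makefile, while the category’s own Makefile and Makefile.inc are plain files. A minimal sketch; ports_on_disk is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: list port directories in one category directory.
ports_on_disk() {
    category_dir="$1"    # e.g. ~/ports/head/x11-servers
    # The trailing / makes the glob match directories only, so the
    # category-level Makefile and Makefile.inc are skipped.
    for d in "$category_dir"/*/; do
        [ -f "${d}Makefile" ] && basename "$d"
    done | LC_ALL=C sort
}
```

Piping its output through wc -l gives a count to set beside the (10 rows) reported by psql.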

Looking at https://www.freebsd.org/ports/x11-servers.html, I find listings not found above:

  1. tigervnc-server net/tigervnc-server
  2. tigervnc-viewer net/tigervnc-viewer
  3. xorg-minimal x11/xorg-minimal

Again, these are ports listed on this page which are not actually in this category. Ports are being counted more than once, at least on the web pages.
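Ports that appear under a category only through a secondary category can be picked out of INDEX: they mention the category in their CATEGORIES list, but their path is in another directory. This sketch again assumes field 7 is CATEGORIES and that paths look like /usr/ports/CATEGORY/PORT; secondary_listings is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: list ports that carry CATEGORY only as a secondary category.
secondary_listings() {
    index_file="$1"
    category="$2"    # e.g. x11-servers
    awk -F '|' -v want="$category" '
        {
            # Is the wanted category anywhere in CATEGORIES (field 7)?
            n = split($7, cats, " ")
            listed = 0
            for (i = 1; i <= n; i++)
                if (cats[i] == want) listed = 1
            # /usr/ports/<category>/<port> splits into parts[4], parts[5]
            split($2, parts, "/")
            if (listed && parts[4] != want) print parts[4] "/" parts[5]
        }' "$index_file" | LC_ALL=C sort -u
}
```

Assuming the web page is built from the same CATEGORIES data, secondary_listings INDEX-12 x11-servers should include entries like net/tigervnc-server.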

This extracts the list of ports from INDEX:

[dan@pkg01:~/ports/head] $ cut -f 2 -d '|' INDEX-12 > ~/tmp/INDEX-12-list
[dan@pkg01:~/ports/head] $ head -4  ~/tmp/INDEX-12-list
/usr/ports/accessibility/accerciser
/usr/ports/accessibility/at-spi2-atk
/usr/ports/accessibility/at-spi2-core
/usr/ports/accessibility/atkmm
[dan@pkg01:~/ports/head] $ 
[dan@pkg01:~/ports/head] $ wc -l INDEX-12  ~/tmp/INDEX-12-list
   33406 INDEX-12
   33406 /usr/home/dan/tmp/INDEX-12-list
   66812 total

The line count matches. Let’s get the same information out of FreshPorts, but this time, I’ll use production.

cat << EOF | psql -t freshports.org > INDEX.FreshPorts
SELECT '/usr/ports/' || PA.category || '/' || PA.name
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
 ORDER BY 1;
EOF

We have 28759 entries there.

$ wc -l ~/INDEX.FreshPorts 
   28759 /usr/home/dan/INDEX.FreshPorts

That is far from the 33406 lines in INDEX-12.

Removing flavors from INDEX-12 list of ports

When I started comparing the output, I noticed that INDEX-12 listed accessibility/py-atspi twice. Why? Because of flavors. Here are the first two columns from INDEX-12:

py27-atspi-2.38.0|/usr/ports/accessibility/py-atspi
py37-atspi-2.38.0|/usr/ports/accessibility/py-atspi
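Ports with more than one flavor can be listed by counting how often each path occurs in field 2. A minimal sketch; flavored_ports is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: print "count path" for every port path that appears on more
# than one INDEX line, which is how flavors show up in INDEX.
flavored_ports() {
    cut -d '|' -f 2 "$1" | LC_ALL=C sort | uniq -c |
        awk '$1 > 1 { print $1, $2 }'
}
```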

Let’s remove duplicate lines from the list extracted from INDEX-12:

[dan@pkg01:~/ports/head] $ wc -l INDEX-12  ~/tmp/INDEX-12-list ~/tmp/INDEX-12-list-nodups
   33406 INDEX-12
   33406 /usr/home/dan/tmp/INDEX-12-list
   28755 /usr/home/dan/tmp/INDEX-12-list-nodups
   95567 total
[dan@pkg01:~/ports/head] $ 

That means 4651 lines (33406 - 28755) are additional flavors of ports already listed.

That uniq output is much closer to the FreshPorts count of 28759. It is off by 4.
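The command that produced INDEX-12-list-nodups is not shown above. Since the text refers to uniq output, here is an assumed reconstruction. uniq only collapses adjacent duplicates, which is fine when flavors of a port sit next to each other (as in the py-atspi example), but sort -u does not depend on input order. dedup_paths and flavor_lines are hypothetical helper names:

```sh
#!/bin/sh
# Sketch: de-duplicate a list of port paths regardless of input order.
dedup_paths() {
    LC_ALL=C sort -u "$1"
}

# Extra (flavor) lines: total lines minus distinct lines.
flavor_lines() {
    total=$(wc -l < "$1")
    distinct=$(dedup_paths "$1" | wc -l)
    # Unquoted expansion: wc may pad its output with spaces.
    echo $(( $total - $distinct ))
}
```

Run against ~/tmp/INDEX-12-list, flavor_lines reports total minus distinct lines; with the counts above that is 33406 - 28755 = 4651, assuming the uniq step collapsed every duplicate.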

Comparing INDEX-12 and FreshPorts

Let’s do a diff.

All the + lines indicate a port included in FreshPorts but not in INDEX-12. I have annotated the output to indicate what my investigations found.

All the - lines indicate a port included in INDEX-12 but not in FreshPorts.

When you see DELETED, that means FreshPorts has marked this port as deleted.

[dan@pkg01:~/ports/head] $ diff -ruN ~/tmp/INDEX-12-list-nodups ~/INDEX.FreshPorts
--- /usr/home/dan/tmp/INDEX-12-list-nodups	2020-11-18 16:58:12.360853000 +0000
+++ /usr/home/dan/INDEX.FreshPorts	2020-11-18 17:49:49.321133000 +0000
@@ -1290,6 +1290,7 @@
 /usr/ports/audio/zita-resampler
 /usr/ports/audio/zrythm
 /usr/ports/audio/zynaddsubfx
+/usr/ports/base/binutils NOT A PORT
 /usr/ports/benchmarks/ali
 /usr/ports/benchmarks/apib
 /usr/ports/benchmarks/autobench
@@ -7405,7 +7406,6 @@
 /usr/ports/devel/php80-sysvsem
 /usr/ports/devel/php80-sysvshm
 /usr/ports/devel/php80-tokenizer
-/usr/ports/devel/phpunit6 PORT MARKED AS DELETED
 /usr/ports/devel/phpunit7
 /usr/ports/devel/phpunit8
 /usr/ports/devel/physfs
@@ -7703,6 +7703,7 @@
 /usr/ports/devel/py-cachy
 /usr/ports/devel/py-canonicaljson
 /usr/ports/devel/py-capstone
+/usr/ports/devel/py-case NEWLY CREATED
 /usr/ports/devel/py-castellan
 /usr/ports/devel/py-castellan1
 /usr/ports/devel/py-cbor
@@ -9873,6 +9874,7 @@
 /usr/ports/devel/rubygem-rspec-support
 /usr/ports/devel/rubygem-rspec_junit_formatter
 /usr/ports/devel/rubygem-rubocop
+/usr/ports/devel/rubygem-rubocop-ast NOT IN INDEX-12
 /usr/ports/devel/rubygem-ruby-atmos-pure
 /usr/ports/devel/rubygem-ruby-bugzilla
 /usr/ports/devel/rubygem-ruby-enum
@@ -14078,7 +14080,6 @@
 /usr/ports/korean/hanyangfonts
 /usr/ports/korean/hcode
 /usr/ports/korean/hmconv
-/usr/ports/korean/hpscat PORT MARKED AS DELETED
 /usr/ports/korean/hunspell
 /usr/ports/korean/ibus-hangul
 /usr/ports/korean/imhangul-gtk2
@@ -14439,6 +14440,7 @@
 /usr/ports/lang/spidermonkey24
 /usr/ports/lang/spidermonkey52
 /usr/ports/lang/spidermonkey60
+/usr/ports/lang/spidermonkey68 NOT IN SUBVERSION
 /usr/ports/lang/spidermonkey78
 /usr/ports/lang/spl
 /usr/ports/lang/squeak
@@ -14958,6 +14960,7 @@
 /usr/ports/mail/py-dkimpy
 /usr/ports/mail/py-email-validator
 /usr/ports/mail/py-email_reply_parser
+/usr/ports/mail/py-flanker NOT IN INDEX-12
 /usr/ports/mail/py-flask-mail
 /usr/ports/mail/py-flufl.bounce
 /usr/ports/mail/py-fuglu
[dan@pkg01:~/ports/head] $ 
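An alternative to reading diff hunks is comm(1), which splits two sorted lists into lines unique to each. Both inputs need the same sort order, and PostgreSQL’s ORDER BY collation is not guaranteed to match sort(1)’s, so this sketch re-sorts both under LC_ALL=C. It also trims surrounding whitespace and drops blank lines, in case the psql output carries the padding or trailing empty line that psql’s aligned (non -A) output can add. normalize and only_in are hypothetical helper names:

```sh
#!/bin/sh
# Sketch: trim blanks, drop empty lines, and sort bytewise for comm.
normalize() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v '^$' | LC_ALL=C sort -u
}

# Print lines present in file $1 but not in file $2.
only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize "$1" > "$a"
    normalize "$2" > "$b"
    # -23: suppress lines unique to $b and lines common to both
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, only_in ~/tmp/INDEX-12-list-nodups ~/INDEX.FreshPorts corresponds to the - lines, and swapping the arguments gives the + lines.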

Totals:

  1. NOT A PORT (1) – base looks like a category to FreshPorts, so base/binutils is included
  2. PORT MARKED AS DELETED (2) – FreshPorts thinks these ports are deleted, but they are not
  3. NEWLY CREATED (1) – this port was created today; INDEX-12 predates it
  4. NOT IN INDEX-12 (2) – no idea why these are not included
  5. NOT IN SUBVERSION (1) – this port is not listed in subversion.
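The annotations can be tallied from a saved copy of the diff. A minimal sketch; diff_tally is a hypothetical helper name, and the file it reads is whatever the diff output was redirected to:

```sh
#!/bin/sh
# Sketch: count added and removed port paths in a saved unified diff.
# "+/..." lines exist only in the second file, "-/..." only in the
# first; the "+++" and "---" header lines do not match these patterns.
diff_tally() {
    added=$(grep -c '^+/' "$1")
    removed=$(grep -c '^-/' "$1")
    echo "added=$added removed=$removed"
}
```

For the hunks shown above, that is five + lines and two - lines. Starting from the 28755 de-duplicated INDEX-12 entries, 28755 + 5 - 2 = 28758, one fewer than the 28759 lines in INDEX.FreshPorts, so the output shown may not include every difference.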

Conclusion

FreshPorts has some errors, which I will look into.

The number of ports reported is wrong on both sites; the correct value is in the 28,800 range.