Moving towards commit hooks – FreshPorts News

I want to move FreshPorts towards using commit hooks and away from depending upon incoming emails for processing new commits.

Much of the following came from a recent Twitter post.

You might wonder: why are we using email? Why? Because we can. It was the easiest and simplest approach, and it is a time-proven solution. Look at https://docs.freshports.org/ and you can see the original ideas from 2001. That is over 18 years of providing data.

If email is so good, why stop?

Because we can.

And we won’t stop using email.

Email will stay around as a fall-back position. Commit hooks create a tighter dependency upon a third party and require close cooperation. Should that relationship sour, the cooperation may end.

If the move to web-hooks proceeds, email processing will be modified to introduce an N-minute delay. After leaving the N-minute queue, each email will be:

ignored if the commit has already been processed
processed if the commit is not in the database
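A minimal Python sketch of that decision follows. It assumes the commit identifier (revision number or hash) can be parsed out of each email, and it uses an in-memory set where FreshPorts would consult its database. DELAY, QueuedMail, is_ready, and handle are hypothetical names, and the 10-minute value for N is only a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Set

# Hypothetical value for N: the post leaves the delay unspecified.
DELAY = timedelta(minutes=10)

@dataclass
class QueuedMail:
    commit_id: str         # revision number or hash, assumed parsed from the email
    received_at: datetime  # when the email entered the delay queue

def is_ready(mail: QueuedMail, now: datetime, delay: timedelta = DELAY) -> bool:
    """True once the email has waited in the queue for at least `delay`."""
    return now - mail.received_at >= delay

def handle(mail: QueuedMail, processed: Set[str]) -> str:
    """Decide the fate of an email that has left the delay queue."""
    if mail.commit_id in processed:
        return "ignored"           # the commit hook already delivered this commit
    processed.add(mail.commit_id)  # stand-in for the real commit processing
    return "processed"
```

The delay gives the hook a head start, so in the normal case the email arrives to find its commit already recorded and is simply dropped.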

How is a commit identified?

Email processing is based upon the Message-Id of each email, which is stored in the database. Duplicates are ignored.

I am not sure whether we also check the Subversion revision number. That might be wise. There is an index on it, but it is not unique.

If we move to commit hooks, the Message-Id will not be available. We will have to rely upon the revision number or, in the case of git, the commit hash.

ACTIONS:

add a unique index on commit_log.svn_revision
remove not null constraint on commit_log.message_id
add commit_log.commit_hash with a unique index
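The Python sketch below shows the intended effect of those three actions. It uses SQLite purely as a stand-in, because FreshPorts runs PostgreSQL; the id column and record_commit are hypothetical. In PostgreSQL the equivalents would be ALTER TABLE ... ALTER COLUMN message_id DROP NOT NULL, CREATE UNIQUE INDEX, and INSERT ... ON CONFLICT DO NOTHING in place of INSERT OR IGNORE.

```python
import sqlite3

# SQLite stand-in for commit_log after the three ACTIONS; FreshPorts uses
# PostgreSQL, and the id column here is hypothetical.
SCHEMA = """
CREATE TABLE commit_log (
    id           INTEGER PRIMARY KEY,
    message_id   TEXT UNIQUE,     -- nullable: hook-delivered commits have no email
    svn_revision INTEGER UNIQUE,  -- NULL for git commits
    commit_hash  TEXT UNIQUE      -- NULL for Subversion commits
);
"""

def record_commit(conn: sqlite3.Connection, *, svn_revision=None,
                  commit_hash=None, message_id=None) -> bool:
    """Insert a commit; return False if a unique column says we already have it."""
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO commit_log (message_id, svn_revision, commit_hash) "
        "VALUES (?, ?, ?)",
        (message_id, svn_revision, commit_hash),
    )
    # total_changes only grows if a row was actually inserted.
    return conn.total_changes > before

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Both SQLite and PostgreSQL (by default) allow many NULLs in a unique column. That is what lets svn_revision stay unique even though every git commit leaves it empty, and likewise for commit_hash.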

Commit processing

Regardless of how we get notified of a new commit, we must be able to put our local copy of the repo into the state as of a given commit.

For Subversion, we do this:

svn up -r REVISION

After this, various commands, such as make -V, are run to extract the necessary values from the ports tree as of that commit. This information includes PORTVERSION, PORTREVISION, etc. You can see why it is vital to have everything in our ports tree reflect the repo as of that particular commit.

For git, it is similar:

git checkout HASH

The same scripts, as described above, would then be run.
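The Python sketch below covers both steps under stated assumptions. checkout_argv builds the Subversion or git command for one commit. port_values runs BSD make's -V option in a port directory: one -V per variable, with each value printed on its own line. The function names, the 40-character SHA-1 check, and the injectable run parameter are all hypothetical. FreshPorts' real scripts are not shown here.

```python
import re
import subprocess
from typing import Callable, Dict, List, Sequence, Union

GIT_SHA1 = re.compile(r"[0-9a-f]{40}")

def checkout_argv(vcs: str, ref: Union[int, str]) -> List[str]:
    """Command that puts the local copy into its state as of one commit."""
    if vcs == "svn":
        rev = int(ref)
        if rev < 1:
            raise ValueError("revision 0 is the empty repository, not a commit")
        return ["svn", "up", "-r", str(rev)]
    if vcs == "git":
        # Validating the hash also stops a value such as "--force" from
        # being taken as a command-line option.
        if not isinstance(ref, str) or not GIT_SHA1.fullmatch(ref):
            raise ValueError("expected a full 40-character SHA-1 hash")
        return ["git", "checkout", ref]
    raise ValueError(f"unknown VCS: {vcs!r}")

def run_stdout(argv: List[str], cwd: str) -> str:
    """Run a command in cwd and return its standard output."""
    return subprocess.run(argv, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

def port_values(port_dir: str, names: Sequence[str],
                run: Callable[[List[str], str], str] = run_stdout) -> Dict[str, str]:
    """Ask make for several variables at once: make -V A -V B ..."""
    argv = ["make"]
    for name in names:
        argv += ["-V", name]
    values = run(argv, port_dir).splitlines()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))
```

A caller would pass checkout_argv(...) to run_stdout with cwd set to the working copy, then call port_values for each affected port directory. Passing argument lists rather than shell strings avoids quoting problems.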

Commit hooks

These are the assumptions for a commit hook:

the hook gets triggered exactly once per commit
the hook is fast, so as not to slow down commits

To be fast, the hook should do little more than pass the basic information along to another daemon, which puts it into a queue. That queue is then processed by yet another daemon. The queue must be persistent, so that nothing is lost if a daemon restarts.
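A persistent queue can be quite small. This Python sketch uses SQLite only for illustration, since the type of queue is still an open design question. CommitQueue and its methods are hypothetical names.

```python
import sqlite3
from typing import Optional

class CommitQueue:
    """A minimal persistent FIFO of commit identifiers, stored in SQLite."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " commit_id TEXT NOT NULL UNIQUE)"
        )
        self.conn.commit()

    def put(self, commit_id: str) -> None:
        # A duplicate notification for a commit still waiting here is not added twice.
        self.conn.execute("INSERT OR IGNORE INTO queue (commit_id) VALUES (?)",
                          (commit_id,))
        self.conn.commit()

    def peek(self) -> Optional[str]:
        """Oldest waiting commit, or None if the queue is empty."""
        row = self.conn.execute(
            "SELECT commit_id FROM queue ORDER BY seq LIMIT 1").fetchone()
        return row[0] if row else None

    def done(self, commit_id: str) -> None:
        # Remove only after processing succeeds, so a crash leaves it queued.
        self.conn.execute("DELETE FROM queue WHERE commit_id = ?", (commit_id,))
        self.conn.commit()

    def close(self) -> None:
        self.conn.close()
```

The processing daemon loops over peek(), process, done(). Because entries are only deleted after success, a restart resumes with the commit that was in progress.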

I am using hare and hared here as examples only because I am familiar with them. They won't actually do what I need, but if I were to fork them and modify them for this specific task, I think they would do the job rather well.

My initial thoughts are:

The hook invokes something like hare (see also sysutils/hare), which sends a UDP packet to a listener. The packet contains the commit revision number (if Subversion) or hash (if git).
The UDP packet is received by something like hared (see hare above; hared is available via sysutils/py-hared).
hared then adds the data to a queue. What type of queue, and where it is located, is left for later design.
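The Python sketch below shows the hook-to-listener hop with plain UDP sockets. It is not hare's actual packet format. send_commit_id, receive_commit_id, and MAX_PACKET are hypothetical. A real listener would hand each identifier on to the queue.

```python
import socket

MAX_PACKET = 512  # a revision number or a 40-character hash fits easily

def send_commit_id(commit_id: str, host: str, port: int) -> None:
    """What the hook does: send one UDP datagram and return immediately."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(commit_id.encode("ascii"), (host, port))

def receive_commit_id(sock: socket.socket) -> str:
    """What the listener does for each datagram, before queueing the result."""
    data, _addr = sock.recvfrom(MAX_PACKET)
    return data.decode("ascii").strip()
```

UDP is fire-and-forget: the sender gets no acknowledgement, and a dropped datagram is simply lost. That keeps the hook fast, and it is one more reason to keep email, or a periodic sweep of the repository, as a fall-back.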

Commit iteration

When processing email, looping through the incoming messages is your iteration. When there is no email, you need something else to iterate through.

git commit iteration

I think this is the command we want to use when iterating through git commits:

git rev-list HASH..HEAD

Where HASH is the hash of our most recently processed commit. "Most recently" is not necessarily the last one we committed; it is the commit with the most recent timestamp. Here is an example:

$ git rev-list ee38cccad8f76b807206165324e7bf771aa981dc..HEAD
0ca46774ac207517724eb48338c04a4dbde0728a
a3361806ab49043fca46f81a0edc2357b7d3947c
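It can help to be precise about what the range A..HEAD means to git: the commits reachable from HEAD but not from A. The Python toy model below illustrates that set difference on a tiny hand-made history. It is not git's implementation, and reachable, rev_range, and the single-letter commits are hypothetical.

```python
from typing import Dict, List, Set

def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Every commit reachable from start by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen

def rev_range(graph: Dict[str, List[str]], since: str, head: str) -> Set[str]:
    """Toy model of `git rev-list since..head`: reachable from head, not from since."""
    return reachable(graph, head) - reachable(graph, since)

# a <- b <- c <- d on the main line; e branches off b and is merged by m.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["b"], "m": ["d", "e"]}
```

The merge case is the subtle one: rev_range(history, "d", "m") contains e even though e is not a descendant of d. If e carried an older timestamp, an approach that only compared dates could miss it. Reachability does not.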

Using the above, perhaps the logic for processing commits will be:

detect a new commit
git pull
use git rev-list to get list of new commits
for i = oldest new commit to newest new commit {
  git checkout commit i
  magic
}
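The Python sketch below follows that loop. It assumes the git pull has already happened. git rev-list prints commits newest first by default, so the sketch passes --reverse to walk them oldest first. git, new_commits, and process_new_commits are hypothetical helpers, and process stands in for the "magic".

```python
import subprocess
from typing import Callable, List

def git(repo: str, *args: str) -> str:
    """Run a git subcommand inside repo and return its standard output."""
    return subprocess.run(["git", *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout

def new_commits(repo: str, last_hash: str, run: Callable[..., str] = git) -> List[str]:
    # rev-list prints newest first by default; --reverse gives oldest first.
    return run(repo, "rev-list", "--reverse", f"{last_hash}..HEAD").split()

def process_new_commits(repo: str, last_hash: str,
                        process: Callable[[str], None],
                        run: Callable[..., str] = git) -> str:
    """Check out each new commit, oldest first, and hand it to process().

    Returns the hash of the last commit processed, to be stored for next time.
    """
    for commit in new_commits(repo, last_hash, run):
        run(repo, "checkout", "--quiet", commit)
        process(commit)     # the "magic": make -V and friends on this tree
        last_hash = commit  # advance only after process() returned
    return last_hash
```

After the loop, the working copy sits on a detached HEAD at the newest commit. Before the next git pull, it would need to switch back to the tracking branch. The branch name is omitted here because it depends on the deployment.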

Subversion commit iteration

With Subversion, each commit has a revision number, which is an integer.

The repo can be queried for its highest revision via:

svn log -r head
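The revision number has to be pulled out of that output. The Python sketch below relies on the header line that starts each svn log entry ("rNNN | author | date | N lines"). It raises an error if no such line is found rather than guessing. parse_log_revision, latest_revision, and the target argument (a repository URL or working-copy path) are hypothetical.

```python
import re
import subprocess

# Each svn log entry begins with a header such as "r12345 | author | date | 1 line".
REV_LINE = re.compile(r"^r(\d+) \|", re.MULTILINE)

def parse_log_revision(log_output: str) -> int:
    """Return the revision number from the first entry in svn log output."""
    match = REV_LINE.search(log_output)
    if match is None:
        raise ValueError("no revision header found in svn log output")
    return int(match.group(1))

def latest_revision(target: str) -> int:
    """Highest revision, via: svn log -r HEAD <target>."""
    out = subprocess.run(["svn", "log", "-r", "HEAD", target], check=True,
                         capture_output=True, text=True).stdout
    return parse_log_revision(out)
```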

With that revision number, the code to process the new commits is:

for i = LastCommitProcessed + 1; i <= LatestCommit; i++ {
  svn up -r $i
  process that commit
}
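A Python version of that loop is sketched below. svn_up and process_revisions are hypothetical names, and process stands in for the per-commit work. The update step is injectable so that the loop can be exercised without a working copy.

```python
import subprocess
from typing import Callable

def svn_up(wc: str, rev: int) -> None:
    """Bring the working copy wc to exactly revision rev."""
    subprocess.run(["svn", "up", "-q", "-r", str(rev), wc], check=True)

def process_revisions(wc: str, last_processed: int, latest: int,
                      process: Callable[[int], None],
                      update: Callable[[str, int], None] = svn_up) -> int:
    """Update to and process each revision after last_processed, up to latest."""
    # range() excludes its end, hence latest + 1 to match `i <= LatestCommit`.
    for rev in range(last_processed + 1, latest + 1):
        update(wc, rev)
        process(rev)
        last_processed = rev  # only advance once the revision was handled
    return last_processed
```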

How do we handle gaps in the Subversion revision sequence? If we have commits 5, 7, and 8, where is commit 6? How do we note that commit 6 is missing from our database and that we need to get it? What if the repo has no commit 6?
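Spotting a gap is the mechanical part. The Python sketch below lists every revision number in a range that is absent from what has been processed. missing_revisions and its first parameter (the revision where processing history begins) are hypothetical. The function only produces candidates. It cannot tell a commit that was never processed from one there was no reason to record, and that remains the open question.

```python
from typing import Iterable, List

def missing_revisions(processed: Iterable[int], latest: int, first: int = 1) -> List[int]:
    """Revision numbers from first to latest (inclusive) absent from processed."""
    have = set(processed)
    return [rev for rev in range(first, latest + 1) if rev not in have]
```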