Processing commits on branches with git

Transitioning from subversion to git was both technically and personally challenging. During the COVID-19 pandemic there was a lot of work to carry out. A new server was created on AWS with a new layout and structure. For a few months it was nothing but my day job and FreshPorts in all my non-working hours.

This post was followed by Processing commits on branches with git – part 2.

The new behind-the-scenes code works better than the subversion code, mostly because it no longer relies upon email for notifications of new commits to the code.

One aspect of commit processing not yet tackled is commits on branches.

Back in November 2020, Mathieu Arnold wrote about a proposed approach. This is what I’m going to start on today.

From what I understood, freshports never needs to be able to access
files on the top of a branch, it only needs to access the files on
specific commits (the fact that a commit is at the top of the branch is
only an artefact of the process). So, you never need to checkout any
branches, you only need to checkout commits. So you never have a HEAD
that points to a branch, HEAD is always in a detached state and points
to a commit.

So what you need to do is, git clone the repository, and then, each time
the script runs, do (in somewhat shell script):

NAME_OF_REMOTE=origin
NAME_OF_HEAD=main

cd /where/the/repo/is

git fetch

# get all references (so, branches, tags, and so on) and keep only
# branches (named commits here) that the remote repository has

git for-each-ref --format '%(objecttype) %(refname)' \
  | sed -n 's/^commit refs\/remotes\///p' \
  | while read -r type refname
do

  # If we don't have the tag, it means we have not encountered the
  # branch yet.
  if ! git tag -l freshports/$refname
  then

    # get the first commit of that branch and create a tag.
    git tag -m "first known commit of $refname" -f freshports/$refname $(git merge-base $NAME_OF_REMOTE/$NAME_OF_HEAD $refname)
  fi

  # Get the list of commits between the last known one and the tip of
  # the branch, list may be empty.
  git rev-list freshports/$refname..$refname | while read commithash
  do
    # checkout that commit (with -f so that if some file got changed, we
    # overwrite everything
    git checkout -f $commithash

    # process the commit
  done

  # Store the last known commit that we just processed.
  git tag -m "last known commit of $refname" -f freshports/$refname $refname
done

That’s all, you never need to merge or pull or whatever else.

My starting point is the existing git-delta.sh script, which obtains a list of commits which have occurred after a given commit.

Get the repo

Here, I get the ports repo.

mkdir -p ~/src/repos
cd ~/src/repos
git clone https://git.FreeBSD.org/ports.git
cd ports

First attempt

[dan@mydev:~/src/repos/ports] $ git for-each-ref --format '%(objecttype) %(refname)' 
commit refs/heads/2021Q2
commit refs/heads/main
commit refs/remotes/origin/2014Q1
commit refs/remotes/origin/2014Q2
commit refs/remotes/origin/2014Q3
commit refs/remotes/origin/2014Q4
commit refs/remotes/origin/2015Q1
commit refs/remotes/origin/2015Q2
commit refs/remotes/origin/2015Q3
commit refs/remotes/origin/2015Q4
commit refs/remotes/origin/2016Q1
commit refs/remotes/origin/2016Q2
commit refs/remotes/origin/2016Q3
commit refs/remotes/origin/2016Q4
commit refs/remotes/origin/2017Q1
commit refs/remotes/origin/2017Q2
commit refs/remotes/origin/2017Q3
commit refs/remotes/origin/2017Q4
commit refs/remotes/origin/2018Q1
commit refs/remotes/origin/2018Q2
commit refs/remotes/origin/2018Q3
commit refs/remotes/origin/2018Q4
commit refs/remotes/origin/2019Q1
commit refs/remotes/origin/2019Q2
commit refs/remotes/origin/2019Q3
commit refs/remotes/origin/2019Q4
commit refs/remotes/origin/2020Q1
commit refs/remotes/origin/2020Q2
commit refs/remotes/origin/2020Q3
commit refs/remotes/origin/2020Q4
commit refs/remotes/origin/2021Q1
commit refs/remotes/origin/2021Q2
commit refs/remotes/origin/HEAD
commit refs/remotes/origin/main
tag refs/tags/10-eol
tag refs/tags/4-eol
tag refs/tags/5-eol
tag refs/tags/6-eol
tag refs/tags/7-eol
tag refs/tags/8-eol
tag refs/tags/9-eol
tag refs/tags/pkg-install-eol
tag refs/tags/pre-xorg-7
tag refs/tags/release/10.0.0
tag refs/tags/release/10.1.0
tag refs/tags/release/10.2.0
tag refs/tags/release/10.3.0
tag refs/tags/release/10.4.0
tag refs/tags/release/11.0.0
tag refs/tags/release/11.1.0
tag refs/tags/release/11.2.0
tag refs/tags/release/11.3.0
tag refs/tags/release/11.4.0
tag refs/tags/release/12.0.0
tag refs/tags/release/12.1.0
tag refs/tags/release/12.2.0
tag refs/tags/release/13.0.0
tag refs/tags/release/2.0.5
tag refs/tags/release/2.0.5a
tag refs/tags/release/2.1.0
tag refs/tags/release/2.1.5
tag refs/tags/release/2.1.6
tag refs/tags/release/2.1.7
tag refs/tags/release/2.2.0
tag refs/tags/release/2.2.1
tag refs/tags/release/2.2.2
tag refs/tags/release/2.2.5
tag refs/tags/release/2.2.6
tag refs/tags/release/2.2.7
tag refs/tags/release/2.2.8
tag refs/tags/release/3.0.0
tag refs/tags/release/3.1.0
tag refs/tags/release/3.2.0
tag refs/tags/release/3.3.0
tag refs/tags/release/3.4.0
tag refs/tags/release/3.5.0
tag refs/tags/release/4.0.0
tag refs/tags/release/4.1.0
tag refs/tags/release/4.1.1
tag refs/tags/release/4.10.0
tag refs/tags/release/4.11.0
tag refs/tags/release/4.2.0
tag refs/tags/release/4.3.0
tag refs/tags/release/4.4.0
tag refs/tags/release/4.5.0
tag refs/tags/release/4.6.0
tag refs/tags/release/4.6.1
tag refs/tags/release/4.6.2
tag refs/tags/release/4.7.0
tag refs/tags/release/4.8.0
tag refs/tags/release/4.9.0
tag refs/tags/release/5.0.0
tag refs/tags/release/5.1.0
tag refs/tags/release/5.2.0
tag refs/tags/release/5.2.1
tag refs/tags/release/5.3.0
tag refs/tags/release/5.4.0
tag refs/tags/release/5.5.0
tag refs/tags/release/6.0.0
tag refs/tags/release/6.1.0
tag refs/tags/release/6.2.0
tag refs/tags/release/6.3.0
tag refs/tags/release/6.4.0
tag refs/tags/release/7.0.0
tag refs/tags/release/7.1.0
tag refs/tags/release/7.2.0
tag refs/tags/release/7.3.0
tag refs/tags/release/7.4.0
tag refs/tags/release/8.0.0
tag refs/tags/release/8.1.0
tag refs/tags/release/8.2.0
tag refs/tags/release/8.3.0
tag refs/tags/release/8.4.0
tag refs/tags/release/9.0.0
tag refs/tags/release/9.1.0
tag refs/tags/release/9.2.0
tag refs/tags/release/9.3.0
[dan@mydev:~/src/repos/ports] $ 

Applying the filter

My first try brought back empty strings:

[dan@mydev:~/src/repos/ports] $ git for-each-ref --format '%(objecttype) %(refname)' \
  | sed -n 's/^commit refs\/remotes\///p' \
  | while read -r type refname
do
  echo ref is "'$refname'"
done
ref is ''
ref is ''
ref is ''
ref is ''
...

madree explained this to me as: “the sed expression comes up with just freebsd/HEAD and freebsd/main for me and then read can’t possibly read two columns (type and refname). Try omitting the ‘type’ word and use just ‘while read -r refname’ and see what you get.”
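What madree describes can be reproduced without a repository. This is a minimal sketch that feeds two hard-coded ref names, standing in for the sed output, to read; the function names are made up for illustration. With two variable names and only one field per line, read puts the whole line into the first variable and leaves the second one empty.

```shell
# Two variable names, one field per line: refname ends up empty.
read_two_names() {
  printf '%s\n' 'origin/2021Q2' 'origin/main' |
  while read -r type refname; do
    printf "type='%s' refname='%s'\n" "$type" "$refname"
  done
}

# One variable name: the whole line lands in refname.
read_one_name() {
  printf '%s\n' 'origin/2021Q2' 'origin/main' |
  while read -r refname; do
    printf "ref is '%s'\n" "$refname"
  done
}

read_two_names
read_one_name
```

The first function prints lines with an empty refname, which matches the empty strings seen above. The second prints the ref names themselves.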

And so I did:

[dan@mydev:~/src/repos/ports] $ git for-each-ref --format '%(objecttype) %(refname)'   | sed -n 's/^commit refs\/remotes\///p'   | while read -r refname; do   echo ref is "'$refname'"; done
ref is 'origin/2014Q1'
ref is 'origin/2014Q2'
ref is 'origin/2014Q3'
ref is 'origin/2014Q4'
ref is 'origin/2015Q1'
ref is 'origin/2015Q2'
ref is 'origin/2015Q3'
ref is 'origin/2015Q4'
ref is 'origin/2016Q1'
ref is 'origin/2016Q2'
ref is 'origin/2016Q3'
ref is 'origin/2016Q4'
ref is 'origin/2017Q1'
ref is 'origin/2017Q2'
ref is 'origin/2017Q3'
ref is 'origin/2017Q4'
ref is 'origin/2018Q1'
ref is 'origin/2018Q2'
ref is 'origin/2018Q3'
ref is 'origin/2018Q4'
ref is 'origin/2019Q1'
ref is 'origin/2019Q2'
ref is 'origin/2019Q3'
ref is 'origin/2019Q4'
ref is 'origin/2020Q1'
ref is 'origin/2020Q2'
ref is 'origin/2020Q3'
ref is 'origin/2020Q4'
ref is 'origin/2021Q1'
ref is 'origin/2021Q2'
ref is 'origin/HEAD'
ref is 'origin/main'
[dan@mydev:~/src/repos/ports] $ 

This. This is more like it.
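The sed step can also be avoided by letting git do the filtering. What follows is a sketch, not what FreshPorts runs. list_remote_branches is a hypothetical helper, it assumes git 2.13 or later for the lstrip format modifier, and it skips origin/HEAD on the assumption that a symbolic ref should not be processed alongside the branch it points to.

```shell
# list_remote_branches REPO REMOTE
# Prints one line per remote-tracking branch, e.g. origin/2021Q2.
list_remote_branches() {
  repo=$1
  remote=$2
  # lstrip=2 drops the leading "refs/remotes/" from each ref name.
  git -C "$repo" for-each-ref --format '%(refname:lstrip=2)' "refs/remotes/$remote" |
  while read -r refname; do
    case $refname in
      */HEAD) continue ;;   # symbolic ref, not a branch of its own
    esac
    printf '%s\n' "$refname"
  done
}

# Usage (path is an example):
#   list_remote_branches ~/src/repos/ports origin
```

Passing a ref prefix to for-each-ref means tags and local branches never reach the loop, so no type column is needed at all.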

Getting the starting point

First, some background on the pseudo code.

At present, FreshPorts stores the ‘starting’ point, also known as the last commit, in a file: latest.ports.

Mathieu’s approach uses a tag in the repo itself. I was hesitant at first. If we lose the repo, we lose our starting points.

Line 18 from above has: if ! git tag -l freshports/$refname

That’s looking for the git tag. If we don’t find it, we know we have not processed this branch before. The typical use case here is a new quarterly branch has arrived and we want to process it. The first step of that processing is identifying the first commit on the branch.
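One thing to watch for: git tag -l exits with status 0 whether or not any tag matches, so the if test in the pseudo code never takes the new-branch path. Below is a sketch of a test that does distinguish the two cases, using git show-ref. has_marker_tag is a hypothetical helper name.

```shell
# has_marker_tag REPO REFNAME
# Succeeds only when the tag freshports/REFNAME exists in REPO.
has_marker_tag() {
  git -C "$1" show-ref --verify --quiet "refs/tags/freshports/$2"
}

# Usage (path and ref are examples):
#   if ! has_marker_tag ~/src/repos/ports origin/2021Q2; then
#     echo "origin/2021Q2 has not been seen before"
#   fi
```

show-ref --verify wants the full ref name, which is why the refs/tags/ prefix is spelled out.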

Line 22 contains: $(git merge-base $NAME_OF_REMOTE/$NAME_OF_HEAD $refname)

NAME_OF_REMOTE and NAME_OF_HEAD confused me for a while. madree to the rescue.

[dan@mydev:~/src/repos/ports] $ git remote -v
origin	https://git.FreeBSD.org/ports.git (fetch)
origin	https://git.FreeBSD.org/ports.git (push)

[dan@mydev:~/src/repos/ports] $ git remote show
origin

There. NAME_OF_REMOTE is origin.

NAME_OF_HEAD is main. I can see that here:

[dan@mydev:~/src/repos/ports] $ git branch
* main
[dan@mydev:~/src/repos/ports] $ 

For this example, let’s use refname origin/2021Q2:

[dan@mydev:~/src/repos/ports] $ git merge-base origin/main origin/2021Q2
4e3cf0163c4a00d4dac41d6da43472d2fcab2f29

Looking at that commit:

author	Yuri Victorovich <yuri@FreeBSD.org>	2021-04-06 21:01:18 +0000
www/yt-dlp: Update 2021.03.24.1 -> 2021.04.03
PR:		254782
Submitted by:	daniel.engberg.lists@pyret.net

Is that really the first commit on the 2021Q2 branch?

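I think it is not. git merge-base returns the most recent commit that both branches share, so this is the commit the branch was created from rather than the first commit made on the branch. But let’s continue with this exercise. Let’s see the first four commits that git rev-list returns.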

[dan@mydev:~/src/repos/ports] $ git rev-list 4e3cf0163c4a00d4dac41d6da43472d2fcab2f29..origin/2021Q2 | head -4
5ceea227c504d2892d91c1aa8d8d81ff15b22fc3
3ce47d16f7eb5c00b470603c307fa52bb9ca920b
700466498a3c9c550882b91c9e9efec2ac533346
ce9f001bf8e1d5ef297e1a869cfb97f75f750c71
[dan@mydev:~/src/repos/ports] $ 

These are the links to those commits:

  1. 5ceea227c504d2892d91c1aa8d8d81ff15b22fc3
  2. 3ce47d16f7eb5c00b470603c307fa52bb9ca920b
  3. 700466498a3c9c550882b91c9e9efec2ac533346
  4. ce9f001bf8e1d5ef297e1a869cfb97f75f750c71
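git rev-list prints the newest commit first, so head -4 shows the four most recent commits in the range, not the four oldest. When commits have to be processed in the order they were made, --reverse gives oldest first. This is a sketch under that assumption; branch_commits_oldest_first is a hypothetical helper name.

```shell
# branch_commits_oldest_first REPO TRUNK BRANCH
# Prints the commits reachable from BRANCH but not from the fork point,
# oldest first.
branch_commits_oldest_first() {
  repo=$1
  trunk=$2
  branch=$3
  # The fork point is the most recent commit shared by TRUNK and BRANCH.
  fork=$(git -C "$repo" merge-base "$trunk" "$branch") || return 1
  git -C "$repo" rev-list --reverse "$fork..$branch"
}

# Usage (path is an example):
#   branch_commits_oldest_first ~/src/repos/ports origin/main origin/2021Q2 | head -4
```

The same applies to the inner loop of the pseudo code: without --reverse it would visit the commits newest first.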

It is difficult for me to confirm whether or not a commit is on a given branch. It’s not trivial.
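One way to approach the question is git merge-base --is-ancestor, which reports through its exit status whether one commit is reachable from another. This is only a sketch. commit_only_on_branch is a hypothetical helper, and it treats "on the branch" as "reachable from the branch tip but not from the trunk tip", which is an assumption about what the question means.

```shell
# commit_only_on_branch REPO COMMIT BRANCH TRUNK
# Succeeds when COMMIT is reachable from BRANCH but not from TRUNK.
commit_only_on_branch() {
  repo=$1
  commit=$2
  branch=$3
  trunk=$4
  git -C "$repo" merge-base --is-ancestor "$commit" "$branch" &&
    ! git -C "$repo" merge-base --is-ancestor "$commit" "$trunk"
}

# Usage (path is an example):
#   commit_only_on_branch ~/src/repos/ports \
#     5ceea227c504d2892d91c1aa8d8d81ff15b22fc3 origin/2021Q2 origin/main \
#     && echo "only on 2021Q2"
```

Commits made before the branch was created are reachable from both tips, so they are reported as not being on the branch alone.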

Let’s try this for now, using the code from git_proc_commit:

$ cd ~/src/freshports/git_proc_commit/git-to-freshports
$ ./git-to-freshports-xml.py --repo ports --path ~/src/repos/ports \
--commit-range 4e3cf0163c4a00d4dac41d6da43472d2fcab2f29..origin/2021Q2 \
--spooling ~/src/msgs/tmp --output ~/src/msgs/xml
$ ls ~/src/msgs/xml/ | wc -l
     590
$ grep -l 'Branch="2021Q2"' ~/src/msgs/xml/* | wc -l
     590

Those 590 files are now in the above repo as 2021Q2 examples. Every one of them contains Branch="2021Q2".
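Comparing two counts works, but it can also help to list any file that lacks the attribute. This is a small sketch. xml_missing_branch is a hypothetical helper, and it assumes the generated files end in .xml.

```shell
# xml_missing_branch DIR BRANCH
# Prints every *.xml file in DIR that does not contain Branch="BRANCH".
xml_missing_branch() {
  dir=$1
  branch=$2
  for f in "$dir"/*.xml; do
    [ -e "$f" ] || continue          # no matching files at all
    grep -q "Branch=\"$branch\"" "$f" || printf '%s\n' "$f"
  done
}

# Usage (path is an example):
#   xml_missing_branch ~/src/msgs/xml 2021Q2
```

No output means every file carries the expected branch.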

Future work

Future work means deleting those commits after fixing this issue:

Observer has noticed that commit 'eca26f0cd09d10ab7becf712579a1470768d966c' contains file /ports/2021Q2/sysutils/cbsd/distinfo as revision eca26f0cd09d10ab7becf712579a1470768d966c in repository ports
STARTING _CompileListOfPorts ................................
for a commit on 'branch': '2021Q2'
this commit is NOT ON head
FILE ==: Modify, /ports/2021Q2/sysutils/cbsd/Makefile, eca26f0cd09d10ab7becf712579a1470768d966c, ports, cbsd, Makefile, 4431078
YES, this file is in the ports tree
... but this file is not part of a physical category on disk!

Problem is, that is a valid physical category. On a branch. Adjustments are required.

In addition, that’s the wrong path. It should be /ports/branches/2021Q2 not /ports/2021Q2.

Fixed

The path thing is fixed. It was an issue on incoming commits, now repaired with:

 	my $NewRevision		= 0;
 	my $element;
 	my $element_id;
+	my $filename;
 	# This is where we add in the repo name to the path
-	my $filename     = $DB_Root_Prefix . '/' . $Updates{branch_for_files} . '/' . $FilePath;
+	# At one time, I think, $Updates{branch_for_files} was to be either head or branches/2021Q2 (for example).
+	# As of 2021.06.20, it is either head or 2021Q2 (no branches).
+	# I think the best thing to do is to check $Updates{branch_for_files} here and add in branches when required.
+	if ($Updates{branch_for_files} eq $FreshPorts::Constants::HEAD) {
+		$filename = $DB_Root_Prefix . '/' .          $Updates{branch_for_files} . '/' . $FilePath;
+	} else {
+		$filename = $DB_Root_Prefix . '/branches/' . $Updates{branch_for_files} . '/' . $FilePath;
+	}	
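The rule in that change, restated as a small shell function purely for illustration; the real fix is the Perl above. freshports_path is a hypothetical name, "head" is assumed to be the value of $FreshPorts::Constants::HEAD, and /ports is assumed to be the root prefix, going by the log output earlier.

```shell
# freshports_path PREFIX BRANCH FILEPATH
# head goes directly under the prefix; any other branch goes under branches/.
freshports_path() {
  if [ "$2" = "head" ]; then
    printf '%s/%s/%s\n' "$1" "$2" "$3"
  else
    printf '%s/branches/%s/%s\n' "$1" "$2" "$3"
  fi
}

freshports_path /ports 2021Q2 sysutils/cbsd/Makefile
freshports_path /ports head sysutils/cbsd/Makefile
```

The first call prints /ports/branches/2021Q2/sysutils/cbsd/Makefile and the second prints /ports/head/sysutils/cbsd/Makefile.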

The 590 commits from 2021Q2 are now being imported.

This blog post has been updated throughout the day. It is now finished. Next, I’ll write up what’s next.
