Transitioning from subversion to git was both technically and personally challenging. During the COVID-19 pandemic there was a lot of work to carry out. A new server was created on AWS with a new layout and structure. For a few months it was nothing but my day job and FreshPorts in all my non-working hours.
This post was followed by Processing commits on branches with git – part 2.
The new behind-the-scenes code works better than the subversion code, mostly because it no longer relies upon email for notifications of new commits to the code.
One aspect of commit processing not yet tackled is commits on branches.
Back in November 2020, Mathieu Arnold wrote about a proposed approach. This is what I’m going to start on today.
From what I understood, freshports never needs to be able to access
files on the top of a branch, it only needs to access the files on
specific commits (the fact that a commit is at the top of the branch is
only an artefact of the process). So, you never need to checkout any
branches, you only need to checkout commits. So you never have a HEAD
that points to a branch, HEAD is always in a detached state and points
to a commit.
So what you need to do is, git clone the repository, and then, each time
the script runs, do (in somewhat shell script):
NAME_OF_REMOTE=origin
NAME_OF_HEAD=main
cd /where/the/repo/is
git fetch
# get all references (so, branches, tags, and so one) and keep only
# branches (named commits here) that the remote repository has
git for-each-ref --format '%(objecttype) %(refname)' \
  | sed -n 's/^commit refs\/remotes\///p' | while read -r type refname
do
  # If we don't have the tag, it means we have not encountered the
  # branch yet.
  if ! git tag -l freshports/$refname
  then
    # get the first commit of that branch and create a tag.
    git tag -m "first known commit of $refname" -f freshports/$refname $(git merge-base $NAME_OF_REMOTE/$NAME_OF_HEAD $refname)
  fi
  # Get the list of commits between the last known one and the tip of
  # the branch, list may be empty.
  git rev-list freshports/$refname..$refname | while read commithash
  do
    # checkout that commit (with -f so that if some file got changed, we
    # overwrite everything
    git checkout -f $commithash
    # process the commit
  done
  # Store the last known commit that we just processed.
  git tag -m "last known commit of $refname" -f freshports/$refname $refname
done
That’s all, you never need to merge or pull or whatever else.
My starting point is the existing git-delta.sh script, which obtains a list of the commits that have occurred after a given commit.
Get the repo
Here, I get the ports repo.
mkdir -p ~/src/repos
cd ~/src/repos
git clone https://git.FreeBSD.org/ports.git
cd ports
First attempt
[dan@mydev:~/src/repos/ports] $ git for-each-ref --format '%(objecttype) %(refname)'
commit refs/heads/2021Q2
commit refs/heads/main
commit refs/remotes/origin/2014Q1
commit refs/remotes/origin/2014Q2
commit refs/remotes/origin/2014Q3
commit refs/remotes/origin/2014Q4
commit refs/remotes/origin/2015Q1
commit refs/remotes/origin/2015Q2
commit refs/remotes/origin/2015Q3
commit refs/remotes/origin/2015Q4
commit refs/remotes/origin/2016Q1
commit refs/remotes/origin/2016Q2
commit refs/remotes/origin/2016Q3
commit refs/remotes/origin/2016Q4
commit refs/remotes/origin/2017Q1
commit refs/remotes/origin/2017Q2
commit refs/remotes/origin/2017Q3
commit refs/remotes/origin/2017Q4
commit refs/remotes/origin/2018Q1
commit refs/remotes/origin/2018Q2
commit refs/remotes/origin/2018Q3
commit refs/remotes/origin/2018Q4
commit refs/remotes/origin/2019Q1
commit refs/remotes/origin/2019Q2
commit refs/remotes/origin/2019Q3
commit refs/remotes/origin/2019Q4
commit refs/remotes/origin/2020Q1
commit refs/remotes/origin/2020Q2
commit refs/remotes/origin/2020Q3
commit refs/remotes/origin/2020Q4
commit refs/remotes/origin/2021Q1
commit refs/remotes/origin/2021Q2
commit refs/remotes/origin/HEAD
commit refs/remotes/origin/main
tag refs/tags/10-eol
tag refs/tags/4-eol
tag refs/tags/5-eol
tag refs/tags/6-eol
tag refs/tags/7-eol
tag refs/tags/8-eol
tag refs/tags/9-eol
tag refs/tags/pkg-install-eol
tag refs/tags/pre-xorg-7
tag refs/tags/release/10.0.0
tag refs/tags/release/10.1.0
tag refs/tags/release/10.2.0
tag refs/tags/release/10.3.0
tag refs/tags/release/10.4.0
tag refs/tags/release/11.0.0
tag refs/tags/release/11.1.0
tag refs/tags/release/11.2.0
tag refs/tags/release/11.3.0
tag refs/tags/release/11.4.0
tag refs/tags/release/12.0.0
tag refs/tags/release/12.1.0
tag refs/tags/release/12.2.0
tag refs/tags/release/13.0.0
tag refs/tags/release/2.0.5
tag refs/tags/release/2.0.5a
tag refs/tags/release/2.1.0
tag refs/tags/release/2.1.5
tag refs/tags/release/2.1.6
tag refs/tags/release/2.1.7
tag refs/tags/release/2.2.0
tag refs/tags/release/2.2.1
tag refs/tags/release/2.2.2
tag refs/tags/release/2.2.5
tag refs/tags/release/2.2.6
tag refs/tags/release/2.2.7
tag refs/tags/release/2.2.8
tag refs/tags/release/3.0.0
tag refs/tags/release/3.1.0
tag refs/tags/release/3.2.0
tag refs/tags/release/3.3.0
tag refs/tags/release/3.4.0
tag refs/tags/release/3.5.0
tag refs/tags/release/4.0.0
tag refs/tags/release/4.1.0
tag refs/tags/release/4.1.1
tag refs/tags/release/4.10.0
tag refs/tags/release/4.11.0
tag refs/tags/release/4.2.0
tag refs/tags/release/4.3.0
tag refs/tags/release/4.4.0
tag refs/tags/release/4.5.0
tag refs/tags/release/4.6.0
tag refs/tags/release/4.6.1
tag refs/tags/release/4.6.2
tag refs/tags/release/4.7.0
tag refs/tags/release/4.8.0
tag refs/tags/release/4.9.0
tag refs/tags/release/5.0.0
tag refs/tags/release/5.1.0
tag refs/tags/release/5.2.0
tag refs/tags/release/5.2.1
tag refs/tags/release/5.3.0
tag refs/tags/release/5.4.0
tag refs/tags/release/5.5.0
tag refs/tags/release/6.0.0
tag refs/tags/release/6.1.0
tag refs/tags/release/6.2.0
tag refs/tags/release/6.3.0
tag refs/tags/release/6.4.0
tag refs/tags/release/7.0.0
tag refs/tags/release/7.1.0
tag refs/tags/release/7.2.0
tag refs/tags/release/7.3.0
tag refs/tags/release/7.4.0
tag refs/tags/release/8.0.0
tag refs/tags/release/8.1.0
tag refs/tags/release/8.2.0
tag refs/tags/release/8.3.0
tag refs/tags/release/8.4.0
tag refs/tags/release/9.0.0
tag refs/tags/release/9.1.0
tag refs/tags/release/9.2.0
tag refs/tags/release/9.3.0
[dan@mydev:~/src/repos/ports] $
Applying the filter
My first try brought back empty strings:
[dan@mydev:~/src/repos/ports] $ git for-each-ref --format '%(objecttype) %(refname)' \
  | sed -n 's/^commit refs\/remotes\///p' \
  | while read -r type refname
do
  echo ref is "'$refname'"
done
ref is ''
ref is ''
ref is ''
ref is ''
...
madree explained this to me as: “the sed expression comes up with just freebsd/HEAD and freebsd/main for me and then read can’t possibly read two columns (type and refname). try omitting the “type” word and use just “while read -r refname” and see what you get”.
And so I did:
[dan@mydev:~/src/repos/ports] $ git for-each-ref --format '%(objecttype) %(refname)' | sed -n 's/^commit refs\/remotes\///p' | while read -r refname; do echo ref is "'$refname'"; done
ref is 'origin/2014Q1'
ref is 'origin/2014Q2'
ref is 'origin/2014Q3'
ref is 'origin/2014Q4'
ref is 'origin/2015Q1'
ref is 'origin/2015Q2'
ref is 'origin/2015Q3'
ref is 'origin/2015Q4'
ref is 'origin/2016Q1'
ref is 'origin/2016Q2'
ref is 'origin/2016Q3'
ref is 'origin/2016Q4'
ref is 'origin/2017Q1'
ref is 'origin/2017Q2'
ref is 'origin/2017Q3'
ref is 'origin/2017Q4'
ref is 'origin/2018Q1'
ref is 'origin/2018Q2'
ref is 'origin/2018Q3'
ref is 'origin/2018Q4'
ref is 'origin/2019Q1'
ref is 'origin/2019Q2'
ref is 'origin/2019Q3'
ref is 'origin/2019Q4'
ref is 'origin/2020Q1'
ref is 'origin/2020Q2'
ref is 'origin/2020Q3'
ref is 'origin/2020Q4'
ref is 'origin/2021Q1'
ref is 'origin/2021Q2'
ref is 'origin/HEAD'
ref is 'origin/main'
[dan@mydev:~/src/repos/ports] $
This. This is more like it.
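The empty strings come from how read splits a line across the variable names it is given, and the effect can be reproduced without a repository. The following is a minimal sketch: the filtered_refs function and its two sample ref names are made-up stand-ins for the output of the sed filter.

```shell
# Stand-in for the sed output: one field per line (hypothetical sample data).
filtered_refs() {
  printf '%s\n' 'origin/2021Q2' 'origin/main'
}

# Two variable names, one field per line: the whole line lands in $type
# and $refname is left empty.
two_vars=$(filtered_refs | while read -r type refname; do
  echo "type='$type' refname='$refname'"
done)

# One variable name: the line lands in $refname, as intended.
one_var=$(filtered_refs | while read -r refname; do
  echo "refname='$refname'"
done)

echo "$two_vars"
echo "$one_var"
```

Because the sed expression strips the leading object type, only one field is left on each line, which is why dropping the type variable fixes the loop.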
Getting the starting point
First, some background on the pseudo code.
At present, FreshPorts stores the ‘starting’ point, also known as the last commit, in a file: latest.ports.
Mathieu’s approach uses a tag in the repo itself. I was hesitant at first. If we lose the repo, we lose our starting points.
The script above contains: if ! git tag -l freshports/$refname
That’s looking for the git tag. If we don’t find it, we know we have not processed this branch before. The typical use case here is that a new quarterly branch has arrived and we want to process it. The first step of that processing is identifying the first commit on the branch.
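A caveat worth checking before relying on that test: git tag -l with a pattern exits with status 0 whether or not any tag matches, so the condition as written would never report a missing tag. One hedged alternative is to ask git to verify the exact ref. This is a sketch, and tag_exists is a hypothetical helper name, not part of Mathieu’s script.

```shell
# tag_exists NAME: succeed only if refs/tags/NAME exists in the current repo.
# (tag_exists is a hypothetical helper, not from the original script.)
tag_exists() {
  git show-ref --verify --quiet "refs/tags/$1"
}

# Usage sketch inside the per-branch loop:
#   if ! tag_exists "freshports/$refname"; then
#     ... create the tag at the branch's starting point ...
#   fi
```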
To find that commit, the script calls: git merge-base $NAME_OF_REMOTE/$NAME_OF_HEAD $refname
NAME_OF_REMOTE and NAME_OF_HEAD confused me for a while. madree to the rescue.
[dan@mydev:~/src/repos/ports] $ git remote -v
origin  https://git.FreeBSD.org/ports.git (fetch)
origin  https://git.FreeBSD.org/ports.git (push)
[dan@mydev:~/src/repos/ports] $ git remote show origin
There. NAME_OF_REMOTE is origin.
NAME_OF_HEAD is main. I can see that here:
[dan@mydev:~/src/repos/ports] $ git branch
* main
[dan@mydev:~/src/repos/ports] $
For this example, let’s use refname origin/2021Q2:
[dan@mydev:~/src/repos/ports] $ git merge-base origin/main origin/2021Q2
4e3cf0163c4a00d4dac41d6da43472d2fcab2f29
Looking at that commit:
author Yuri Victorovich <yuri@FreeBSD.org> 2021-04-06 21:01:18 +0000
www/yt-dlp: Update 2021.03.24.1 -> 2021.04.03
PR: 254782
Submitted by: daniel.engberg.lists@pyret.net
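Is that really the first commit on the 2021Q2 branch?
I think it is not. git merge-base reports the best common ancestor of the two refs, so this is most likely the last commit on main before the branch was created, not a commit made on the branch itself. But let’s continue with this exercise. Let’s see the first four commits that git rev-list reports.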
[dan@mydev:~/src/repos/ports] $ git rev-list 4e3cf0163c4a00d4dac41d6da43472d2fcab2f29..origin/2021Q2 | head -4
5ceea227c504d2892d91c1aa8d8d81ff15b22fc3
3ce47d16f7eb5c00b470603c307fa52bb9ca920b
700466498a3c9c550882b91c9e9efec2ac533346
ce9f001bf8e1d5ef297e1a869cfb97f75f750c71
[dan@mydev:~/src/repos/ports] $
These are the links to those commits:
- 5ceea227c504d2892d91c1aa8d8d81ff15b22fc3
- 3ce47d16f7eb5c00b470603c307fa52bb9ca920b
- 700466498a3c9c550882b91c9e9efec2ac533346
- ce9f001bf8e1d5ef297e1a869cfb97f75f750c71
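Note that git rev-list prints the newest commits first by default, so head -4 shows the four most recent commits in the range rather than the earliest four. A commit processor usually wants the opposite order. The following is a sketch of listing the same range oldest first; commits_oldest_first is a hypothetical helper name.

```shell
# commits_oldest_first BASE TIP: list commits reachable from TIP but not
# from BASE, oldest first (rev-list on its own lists newest first).
commits_oldest_first() {
  git rev-list --reverse "$1..$2"
}

# Usage sketch with the range from above:
#   commits_oldest_first 4e3cf0163c4a00d4dac41d6da43472d2fcab2f29 origin/2021Q2 | head -4
```

Taking only the first line of that output would give the earliest commit made after the branch point, which is probably closer to "the first commit on the branch" than the merge base is.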
It is difficult for me to confirm whether or not a commit is on a given branch. It’s not trivial.
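One approach that may help is git merge-base --is-ancestor, which exits 0 when the first commit is reachable from the second. Reachability is weaker than "was committed on this branch", because every commit made on main before the branch point is also reachable from the branch tip. The sketch below therefore also checks that the commit is not reachable from main. Both function names are hypothetical.

```shell
# reachable_from COMMIT REF: succeed if COMMIT is REF itself or one of its ancestors.
reachable_from() {
  git merge-base --is-ancestor "$1" "$2"
}

# only_on_branch COMMIT BRANCH MAIN: succeed if COMMIT is reachable from
# BRANCH but not from MAIN, i.e. it was added after the branch forked.
only_on_branch() {
  reachable_from "$1" "$2" && ! reachable_from "$1" "$3"
}

# Usage sketch:
#   only_on_branch 5ceea227c504d2892d91c1aa8d8d81ff15b22fc3 origin/2021Q2 origin/main \
#     && echo "only on 2021Q2"
```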
Let’s try this for now, using the code from git_proc_commit
$ cd ~/src/freshports/git_proc_commit/git-to-freshports
$ ./git-to-freshports-xml.py --repo ports --path ~/src/repos/ports \
    --commit-range 4e3cf0163c4a00d4dac41d6da43472d2fcab2f29..origin/2021Q2 \
    --spooling ~/src/msgs/tmp --output ~/src/msgs/xml
$ ls ~/src/msgs/xml/ | wc -l
590
$ grep -l 'Branch="2021Q2"' * | wc -l
590
Those 590 files are now in the above repo as 2021Q2 examples. Every one of them contains Branch="2021Q2".
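Comparing two counts works, but it does not say which file is at fault if the counts ever differ. A small sketch that lists the files lacking the expected attribute instead: files_missing_branch is a hypothetical name, and the directory is assumed to be non-empty and to be whatever was passed to --output.

```shell
# files_missing_branch DIR BRANCH: print the files in DIR that do not
# contain Branch="BRANCH". Prints nothing when every file matches.
files_missing_branch() {
  grep -L "Branch=\"$2\"" "$1"/*
}

# Usage sketch:
#   files_missing_branch ~/src/msgs/xml 2021Q2
```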
Future work
Future work means deleting those commits after fixing this issue:
Observer has noticed that commit 'eca26f0cd09d10ab7becf712579a1470768d966c' contains file /ports/2021Q2/sysutils/cbsd/distinfo as revision eca26f0cd09d10ab7becf712579a1470768d966c in repository ports
STARTING _CompileListOfPorts ................................
for a commit on 'branch': '2021Q2'
this commit is NOT ON head
FILE ==: Modify, /ports/2021Q2/sysutils/cbsd/Makefile, eca26f0cd09d10ab7becf712579a1470768d966c, ports, cbsd, Makefile, 4431078
YES, this file is in the ports tree
... but this file is not part of a physical category on disk!
Problem is, that is a valid physical category. On a branch. Adjustments are required.
In addition, that’s the wrong path. It should be /ports/branches/2021Q2 not /ports/2021Q2.
Fixed
The path thing is fixed. It was an issue on incoming commits, now repaired with:
   my $NewRevision = 0;
   my $element;
   my $element_id;
+  my $filename;
   # This is where we add in the repo name to the path
-  my $filename = $DB_Root_Prefix . '/' . $Updates{branch_for_files} . '/' . $FilePath;
+  # At one time, I think, $Updates{branch_for_files} was to be either head or branches/2021Q2 (or example).
+  # As of 2021.06.20, it is either head or 2021Q2 (no branches).
+  # I think the best thing to do is to check $Updates{branch_for_files} here and add in braches when required.
+  if ($Updates{branch_for_files} eq $FreshPorts::Constants::HEAD) {
+    $filename = $DB_Root_Prefix . '/' . $Updates{branch_for_files} . '/' . $FilePath;
+  } else {
+    $filename = $DB_Root_Prefix . '/branches/' . $Updates{branch_for_files} . '/' . $FilePath;
+  }
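The same rule, restated as a small shell function for illustration only (FreshPorts does this in Perl, as the diff shows). The /ports prefix and the head constant are assumptions read off the log output and the code comments above, and repo_path is a hypothetical name.

```shell
# Assumed value of $DB_Root_Prefix, taken from the pathnames in the log output.
DB_ROOT_PREFIX=/ports

# repo_path BRANCH FILEPATH: build the stored pathname for a file.
# Assumes the HEAD constant is the literal string "head".
repo_path() {
  if [ "$1" = head ]; then
    printf '%s/%s/%s\n' "$DB_ROOT_PREFIX" "$1" "$2"
  else
    # Any other value is treated as a branch name and goes under branches/.
    printf '%s/branches/%s/%s\n' "$DB_ROOT_PREFIX" "$1" "$2"
  fi
}

repo_path head sysutils/cbsd/Makefile     # /ports/head/sysutils/cbsd/Makefile
repo_path 2021Q2 sysutils/cbsd/Makefile   # /ports/branches/2021Q2/sysutils/cbsd/Makefile
```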
The 560 commits from 2021Q2 are now being imported.
This blog post has been updated throughout the day. It is now finished. Next, I’ll write up what’s next.