Generating XML for a single commit

This post is a how-to and reminder for myself. I’m working on finding a better way for FreshPorts to know a new physical ports category when it finds it. I think the only way is to look at the directories in the FreeBSD ports repo.

A discussion on IRC led to this shell script:

[dan@devgit-ingress01:/var/db/ingress/repos/ports] $ find -s -f * -type d -regex '[a-z].*' -maxdepth 0 | xargs | wc -w
      62

I will break that down for future reference:

  1. -s : traverse the file hierarchies in lexicographical order, so the output is sorted
  2. -f * : avoids nasty situations where there might be a file named -foo in the tree. It also avoids use of find . which would give results with a leading ./ in the filenames
  3. -type d : only directories
  4. -regex '[a-z].*' : match only items which start with a lowercase character (i.e. ignore Mk, LEGAL, COPYRIGHT, .hooks)
  5. -maxdepth 0 : stay in the top level directory
  6. | xargs : pipe the results onto one line
  7. | wc -w : count the number of words returned; included here only for testing purposes
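
For comparison, here is a minimal Python sketch of the same selection: top-level directories whose names start with a lowercase letter, returned in sorted order. It is not FreshPorts code; the function name list_category_dirs and the repo_root argument are made up for illustration.

```python
import re
from pathlib import Path

# Same pattern as the find(1) call: the whole name must match,
# and it must start with a lowercase letter.
CATEGORY_NAME = re.compile(r"[a-z].*")


def list_category_dirs(repo_root):
    """Return sorted names of top-level directories that look like categories.

    Hypothetical helper: repo_root is the path to a checkout of the ports repo.
    Symlinks are skipped, which mirrors find's -type d without -L.
    """
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir()
        and not entry.is_symlink()
        and CATEGORY_NAME.fullmatch(entry.name)
    )
```

Calling len(list_category_dirs("/var/db/ingress/repos/ports")) would play the role of the wc -w count above.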

The need for this script arose when FreshPorts encountered a commit for a .dotfile. While this wasn’t the first commit of such a file, it did bring my attention to the need to find a list of categories. The commit in question added the .hooks/prepare-commit-msg file to the ports tree. The code incorrectly concluded this was a new port (i.e. prepare-commit-msg). FreshPorts kept looking for a Makefile for this port, and never found it. It kept looking, by waiting, and trying again. The code could have seen, by looking in the repo, that this was a file, not a directory, and no Makefile was coming.

Instead, I’m taking a different approach. With each commit, the code will query the ports repo and get a list of the current categories. This list will be referenced and the code will know whether a given file affects a port or not. As I type this, that code is not yet written, but it will get written as I go through this process.
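
The idea can be sketched in a few lines of Python. This is only an illustration under assumptions: paths are repo-relative (such as .hooks/prepare-commit-msg), the category list has already been read from the repo, and the helper names are hypothetical rather than taken from FreshPorts.

```python
def first_component(path):
    """Return the first directory component of a repo-relative path."""
    return path.lstrip("/").split("/", 1)[0]


def is_in_category(path, categories):
    """True when the path lives under one of the known category directories.

    categories is assumed to be a set of directory names taken from the repo.
    """
    return first_component(path) in categories


# Example category set; a real one would come from the ports repo itself.
known = {"net", "www"}
print(is_in_category(".hooks/prepare-commit-msg", known))
print(is_in_category("www/nginx/Makefile", known))
```

A file that fails this test cannot belong to a port, so there is no point waiting for a Makefile to show up.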

Part of the debug log

FreshPorts logs. A lot. The commit in question has 1307 lines of output:

[dan@devgit-ingress01:/var/db/freshports/message-queues/retry] $ wc -l 2021.04.20.09.58.35.000000.bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248.log
    1307 2021.04.20.09.58.35.000000.bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248.log

The file name is taken from the date/time that the commit was processed by FreshPorts, in this case 2021-04-20 at 09:58:35 (all times are UTC). The 000000 is a counter, so this was the first commit found during that processing time. We have room for 1,000,000 commits at a time before that counter rolls over. Next is the commit hash, which should be unique enough, but we still prefix the file name with that timestamp.
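
To make the layout concrete, here is a small Python sketch that splits such a file name into its timestamp, counter, and commit hash. The function name parse_queue_name is hypothetical, and the sketch assumes every name follows the eight-field pattern shown above.

```python
from datetime import datetime, timezone


def parse_queue_name(name):
    """Split 'YYYY.MM.DD.HH.MM.SS.counter.hash.ext' into its parts.

    Returns (timestamp, counter, commit_hash); the timestamp is in UTC.
    """
    stem = name.rsplit(".", 1)[0]  # drop the .log or .xml extension
    parts = stem.split(".")
    if len(parts) != 8:
        raise ValueError(f"unexpected file name layout: {name!r}")
    when = datetime(*map(int, parts[:6]), tzinfo=timezone.utc)
    counter = int(parts[6])
    commit_hash = parts[7]
    return when, counter, commit_hash


when, counter, commit_hash = parse_queue_name(
    "2021.04.20.09.58.35.000000.bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248.log"
)
print(when.isoformat(), counter, commit_hash)
```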

On line 115 of that log file, we find:

getting id from 'commit_log_elements_id_seq'
sql is insert into commit_log_elements(id, commit_log_id, element_id, revision_name, change_type) values 
                                        (4388001, 849545, 1215757, 'bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248', 'A')
sql = 'select ElementTagSet(1, 1215757, 'bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248')'
pushing the following onto @Files
FileAction='Add'
FilePath='/ports/head/.hooks/prepare-commit-msg'
FileRevision='bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248'
commit_log_element->{id}='4388001'
element_id='1215757'
Observer has noticed that commit 'bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248' contains file /ports/head/.hooks/prepare-commit-msg as revision bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248 in repos
STARTING _CompileListOfPorts ................................
for a commit on 'branch': 'head'
this commit is on head
FILE ==: Add, /ports/head/.hooks/prepare-commit-msg, bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248, ports, .hooks, prepare-commit-msg, 4388001
YES, this file is in the ports tree
checking for category='.hooks'
sql = "select * from categories where name = '.hooks'"
NOT FOUND
creating new category .hooks

Line 130 of the log (YES, this file is in the ports tree) correctly determines that this file is in the ports tree.

Line 131 (checking for category='.hooks') is where things start going wrong. This is not a category.

Ignored items

Way back when, a list of ignored items was created:

#
# These are the entries within /usr/ports/ which we ignore
# and /usr/ports/<category> which FreshPorts does not track
#
%FreshPorts::Constants::IgnoredItems = (
        "Attic"        => 1,
        "distfiles"    => 2,
        "Mk"           => 3,
        "Tools"        => 4,
        "Templates"    => 5,
        "Makefile"     => 6,
        "Makefile.inc" => 7,
        "CVSROOT"      => 8,
        "base"         => 9,
);

This was good enough for the early days. The list of IgnoredItems dates back to Fri Nov 9 16:30:29 2001 when it was a single string:

$FreshPorts::Constants::IgnoredItems = "Attic|distfiles|Mk|Tools|Templates|Makefile|pkg";

Today, I’m removing that and introducing a new module: FreshPorts::Categories.

Getting the XML

Getting back to the original purpose of this post: creating XML.

Based on "git commit processing – how is it done?", I found a reference to git-to-freshports-xml.py.

I tried this:

 $ ./git-to-freshports-xml.py
usage: git-to-freshports-xml.py [-h] -p PATH -O OUTPUT -S SPOOLING -r REPO [-o OS] [-f] [-v] [-l {syslog,stderr}] (-c COMMIT | -s SINGLE_COMMIT | -R COMMIT_RANGE)
git-to-freshports-xml.py: error: the following arguments are required: -p/--path, -O/--output, -S/--spooling, -r/--repo

Looking in the logs, I found this:

[dan@devgit-ingress01:~/scripts] $ sudo grep git-to-freshports-xml.py /var/log/freshports/git.log | head -1
2021.05.09 00:03:14 git-delta.sh /usr/local/libexec/freshports/git-to-freshports-xml.py --repo doc --path /var/db/ingress/repos/doc --commit cf1bad339628407060c88a1b20218f0c9660ba11 --spooling /var/db/ingress/message-queues/spooling --output /var/db/ingress/message-queues/incoming

Now I have something I can use to create the XML.

But if you run that, like I did, you’ll get thousands of XML files. I need to use -s instead of --commit.

This worked:

[dan@devgit-ingress01:~/scripts] $ /usr/local/libexec/freshports/git-to-freshports-xml.py --repo ports --path /var/db/ingress/repos/ports -s  bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248 --spooling /tmp/ --output ~/tmp

And created:

[dan@devgit-ingress01:~/tmp] $ cat 2021.04.20.09.58.35.000000.bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248.xml 
<?xml version='1.0' encoding='UTF-8'?>
<UPDATES Version="1.5.0.0" Source="git">
  <UPDATE>
    <DATE Year="2021" Month="4" Day="20"/>
    <TIME Timezone="UTC" Hour="9" Minute="58" Second="35"/>
    <OS Repo="ports" Id="FreeBSD" Branch="main"/>
    <LOG>Add the prepare-commit-msg hook to the repository.

To make use of it, the easiest way is to run:

  git config --add core.hooksPath .hooks

Discussed with:	bapt</LOG>
    <PEOPLE>
      <COMMITTER CommitterName="Mathieu Arnold" CommitterEmail="mat@FreeBSD.org"/>
      <AUTHOR AuthorName="Mathieu Arnold" AuthorEmail="mat@FreeBSD.org"/>
    </PEOPLE>
    <COMMIT Hash="bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248" HashShort="bbc2474" Subject="Add the prepare-commit-msg hook to the repository." EncodingLoses="false" Repository="ports"/>
    <FILES>
      <FILE Action="Add" Path=".hooks/prepare-commit-msg"/>
    </FILES>
  </UPDATE>
</UPDATES>
[dan@devgit-ingress01:~/tmp] $
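
If you want to check such a file without reading it by eye, a short Python sketch can pull out the commit hash and the file list. It relies only on the element and attribute names visible in the XML above; the function name summarize_update is made up, and this is not how FreshPorts itself reads the file.

```python
import xml.etree.ElementTree as ET


def summarize_update(xml_bytes):
    """Return (commit_hash, [(action, path), ...]) for the first UPDATE.

    xml_bytes is the raw content of a file written by
    git-to-freshports-xml.py, for example Path(name).read_bytes().
    """
    root = ET.fromstring(xml_bytes)
    update = root.find("UPDATE")
    commit_hash = update.find("COMMIT").get("Hash")
    files = [(f.get("Action"), f.get("Path")) for f in update.iter("FILE")]
    return commit_hash, files
```

For the file above, the list holds a single entry: the Add of .hooks/prepare-commit-msg.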

The code change

The code change looks like this:

-                       # look for special files outside a port, such as LEGAL, GIDs, UIDs
-                       if ($subtree eq $FreshPorts::Config::ports_prefix && defined($FreshPorts::Constants::IgnoredItems{$category_name})) {
+                       # look for special files outside a port, such as LEGAL, GIDs, UIDs, .hooks
+                       if ( ! any {/$category_name/} @FreshPorts::Categories::categories ) {

That any function comes in via use List::MoreUtils 'any'; and is part of lang/p5-List-MoreUtils, which is already a dependency on all FreshPorts hosts.
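
One thing to watch with this test: in Perl, /$category_name/ interpolates the variable as an unanchored pattern, so it asks whether any category contains a match for the name, not whether the name equals a category. The Python sketch below mimics both behaviors to show the difference. The category list and function names are made up for illustration.

```python
import re


def regex_style_match(name, categories):
    """Mimic any {/$name/} @categories: name is an unanchored pattern."""
    return any(re.search(name, category) for category in categories)


def exact_match(name, categories):
    """Plain equality test against the category names."""
    return name in categories


# Hypothetical, shortened category list.
sample_categories = ["net-im", "www", "x11-fonts"]

# "x11" is not in this list, yet it matches inside "x11-fonts".
print(regex_style_match("x11", sample_categories), exact_match("x11", sample_categories))

# "." is a metacharacter, so ".ww" matches "www".
print(regex_style_match(".ww", sample_categories), exact_match(".ww", sample_categories))
```

For .hooks the two tests agree, since no category contains anything that pattern would match. If exact matching is wanted in Perl, comparing with $_ eq $category_name inside the any block would do it.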

Reprocessing the errant commit

When I reprocessed that .hooks commit, I got this:

getting id from 'commit_log_elements_id_seq'
sql is insert into commit_log_elements(id, commit_log_id, element_id, revision_name, change_type) values 
                                        (4397803, 852396, 1217151, 'bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248', 'A')
sql = 'select ElementTagSet(1, 1217151, 'bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248')'
pushing the following onto @Files
FileAction='Add'
FilePath='/ports/head/.hooks/prepare-commit-msg'
FileRevision='bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248'
commit_log_element->{id}='4397803'
element_id='1217151'
Observer has noticed that commit 'bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248' contains file /ports/head/.hooks/prepare-commit-msg as revision bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248 in repos
STARTING _CompileListOfPorts ................................
for a commit on 'branch': 'head'
this commit is on head
FILE ==: Add, /ports/head/.hooks/prepare-commit-msg, bbc2474ef7a65eb8561c8ecf7af80c2bfed1f248, ports, .hooks, prepare-commit-msg, 4397803
YES, this file is in the ports tree
... but is not a file in a category on disk!

ENDING _CompileListOfPorts ................................

The code correctly determines that this is not a port, because .hooks is not in the list of categories.

Win. Thank you for coming to my TED talk.
