Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full-time systems administration and now work for a very well-known company that has been a big force in the security industry.

Jan 26, 2020
 

FreshPorts had a cache clearing issue recently. It lingered. For a long time. It took me a while to figure it out. It turned out to be a Python 3 compatibility issue.

In this post:

  • PostgreSQL 12.1
  • FreeBSD 12.1

I will outline how database changes invoke external scripts asynchronously which then clear the cache. I will go into the debugging of the issue and how the bug was fixed.

How caching works

FreshPorts caches data on disk. When you go to visit a page, the website checks the cache on disk, and if not found, it runs the queries, creates the HTML then saves it to disk.

Really? You cache to disk? Why?

Q: Why don’t you use memcached or some other caching tool?

A: FreshPorts was created long before such fancy tools existed.

The database side of caching

FreshPorts uses PostgreSQL, an amazing database which just happens to be open source and freely available. The initial work was done on a lesser database system, but then in July 2000 I was introduced to PostgreSQL, which had the grown-up features I expected, such as stored procedures and relational integrity. I never looked back.

Cache clearing in FreshPorts relies upon asynchronous notification using the LISTEN and NOTIFY commands.

A small Python script (about 220 lines) issues several LISTEN commands, then starts polling.

This is the connection:

import configparser
import psycopg2

config = configparser.ConfigParser()
config.read('/usr/local/etc/freshports/fp-listen.ini')

DSN = 'host=' + config['database']['HOST'] + ' dbname=' + config['database']['DBNAME'] + ' user=' + config['database']['DBUSER'] + ' password=' + config['database']['PASSWORD']

conn = psycopg2.connect(DSN)
curs = conn.cursor()

Next, we read from the database a list of LISTEN channels, then issue a LISTEN command for each channel:

curs.execute("SELECT name, script_name FROM listen_for ORDER BY id")
listens_for = curs.fetchall()

listens = dict()
print ("These are the (event name, script name) pairs we are ready for:")
for listen in listens_for:
  curs.execute("LISTEN %s" % listen[0])
  listens[listen[0]] = listen[1]
  print ("('%s', '%s')" % (listen[0], listen[1]))

In this case, name is a channel and script_name is what the Python script will run when it encounters a NOTIFY for that channel.

This is the main loop of the code:

while 1:
  if select.select([conn],[],[],5)==([],[],[]):
    syslog.syslog(syslog.LOG_NOTICE, 'timeout! *************')
  else:
    conn.poll()
    syslog.syslog(syslog.LOG_NOTICE, 'Just woke up! *************')
    while conn.notifies:
      notify = conn.notifies.pop(0);
      # in real life, do something with each...
      syslog.syslog(syslog.LOG_NOTICE, "Got NOTIFY: %d, %s, %s" % (notify.pid, notify.channel, notify.payload));
      if notify.channel in listens:
        syslog.syslog(syslog.LOG_NOTICE, "found key %s" % (notify.channel));
        clear_cache = True;
        if listens[notify.channel]   == 'listen_port':
          syslog.syslog(syslog.LOG_NOTICE, "invoking RemoveCacheEntry()");
          RemoveCacheEntry()
        elif listens[notify.channel] == 'listen_date_updated':
          syslog.syslog(syslog.LOG_NOTICE, "invoking ClearDateCacheEntries()");
          ClearDateCacheEntries()
        ...
        ...
        else:
          clear_cache = False;
          syslog.syslog(syslog.LOG_ERR, "Code does not know what to do when '%s' is found." % notify.channel)

        if clear_cache:
          ClearMiscCaches()

      else:
        syslog.syslog(syslog.LOG_NOTICE, 'no such key in listens array for %s!' % (notify.channel))

This isn’t great code, patches & suggestions welcomed.

I remember this worked differently in earlier versions of PostgreSQL, but this approach has been in use for several years now. The select call times out after five seconds of no activity; when it does, we just loop again.

When we receive a NOTIFY, we verify it is in the list of what we LISTEN for, then go through the list and invoke the appropriate script.

The database side

Triggers will invoke a NOTIFY when and where relevant. An update on the commit_log table will add an entry to the cache_clearing_dates table and invoke NOTIFY date_updated.
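
The actual trigger is not shown in this post, but a minimal sketch of the idea (hypothetical function name and column, installed via psycopg2 to stay in Python) might look like:

import psycopg2

# A sketch only: 'commit_log_cache_clearing' and NEW.commit_date are
# hypothetical; the real FreshPorts trigger differs in detail.
TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION commit_log_cache_clearing() RETURNS trigger AS $$
BEGIN
  -- queue the date for clearing, then wake up fp-listen
  INSERT INTO cache_clearing_dates (date_to_clear) VALUES (NEW.commit_date);
  NOTIFY date_updated;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER commit_log_cache_clearing
  AFTER INSERT OR UPDATE ON commit_log
  FOR EACH ROW EXECUTE PROCEDURE commit_log_cache_clearing();
"""

with psycopg2.connect('dbname=freshports') as conn:
  with conn.cursor() as curs:
    curs.execute(TRIGGER_SQL)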

How cache clearing works

When the database issues NOTIFY date_updated, the script receives it and queries the cache_clearing_dates table to see which dates need to be cleared. On disk, this looks like:

[dan@dev-nginx01:/var/service/fp-listen/fp-listen] $ sudo ls -l /var/db/freshports/cache/daily/2020/01
total 340
-rw-rw-r--  1 www  freshports  143080 Jan  9 23:26 08.daily.head
-rw-rw-r--  1 www  freshports  107017 Jan 11 04:57 10.daily.head
-rw-rw-r--  1 www  freshports  152478 Jan 21 14:26 17.daily.head
-rw-rw-r--  1 www  freshports  129303 Jan 25 18:12 23.daily.head

Thus, the code just deletes the file pertaining to the date in question.
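
A minimal sketch of what such a routine can look like (hypothetical shape and names; the real ClearDateCacheEntries() differs):

from pathlib import Path

CACHE_DAILY = Path('/var/db/freshports/cache/daily')

def ClearDateCacheEntries(curs):
  # sketch: remove the cached page for each queued date, then drain the queue
  curs.execute("SELECT id, date_to_clear FROM cache_clearing_dates")
  for row_id, date_to_clear in curs.fetchall():
    # e.g. 2020-01-10 becomes .../cache/daily/2020/01/10.daily.head
    cache_file = CACHE_DAILY / '{:%Y/%m/%d}.daily.head'.format(date_to_clear)
    try:
      cache_file.unlink()
    except FileNotFoundError:
      pass  # nothing cached for that date, so nothing to clear
    curs.execute("DELETE FROM cache_clearing_dates WHERE id = %s", (row_id,))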

Finding the issue

Debugging the caching was challenging: the first step was reproducing the issue.

I started the small Python app on my dev FreshPorts jail:

[root@dev-nginx01:/var/service/fp-listen] # echo ./fp-listen/fp-listen.py | su -fm freshports
These are the (event name, script name) pairs we are ready for:
('port_updated', 'listen_port')
('ports_moved', 'listen_ports_moved')
('ports_updating', 'listen_ports_updating')
('vuxml', 'listen_vuxml')
('category_new', 'listen_category_new')
('date_updated', 'listen_date_updated')

The output you see is what the app has read from the listen_for table. The events are data-driven; see the LISTEN loop above.

freshports.dev=# SELECT * FROM listen_for;
 id |      name      |      script_name      
----+----------------+-----------------------
  5 | port_updated   | listen_port
  6 | ports_moved    | listen_ports_moved
  7 | ports_updating | listen_ports_updating
  8 | vuxml          | listen_vuxml
 10 | category_new   | listen_category_new
 11 | date_updated   | listen_date_updated
(6 rows)

freshports.dev=# 

Now that the code is running, in a separate ssh session, I connect to the database and manually issue a NOTIFY command:

freshports.dev=# NOTIFY date_updated;
NOTIFY
freshports.dev=# 

BOOM, the Python application dies in a heap:

Traceback (most recent call last):
  File "./fp-listen/fp-listen.py", line 190, in <module>
    if listens.has_key(notify.channel):
AttributeError: 'dict' object has no attribute 'has_key'
[root@dev-nginx01:/var/service/fp-listen] #

NOTE:

  1. the code you see above has already been patched
  2. line 190 of the script is the if notify.channel in listens: test in the main loop shown above

The fix

The fix was:

-      if listens.has_key(notify.channel):
+      if notify.channel in listens:

This fix appears in the main loop code above. It is a known and documented Python 2 → Python 3 migration issue.
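
For anyone chasing the same error, the change in dict membership tests:

listens = {'date_updated': 'listen_date_updated'}

# Python 2 only -- on Python 3 this raises
# AttributeError: 'dict' object has no attribute 'has_key'
#   listens.has_key('date_updated')

# Works on both Python 2 and Python 3:
if 'date_updated' in listens:
  print('known channel')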

Look at all the backed up queues!

When fp-listen gets a NOTIFY for ports or dates which need clearing, that notification can carry a payload, but the payload must be shorter than 8000 bytes. As mentioned above, we store the data in tables instead of sending it directly to the Python app.
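
For reference, a sketch of how a payload would travel if we did send one:

import psycopg2

conn = psycopg2.connect('dbname=freshports')
conn.autocommit = True  # a NOTIFY is only delivered once committed
curs = conn.cursor()

# the second argument to pg_notify() is the payload; it must be
# shorter than 8000 bytes
curs.execute("SELECT pg_notify('date_updated', '2020-01-22')")

# in fp-listen this arrives as notify.channel == 'date_updated'
# and notify.payload == '2020-01-22'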

Looking in the tables:

[dan@x8dtu-pg02:~] $ psql freshports.org
Password for user dan: 
psql (12.1)
Type "help" for help.

freshports.org=# select count(*) from cache_clearing_ports;
 count 
-------
 20508
(1 row)

freshports.org=# select count(*) from cache_clearing_dates;
 count 
-------
   290
(1 row)

freshports.org=# 

So how long has this been an issue?

freshports.org=# select * from cache_clearing_dates order by date_added asc limit 5;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 172300 | 2020-01-22    | 00:08:21.443837
 172301 | 2020-01-22    | 00:12:37.328254
 172302 | 2020-01-22    | 00:30:32.244191
 172303 | 2020-01-22    | 00:41:27.386203
 172304 | 2020-01-22    | 00:43:57.517707
(5 rows)

freshports.org=# select * from cache_clearing_dates order by date_added desc limit 5;
   id   | date_to_clear |   date_added    
--------+---------------+-----------------
 172299 | 2020-01-21    | 23:58:53.74748
 172298 | 2020-01-21    | 23:49:24.205331
 172188 | 2020-01-20    | 23:20:03.807726
 172297 | 2020-01-21    | 23:16:04.518664
 172187 | 2020-01-20    | 23:06:55.273125
(5 rows)

freshports.org=# 

Oh, it seems cache_clearing_dates.date_added should be a timestamp, not just a time. I’ll fix that issue later.
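
The eventual fix will presumably be something along these lines (a guess, not the actual change; seeding the date part from date_to_clear is approximate):

import psycopg2

# a guess at the fix: store a full timestamp instead of just a time
with psycopg2.connect('dbname=freshports.org') as conn:
  with conn.cursor() as curs:
    curs.execute("""
      ALTER TABLE cache_clearing_dates
        ALTER COLUMN date_added TYPE timestamp without time zone
        USING (date_to_clear + date_added)
    """)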

Let’s try the ports cache queue:

freshports.org=# select * from cache_clearing_ports order by date_added asc limit 5;
    id    | port_id | category |     port      |         date_added         
----------+---------+----------+---------------+----------------------------
 16669514 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
 16669515 |   16343 | devel    | py-setuptools | 2020-01-20 17:23:04.747768
 16669516 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
 16669517 |   49106 | lang     | python37      | 2020-01-20 17:23:04.747768
 16669513 |   56394 | graphics | py-pygeoapi   | 2020-01-20 17:23:04.747768
(5 rows)

freshports.org=# select * from cache_clearing_ports order by date_added desc limit 5;
    id    | port_id | category |       port       |         date_added         
----------+---------+----------+------------------+----------------------------
 16691617 |   37057 | print    | R-cran-knitr     | 2020-01-22 15:37:13.407464
 16691618 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691615 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691616 |   37124 | textproc | R-cran-rmarkdown | 2020-01-22 15:37:13.407464
 16691619 |   37051 | textproc | R-cran-yaml      | 2020-01-22 15:37:13.407464
(5 rows)

freshports.org=# 

From this, it appears the problem arose on January 20th.

FYI, the tables look like this:

freshports.org=# \d cache_clearing_ports 
                                        Table "public.cache_clearing_ports"
   Column   |            Type             | Collation | Nullable |                     Default                      
------------+-----------------------------+-----------+----------+--------------------------------------------------
 id         | integer                     |           | not null | nextval('cache_clearing_ports_id_seq'::regclass)
 port_id    | integer                     |           | not null | 
 category   | text                        |           | not null | 
 port       | text                        |           | not null | 
 date_added | timestamp without time zone |           | not null | now()
Indexes:
    "cache_clearing_ports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# \d cache_clearing_dates
                                       Table "public.cache_clearing_dates"
    Column     |          Type          | Collation | Nullable |                     Default                      
---------------+------------------------+-----------+----------+--------------------------------------------------
 id            | integer                |           | not null | nextval('cache_clearing_dates_id_seq'::regclass)
 date_to_clear | date                   |           |          | 
 date_added    | time without time zone |           |          | now()

freshports.org=# 

Releasing the code

After I fixed the code, packaged it, and installed it on test, I encountered this error:

Jan 22 17:28:32 test-nginx01 fp-listen[62850]: ClearMiscCaches() is clearing '/var/db/freshports/cache/new/*'

Eh? That’s the wrong directory. It should be news, not new. Here’s the code in question:

def ClearMiscCaches():
  syslog.syslog(syslog.LOG_NOTICE, 'invoked: ClearMiscCaches()');
  filenameglob = config['dirs']['NEWS_CACHE_DIR'];
  syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is clearing %s' % (filenameglob));
  for filename in glob.glob(filenameglob):
    syslog.syslog(syslog.LOG_NOTICE, 'removing %s' % (filename))
    try:
      if os.path.isfile(filename):
        os.remove(filename)
      else:
        shutil.rmtree(filename)

    except OSError as err:
      if err.errno == 2:
        pass  # errno 2 is ENOENT: no file to delete, so no worries

      else:
        syslog.syslog(syslog.LOG_CRIT, 'ERROR: error deleting cache entry.  Error message is %s' % (err))
        continue

Here is the entry from fp-listen.ini (the NEWS_CACHE_DIR value read in ClearMiscCaches() above):

NEWS_CACHE_DIR  = '%(BASEDIR)s/cache/new/*'

That should be:

NEWS_CACHE_DIR  = '%(BASEDIR)s/cache/news/*'

I had to make this fix on every server other than dev; the setting was incorrect everywhere else.

I restarted the service, which is controlled by daemontools:

sudo svc -t /var/service/fp-listen

I found the glob.glob() code was not working as expected. No syslog entries were being recorded; glob.glob(filenameglob) was not enumerating everything. Tests in a small script worked. The same test script, when run by the freshports user, also yielded the expected results. I could not locate a reason why this would fail in fp-listen.

New ClearMiscCaches() code

After reviewing the documentation for glob, I stumbled across Path and used it. I changed the ClearMiscCaches() code as shown below:

def ClearMiscCaches():
  syslog.syslog(syslog.LOG_NOTICE, 'invoked: ClearMiscCaches()');
  
  news_cache_dir = config['dirs']['NEWS_CACHE_DIR'];
  syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is clearing out entries in %s' % (news_cache_dir));

  for filename in Path(news_cache_dir).iterdir():
    syslog.syslog(syslog.LOG_NOTICE, 'ClearMiscCaches() is removing %s' % (filename))
    try:
      if Path(filename).is_file():
        Path(filename).unlink()
      else:
        shutil.rmtree(filename)

    except OSError as err:
      if err.errno == 2:
        pass  # errno 2 is ENOENT: already gone, so no worries

      else:
        syslog.syslog(syslog.LOG_CRIT, 'ERROR: error deleting cache entry.  Error message is %s' % (err))

The Path(news_cache_dir).iterdir() call is the key. We are iterating over a directory to get the files we need. Unlike glob, iterdir() does not skip dotfiles, but that is fine for our needs: we do not use dotfiles (e.g. .Caching_Not_Done_Like_This).

It was very strange, because glob.glob(filenameglob) worked in other parts of the fp-listen code, but not in this directory.
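
For reference, the two calls are not interchangeable:

import glob
from pathlib import Path

# glob expands the pattern itself and skips dotfiles:
entries = glob.glob('/var/db/freshports/cache/news/*')

# iterdir() takes a directory (no glob pattern) and yields every
# entry, dotfiles included:
entries = list(Path('/var/db/freshports/cache/news').iterdir())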

New package

With that code change, the new package was created and made its way through the test, stage, and beta servers before being moved into production.

The cause of the problem seems to be the migration from Python 2 to Python 3. I would like to do more testing on the glob issue, but I have moved on to other issues.

Why LISTEN/NOTIFY?

LISTEN/NOTIFY is a valuable tool when your database needs to invoke external events. Let the transactions complete, and the external code will eventually do your bidding via the LISTEN side of the event.

Hopefully, this will open up a whole new avenue of solutions for you.

Jan 25, 2020
 

I’ve noticed a lot of sanity test failures related to devel/binutils lately. Every single commit.

Issue 133 deals with that. Here, I plan to document what happened.

FreshPorts has, since inception, worked off the commit emails. The incoming email is parsed and XML created. The XML is then used to update the database.

I started looking in the message archives. I confirmed that the original email indicated deletes were occurring.

Yet, the XML had no deletes:

[dan@dev-ingress01:/var/db/freshports/message-queues/archive/2019_11/2019_11_15] $ cat 2019.11.15.10.09.16.85661.txt.xml 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE UPDATES PUBLIC "-//FreshPorts//DTD FreshPorts 2.0//EN" "https://www.freshports.org/docs/fp-updates.dtd">

<UPDATES Version="1.3.2.1">
    <UPDATE>
        <DATE Month="11" Day="15" Year="2019"></DATE>
        <TIME Timezone="UTC" Minute="9" Second="3" Hour="10"></TIME>
        <OS Repo="ports" Branch="head" Id="FreeBSD"></OS>
        <LOG>Update binutils to 2.33.1

While here, convert binutils into flavors, that ease a lot maintenance
and helps cleaning out the code.

This is inspired by the same work on going on the xtoolchained gcc by jhb@

PR:		241756
Exp-run:	antoine (portmgr)
Discussed with:	jhb
Differential Revision:	https://reviews.freebsd.org/D22258</LOG>
        <PEOPLE>
            <UPDATER Handle="bapt"></UPDATER>
        </PEOPLE>
        <MESSAGE Id="201911151009.xAFA9393081239@repo.freebsd.org" EncodingLosses="false" Subject="svn commit: r517642 - in head: . devel devel/aarch64-binutils devel/aarch64-none-elf-binutils devel/aarch64-none-elf-gcc devel/amd64-binutils devel/arduino-tools devel/arm-gnueabi-binutils devel/ar...">
            <DATE Year="2019" Day="15" Month="11"></DATE>
            <TIME Second="3" Minute="9" Timezone="UTC" Hour="10"></TIME>
            <REPOSITORY>ports</REPOSITORY>
            <REVISION>517642</REVISION>
            <TO Email="ports-committers@freebsd.org"></TO>
            <TO Email="svn-ports-all@freebsd.org,"></TO>
        </MESSAGE>
        <FILES>
            <FILE Action="Add" Path="head/devel/binutils/pkg-plist-aarch64" Revision="517642"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-amd64" Revision="517642" Action="Add"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-arm-gnueabi"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-arm-none-eabi"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-avr" Revision="517642" Action="Add"></FILE>
            <FILE Action="Add" Path="head/devel/binutils/pkg-plist-i386" Revision="517642"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-mingw32"></FILE>
            <FILE Revision="517642" Path="head/devel/binutils/pkg-plist-mips" Action="Add"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-mips64"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-powerpc64" Revision="517642" Action="Add"></FILE>
            <FILE Action="Add" Path="head/devel/binutils/pkg-plist-riscv64" Revision="517642"></FILE>
            <FILE Revision="517642" Path="head/devel/binutils/pkg-plist-s390x" Action="Add"></FILE>
            <FILE Revision="517642" Path="head/devel/binutils/pkg-plist-sparc64" Action="Add"></FILE>
        </FILES>
        <PR Id="241756"></PR>
    </UPDATE>
</UPDATES>
[dan@dev-ingress01:/var/db/freshports/message-queues/archive/2019_11/2019_11_15] $ 

Test the code

Testing the code, I didn’t get anything different out:

[freshports@dev-ingress01 /usr/local/libexec/freshports]$ cat /var/db/freshports/message-queues/archive/2019_11/2019_11_15/2019.11.15.10.09.16.85661.txt.raw | ./process_mail.pl 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE UPDATES PUBLIC "-//FreshPorts//DTD FreshPorts 2.0//EN" "https://www.freshports.org/docs/fp-updates.dtd">

<UPDATES Version="1.3.2.1">
    <UPDATE>
        <DATE Year="2019" Month="11" Day="15"></DATE>
        <TIME Second="3" Minute="9" Hour="10" Timezone="UTC"></TIME>
        <OS Id="FreeBSD" Branch="head" Repo="ports"></OS>
        <LOG>Update binutils to 2.33.1

While here, convert binutils into flavors, that ease a lot maintenance
and helps cleaning out the code.

This is inspired by the same work on going on the xtoolchained gcc by jhb@

PR:		241756
Exp-run:	antoine (portmgr)
Discussed with:	jhb
Differential Revision:	https://reviews.freebsd.org/D22258</LOG>
        <PEOPLE>
            <UPDATER Handle="bapt"></UPDATER>
        </PEOPLE>
        <MESSAGE EncodingLosses="false" Id="201911151009.xAFA9393081239@repo.freebsd.org" Subject="svn commit: r517642 - in head: . devel devel/aarch64-binutils devel/aarch64-none-elf-binutils devel/aarch64-none-elf-gcc devel/amd64-binutils devel/arduino-tools devel/arm-gnueabi-binutils devel/ar...">
            <DATE Month="11" Year="2019" Day="15"></DATE>
            <TIME Hour="10" Second="3" Minute="9" Timezone="UTC"></TIME>
            <REPOSITORY>ports</REPOSITORY>
            <REVISION>517642</REVISION>
            <TO Email="ports-committers@freebsd.org"></TO>
            <TO Email="svn-ports-all@freebsd.org,"></TO>
        </MESSAGE>
        <FILES>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-aarch64"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-amd64" Revision="517642" Action="Add"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-arm-gnueabi" Revision="517642" Action="Add"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-arm-none-eabi"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-avr"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-i386"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-mingw32"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-mips"></FILE>
            <FILE Action="Add" Revision="517642" Path="head/devel/binutils/pkg-plist-mips64"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-powerpc64"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-riscv64" Action="Add" Revision="517642"></FILE>
            <FILE Revision="517642" Action="Add" Path="head/devel/binutils/pkg-plist-s390x"></FILE>
            <FILE Path="head/devel/binutils/pkg-plist-sparc64" Revision="517642" Action="Add"></FILE>
        </FILES>
        <PR Id="241756"></PR>
    </UPDATE>
</UPDATES>
[freshports@dev-ingress01 /usr/local/libexec/freshports]$ 

It must be the code, or a change in email layout.

The above includes only Add actions, which directly relate to the Added: entries in the email.

Finding the issue

I found the issue. This was the fix:

-		last if($line =~ /^Directory Properties:/);
+		next if($line =~ /^Directory Properties:/);

In the existing code, when a line containing only Directory Properties: was encountered, that signaled the end of processing for this commit.

I conclude that the email layout changed: given where that text now appears within the email, everything after it, including the deletes, was never processed.
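
For non-Perl readers: last is Perl's break and next is its continue. The one-word fix, as a Python sketch with placeholder names:

def interesting_lines(email_lines):
  # placeholder parser loop showing the one-word fix
  for line in email_lines:
    if line.startswith('Directory Properties:'):
      # old code: break (Perl's last), which threw away everything after
      # this line, including the Removed: section
      continue  # new code: skip the line and keep parsing
    yield line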

With the above code, I ran the script again and dumped the XML into a file. I originally wanted to show you a diff, but by its nature XML is not always the same each time it is generated. For example, the fields within a group do not necessarily appear in the same order. As I was typing this, I thought there MUST be an online XML diff tool. I found https://www.corefiling.com/opensource/xmldiff/ and created this for your viewing pleasure:

+       <FILE Action='Add' Path='head/devel/binutils/pkg-plist-aarch64-none-elf' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/aarch64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/aarch64-none-elf-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/amd64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/arm-gnueabi-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/arm-none-eabi-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/avr-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/i386-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/mingw32-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/mips-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/mips64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/powerpc64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/riscv64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/s390x-binutils/' Revision='517642'></FILE>
+       <FILE Action='Remove' Path='head/devel/sparc64-binutils/' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/MOVED' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/aarch64-none-elf-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/arduino-tools/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/arm-none-eabi-gcc492/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/arm-none-eabi-newlib/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/avr-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/avr-libc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/binutils/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/binutils/distinfo' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/mingw32-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/powerpc64-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/riscv64-gcc/Makefile' Revision='517642'></FILE>
+       <FILE Action='Modify' Path='head/devel/sope4/Makefile' Revision='517642'></FILE>

That is a lot of missing data.

Running the new XML

After running this:

[freshports@dev-ingress01 /usr/local/libexec/freshports]$ ./load_xml_into_db.pl  /tmp/2019.11.15.10.09.16.85661-new.txt.xml -O > /tmp/2019.11.15.10.09.16.85661-new.txt.loading
Use of uninitialized value $depends in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1363.
Use of uninitialized value $depends in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1364.
Use of uninitialized value $depends in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1363.
Use of uninitialized value $depends in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1364.
Use of uninitialized value $depends in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1363.
Use of uninitialized value $depends in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1364.
Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/FreshPorts/ports_generate_plist.pm line 40.
Use of uninitialized value in string eq at /usr/local/lib/perl5/site_perl/FreshPorts/ports_generate_plist.pm line 50.
Use of uninitialized value in split at /usr/local/lib/perl5/site_perl/FreshPorts/port.pm line 1296.
[freshports@dev-ingress01 /usr/local/libexec/freshports]$ 

devel/s390x-binutils was marked as deleted. Score!

I’ll fix up those other errors next, but they are beyond the scope of this post.

Not shown here: I went back and re-processed each subsequent commit on devel/binutils.

Thanks for coming to my TED talk.

Jan 09, 2020
 

XML can be tricky. XML plays a heavy role in FreshPorts. Since the early days, converting to XML and then loading into the database has been a priority. This choice means the database-loading code doesn’t need to know much about the source of the data. It also means we can change the data source without modifying the database-loading code.

You might ask: why aren’t you using JSON?

FreshPorts predates JSON by several years.

Today I found an issue caused by some characters in a commit. I am not blaming the committer. There is no fault there. This is about the code and how it can be improved to not fall over.

The commit email

Looking at the email file on disk, I found this:

Author: pkubaj
Date: Wed Jan  8 21:36:57 2020
New Revision: 522460
URL: https://svnweb.freebsd.org/changeset/ports/522460

Log:
  multimedia/obs-studio: fix build on powerpc64E
  

That E after 64: that is the culprit. It is actually a non-printing control character (\u0005, which less renders as ^E).

The commit in the repo

The commit was on multimedia/obs-studio at Wed Jan 8 21:36:57 2020 UTC. Viewing that commit within the repo, nothing seems amiss.

Not seeing it there makes me wonder: is the website front end hiding the issue?

svn log

Let’s try viewing the svn log:

[dan@pkg01:~/ports/head] $ svn log -r 522460 
------------------------------------------------------------------------
r522460 | pkubaj | 2020-01-08 21:36:57 +0000 (Wed, 08 Jan 2020) | 10 lines

multimedia/obs-studio: fix build on powerpc64

Merge upstream commit to use GCC's SSE->AltiVec translation. Since it depends on compiling with GCC, it only works on ELFv1. Hopefully it will be possible to build it on ELFv2 in the future.

Also use luajit only where it's actually available. Since it's optional, the port builds anyway.

PR:		243199
Approved by:	yuri (maintainer)
MFH:		2020Q1 (fix build blanket)

------------------------------------------------------------------------
[dan@pkg01:~/ports/head] $ 

OK we don’t see it there, but if I pipe the output through less, I see it:

[dan@pkg01:~/ports/head] $ svn log -r 522460 | less
------------------------------------------------------------------------
r522460 | pkubaj | 2020-01-08 21:36:57 +0000 (Wed, 08 Jan 2020) | 10 lines

multimedia/obs-studio: fix build on powerpc64^E

Merge upstream commit to use GCC's SSE->AltiVec translation. Since it depends on compiling with GCC, it only works on ELFv1. Hopefully it will be possible to build it on ELFv2 in the future.

Also use luajit only where it's actually available. Since it's optional, the port builds anyway.

PR:             243199
Approved by:    yuri (maintainer)
MFH:            2020Q1 (fix build blanket)

------------------------------------------------------------------------
(END)

That’s interesting. Piping the output through more gave similar results.

The error

The error I see is:


Code point \u0005 is not a valid character in XML at /usr/local/lib/perl5/site_perl/FreshPorts/process_svn_mail.pm line 183.

The generated XML is not shown here (it did not survive being pasted into this post); the key point is that it terminates at the LOG message, which is the text containing the non-printing character.

What’s next?

I’m going to consult with others and see how to fix this. The code in question is Perl and I’m not the best at that.

Solved with this patch

--- ingress/modules/trunk/process_svn_mail.pm	2019/01/02 21:26:27	5200
+++ ingress/modules/trunk/process_svn_mail.pm	2020/01/09 17:16:10	5244
@@ -52,6 +52,11 @@
 
 	$Log = &GetLog($message);
 
+	# re https://news.freshports.org/2020/01/09/code-point-u0005-is-not-a-valid-character-in-xml/
+	#    https://twitter.com/FreshPorts/status/1215286202691211264
+	#	
+	$Log =~ s/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//go;
+
 #print "log: '$Log'\n";
 
 	if ($Log eq '') {

The most relevant hint came from Garrett Wollman: “Just look at the XML spec, it says exactly which control characters are allowed.” Based on that, I found the suggestion for the code I used above.

Very nice. Very simple.
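
For reference, the same filter in Python (the character ranges come straight from the XML 1.0 spec):

import re

# strip everything outside the character ranges permitted by XML 1.0
INVALID_XML_CHARS = re.compile(
  '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def strip_invalid_xml_chars(text):
  return INVALID_XML_CHARS.sub('', text)

assert strip_invalid_xml_chars('powerpc64\x05') == 'powerpc64'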

I implemented it on the dev server. It worked. Progressing through test, stage, and on to prod, it worked.

Thank you.

Dec 21, 2019
 

The website has been slow for some time. I think it is because of increased disk-io.

In this post:

  • FreeBSD 12.0
  • PostgreSQL 11.5
  • LibreNMS does the graphs via SNMP

The diskio overall

By increased, I mean it has gone from about 0.1k to about 1.6k, peaking at 140-150. This is steady and persistent.

This graph shows the jump which occurred earlier this year.

FreshPorts disk io

My eyes can’t tell for sure which two drives are involved, but going to the graphs for the individual drives, it is clearly ada2 and ada3.

ada2

This is for ada2:

ada2 diskio

ada3

And the same period for ada3:

ada3 diskio

zroot

zpool status confirms this is the zroot pool.

$ zpool status
  pool: main_tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 02:43:45 with 0 errors on Wed Dec 18 06:32:53 2019
config:

	NAME        STATE     READ WRITE CKSUM
	main_tank   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada0p1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 00:04:40 with 0 errors on Thu Dec 19 03:10:19 2019
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada2p3  ONLINE       0     0     0
	    ada3p3  ONLINE       0     0     0

errors: No known data errors

The filesystems

What’s on there?

[dan@x8dtu:~] $ zfs list -r zroot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
zroot                                  26.8G  78.8G    96K  /zroot
zroot/ROOT                             5.92G  78.8G    96K  none
zroot/ROOT/default                     5.92G  78.8G  4.19G  /
zroot/TESTING                           616K  78.8G    88K  /zroot/TESTING
zroot/TESTING/top                       528K  78.8G    88K  /zroot/TESTING/top
zroot/TESTING/top/test1                  88K  78.8G    88K  /zroot/TESTING/top/test1
zroot/TESTING/top/test2                  88K  78.8G    88K  /zroot/TESTING/top/test2
zroot/TESTING/top/test3                  88K  78.8G    88K  /zroot/TESTING/top/test3
zroot/TESTING/top/test4                  88K  78.8G    88K  /zroot/TESTING/top/test4
zroot/TESTING/top/test5                  88K  78.8G    88K  /zroot/TESTING/top/test5
zroot/data                             20.8G  78.8G    88K  /zroot/data
zroot/data/dan-pg                        88K  78.8G    88K  /usr/home/dan/pg
zroot/data/freshports                  4.51G  78.8G    88K  none
zroot/data/freshports/repo             4.51G  78.8G    88K  none
zroot/data/freshports/repo/PORTS-head  4.51G  78.8G  4.51G  /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos/PORTS-head
zroot/data/ingress01-testing            675M  78.8G   675M  /iocage/jails/x8dtu-ingress01/root/usr/home/dan/tmp-fast
zroot/data/pg01-postgres               15.6G  78.8G  15.6G  /iocage/jails/x8dtu-pg01/root/var/db/postgres
zroot/data/pg02-postgres               9.25M  78.8G  9.25M  /iocage/jails/x8dtu-pg02/root/var/db/postgres
zroot/tmp                               120K  78.8G   120K  /tmp
zroot/usr                               192K  78.8G    96K  /usr
zroot/usr/src                            96K  78.8G    96K  /usr/src
zroot/var                              9.77M  78.8G    96K  /var
zroot/var/audit                          96K  78.8G    96K  /var/audit
zroot/var/crash                          96K  78.8G    96K  /var/crash
zroot/var/log                          8.48M  78.8G  8.48M  /var/log
zroot/var/mail                          120K  78.8G   120K  /var/mail
zroot/var/tmp                           912K  78.8G   912K  /var/tmp
[dan@x8dtu:~] $ 

What is likely to have the traffic?

  • zroot/data/freshports/repo/PORTS-head – that’s the main FreeBSD ports repo
  • zroot/data/pg01-postgres – the PostgreSQL database server (pg02 is another database server, but not used)

PostgreSQL backends

The average number of PostgreSQL backends has jumped from about 1 to 3. This is the PostgreSQL pg01 jail.

PostgreSQL backend pg01 jail

What changed?

Two things come to mind:

  1. caching – I added caching on disk for some of the web pages. If anything, this should reduce the disk io.
  2. recordsize – I changed some parameters on both zroot/data/pg01-postgres and zroot/data/pg02-postgres

Looking back at command history, I see:

[dan@x8dtu:~] $ history | grep recordsize
    7  zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres
    8  zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres zroot/data/pg02-postgres
    9  sudo zfs set recordsize=8k redundant_metadata=most primarycache=metadata logbias=throughput zroot/data/pg01-postgres zroot/data/pg02-postgres
   10  zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres zroot/data/pg02-postgres
  496  history | grep recordsize

These are the current settings for pg01, the current production database server:

[dan@x8dtu:~] $ zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres
NAME                      PROPERTY            VALUE         SOURCE
zroot/data/pg01-postgres  recordsize          8K            local
zroot/data/pg01-postgres  redundant_metadata  most          local
zroot/data/pg01-postgres  primarycache        metadata      local
zroot/data/pg01-postgres  logbias             throughput    local
[dan@x8dtu:~] $ 

Looking at the parent, these are the likely settings before the change:

[dan@x8dtu:~] $ zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data
NAME        PROPERTY            VALUE         SOURCE
zroot/data  recordsize          128K          default
zroot/data  redundant_metadata  all           default
zroot/data  primarycache        all           default
zroot/data  logbias             latency       default
[dan@x8dtu:~] $ 

What changed?

  • recordsize – from 128K to 8K
  • redundant_metadata – from all to most
  • primarycache – from all to metadata
  • logbias – from latency to throughput

top

An extract from top shows:

last pid: 43619;  load averages:  1.01,  1.39,  1.48           up 34+18:05:39  15:19:57
195 processes: 9 running, 186 sleeping
CPU:  1.9% user,  0.0% nice,  6.5% system,  0.0% interrupt, 91.6% idle
Mem: 338M Active, 720M Inact, 102G Wired, 85G Free
ARC: 62G Total, 48G MFU, 8620M MRU, 3603K Anon, 759M Header, 5407M Other
     47G Compressed, 117G Uncompressed, 2.47:1 Ratio
Swap: 4096M Total, 4096M Free

Lots of free RAM (85G).

Plenty of ARC in use (62G).

Theories

Could the change in recordsize cause this issue? Some records are now 128K, some are 8K?

If yes, then shutting down the database and doing a zfs send | zfs recv of the entire dataset to a new dataset would make everything 8K.

The jump in the number of backend workers: could that be because queries now take longer and requests from the websites are now overlapping?

If the cause is one or a combination of redundant_metadata, primarycache, and logbias, changing them back and observing might show a difference.

Setting changes

The following commentary is based on an IRC chat with Allan Jude.

primarycache=metadata means that ZFS will cache only the ZFS metadata, not any PostgreSQL database data. The idea behind this configuration is to avoid having both PostgreSQL and ZFS cache the same data. However, we might be better off letting ZFS cache this data, since the dataset is only 15.6GB and it will all fit into ARC compressed.

logbias=throughput tells ZFS to optimize for throughput. This will not do synchronous writes to the ZIL. This will make commits to the database take longer. Setting this back to latency is probably better for a database.

At about Fri Dec 20 15:38 UTC 2019, I ran: zfs set primarycache=all zroot/data/pg01-postgres

To speed up the caching, I ran this command:

[dan@x8dtu:~] $ sudo zfs snapshot zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ sudo zfs send zroot/data/pg01-postgres@for.caching > /dev/null
[dan@x8dtu:~] $ sudo zfs destroy zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ 

That will read the entire dataset into ARC.

PostgreSQL database settings

This is what PostgreSQL is working with:

shared_buffers = 128MB

That is spread rather thinly for a 15.6GB database.

As a wise person recently said: the key to database performance with ZFS is you need exactly one of [database|filesystem] caching the database. 0 of them caching or both of them caching is bad. In years past I’ve had better luck letting the database cache than ZFS caching. But that was before compressed ARC.

20 minutes later

Now the graph is trending down. This is the past 6 hours.

pg01 disk io going down

Nearly two hours later

Yes, I declare this fixed. The overall diskio has dropped back to what it was before the change.

The number of PostgreSQL backends has also dropped.

PostgreSQL backends back to normal

28 hours later

I waited until the next day to include this graph. It clearly shows how the IO has dropped back to the same level as before. This graph covers a two-month period.

two month graph

Thanks

Sorry it took so long to get to this. Glad I had Allan and Josh to help out.

Sep 27, 2019
 

I was just creating a new jail for working on git & FreshPorts. I was intrigued to see that iocage uses zfs send | zfs receive to create the new jail:

[dan@slocum:~] $ ps auwwx | grep iocage
root      64166    3.7  0.0   12788    4036  1  D+   21:16         0:06.10 zfs send system/iocage/releases/12.0-RELEASE/root@git-dev
root      64167    2.8  0.0   12752    4036  1  S+   21:16         0:03.60 zfs receive -F system/iocage/jails/git-dev/root
root      63910    0.0  0.0   16480    7384  1  I+   21:16         0:00.01 sudo iocage create -r 12.0-RELEASE --thickjail --name git-dev
root      63911    0.0  0.0   53344   42484  1  I+   21:16         0:01.01 /usr/local/bin/python3.6 /usr/local/bin/iocage create -r 12.0-RELEASE --thickjail --name git-dev
dan       67954    0.0  0.0   11288    2732  3  S+   21:18         0:00.00 grep iocage
[dan@slocum:~] $ 

More later, after I get this jail configured.

Edit: 2019-09-28

From Twitter:

Something is being copied, is that a cached version of the jail template?

The answer is a local copy of FreeBSD 12.0-RELEASE:

[dan@slocum:~] $ zfs list -r system/iocage/releases
NAME                                       USED  AVAIL  REFER  MOUNTPOINT
system/iocage/releases                    3.15G  15.9T   176K  /iocage/releases
system/iocage/releases/11.2-RELEASE       1.44G  15.9T   176K  /iocage/releases/11.2-RELEASE
system/iocage/releases/11.2-RELEASE/root  1.44G  15.9T  1.44G  /iocage/releases/11.2-RELEASE/root
system/iocage/releases/12.0-RELEASE       1.71G  15.9T   176K  /iocage/releases/12.0-RELEASE
system/iocage/releases/12.0-RELEASE/root  1.71G  15.9T  1.71G  /iocage/releases/12.0-RELEASE/root
[dan@slocum:~] $ 

What’s in there?

[dan@slocum:~] $ ls /iocage/releases/12.0-RELEASE/root
COPYRIGHT boot      etc       libexec   mnt       proc      root      sys       usr
bin       dev       lib       media     net       rescue    sbin      tmp       var
[dan@slocum:~] $ 
Sep 22, 2019
 

We have the first commit processed via git into FreshPorts. Details are in this git comment.

Work remaining:

  1. check out that commit into the working copy of the files
  2. run make -V on the working copy to get the refreshed values for the port[s] affected by this commit

The 2nd part requires very little code change.

The 1st part is just playing with git.

My thanks to Sergey Kozlov for his code which creates the XML FreshPorts needs for commit processing. That has been a great time saver for me.

Sep 18, 2019
 

I want to move FreshPorts towards using commit hooks and away from depending upon incoming emails for processing new commits.

Much of the following came from a recent Twitter post.

You might think: why are we using emails? Why? Because we can. They were the easiest and simplest approach. It is a time-proven solution. Look at https://docs.freshports.org/ and you can see the original ideas from 2001. That is over 18 years of providing data.

If email is so good, why stop?

Because we can.

And we won’t stop using email.

Email will stay around as a fall-back position. Commit hooks are a tighter dependency upon a third party and require close cooperation. Should that relationship sour, the cooperation may terminate.

If web-hooks proceed, email processing will be modified to introduce an N-minute delay. After leaving the N-minute queue, the mail will be:

  • ignored if the commit has already been processed
  • processed if the commit is not in the database

How is a commit identified

Email processing is based upon the Message-Id, which is stored in the database. Duplicates are ignored.

I am not sure if we also check the subversion revision number. That might be wise. There is an index, but it is not unique.

If we move to commit-hooks, message-id will not be available. We will have to change to relying upon the revision number or, in the case of git, the commit hash.

ACTIONS:

  • add a unique index on commit_log.svn_revision
  • remove not null constraint on commit_log.message_id
  • add commit_log.commit_hash with a unique index

Commit processing

Regardless of how we get notified of a new commit, we must be able to put our local copy of the repo into the state as of a given commit.

For subversion, we do this:

svn up -r REVISION

After this, various commands, such as make -V, are run to extract the necessary values from the ports tree (as of the commit). This information includes PORTVERSION, PORTREVISION, etc. You can see why it is vital to have everything in our ports tree reflect the repo as of that particular commit.
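
For example, a sketch of that extraction (the helper is mine, not FreshPorts code):

import subprocess

def port_make_var(origin, var, ports_tree='/usr/ports'):
  # ask the ports framework for a single variable, e.g. PORTVERSION
  result = subprocess.run(
    ['make', '-C', ports_tree + '/' + origin, '-V', var],
    check=True, capture_output=True, text=True)
  return result.stdout.strip()

# e.g. port_make_var('devel/binutils', 'PORTVERSION')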

For git, it is similar:

git checkout HASH

The same scripts, as described above, would be run.

Commit hooks

These are the assumptions for a commit hook:

  1. the hook gets triggered exactly once per commit
  2. the hook is fast, so as not to slow down commits

In order to be fast, the basic information has to be passed along to another daemon, which then puts it into a queue, which is then processed by another daemon. This queue must be persistent.

I am using hare and hared here as examples only because I am familiar with them. They won’t actually do what I need, but if I were to fork them and modify them for this specific task, I think they would do the job rather well.

My initial thoughts are:

  1. The hook invokes something like hare (see also sysutils/hare) which sends a UDP packet to something else. The packet contains the commit revision number (if subversion) or hash (if git).
  2. The UDP packet is received by something like hared (same link as above for hare, but available via sysutils/py-hared).
  3. hared then adds the data to a queue. What type of queue and where it is located is for later design.

Commit iteration

When processing email, looping through the email is your iteration. When you have no email, you need something else to iterate through.

git commit iteration

I think this is the command we want to use when iterating through git commits:

git rev-list HASH..HEAD

Where HASH is the hash of our most recently processed commit. Most recently is not necessarily the last one we processed; it is the commit with the most recent timestamp. Here is an example:

$ git rev-list ee38cccad8f76b807206165324e7bf771aa981dc..HEAD
0ca46774ac207517724eb48338c04a4dbde0728a
a3361806ab49043fca46f81a0edc2357b7d3947c

Using the above, perhaps the logic for processing commits will be:

detect a new commit
git pull
use git rev-list to get list of new commits
for i = oldest new commit to newest new commit {
  git checkout a commit
  magic
}
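
A Python sketch of that loop (helper names are mine):

import subprocess

def run(*args):
  return subprocess.run(args, check=True, capture_output=True, text=True)

def process_new_commits(last_processed):
  run('git', 'pull', '--ff-only')
  # git rev-list prints newest first; --reverse flips it to oldest first
  commits = run('git', 'rev-list', '--reverse', last_processed + '..HEAD')
  for commit in commits.stdout.split():
    run('git', 'checkout', commit)
    # magic happens here: run make -V etc. against the checked-out tree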

subversion commit iteration

With subversion we have a revision id, which is an integer.

The repo can be queried for its highest commit via:

svn log -r head

With that revision number, the code to process the commits is:

for i = LastCommitProcessed + 1; i <= LatestCommit; i++ {
  svn up -r $i
  process that commit
}

How do we handle gaps in the subversion revision sequence? If we have commits, 5, 7, and 8, where is commit 6? How do we note that commit 6 does not exist and that we need to get it? What if the repo has no commit 6?

Current status

Extracting the required data from the repo instead of the email should be straightforward. It must still be tested and verified.

Iterating over the commits still needs to be proven to work. Hopefully that can start soon.

Sep 03, 2019
 

I’m trying to think of a list of things that FreshPorts can do which might be useful.

I can think of these:

  • provides example dependency line. e.g. p5-XML-RSS>0:textproc/p5-XML-RSS
  • list of dependencies for a port
  • list of ports depending upon this port
  • Default configuration options
  • what packages install a given file (e.g. bin/unzip)
  • what ports does this person maintain?
  • which Makefiles contain a reference to bunzip?
  • search results can be plain-text consisting of a list of foo/bar ports
  • The Maximum Effort checkbox on the search page does nothing.
  • Committers can be notified of sanity test failures after the commit
  • Find a commit, any commit, based on SVN revision number, e.g. : https://www.freshports.org/commit.php?revision=352332

Any more ideas?

Sep 02, 2019
 

When the time comes, and the FreeBSD project is using git, there will be work to be done on FreshPorts. If the commit emails are similar to those under cvs and svn, it should be straightforward to parse the email and convert it to XML.

Once the data is in XML, the commit can be loaded into FreshPorts. The commit is the basis for most other data.

I am not sure of the work process after that. I think it will be as simple as:

  1. git pull
  2. git checkout HASH

where HASH is the hash value associated with the commit in question. I’m assuming the commit hash will be in the commit email.

Processing commits in order

One longstanding FreshPorts issue (which I notice is not recorded): If commits are processed out of order, things can go wrong.

FreshPorts depends upon the email arriving in order in which the commits occurred. There is no guarantee of this. FreshPorts processes the emails in the order they arrive. It achieves this by putting each email into a queue and processing the queue in order.

This is the ideal workflow:

  1. FreshPorts gets a notice: Hey, there’s been a commit
  2. FreshPorts looks to see how many new commits there are and processes each one in order

Step 1 can be as easy as querying the repo manually every minute, or a hook on the repo which taps FreshPorts.

Step 2 Might be challenging, but I am not sure. I don’t know how to say: list me all commits after X. I don’t know how to detect missed commits.

List git commits

Here is a partial list of git commits:

[dan@dev-nginx01:~/www] $ git log
commit 6e21a5fd3a7eeea3ada9896b1b5657a6ba121fd8 (HEAD -> master, origin/master, origin/HEAD)
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:24:51 2019 +0000

    Simplify the deleted ports section of "This port is required by"
    
    Remove the <dl><dd><dt> stuff and keep it straight forward.
    
    Move the "Collapse this list" into the final <li> of the list.

commit 11950339914066ea9298db4fbccc421a1d414108
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:12:29 2019 +0000

    Fix display of long lists
    
    Fixes #126
    
    While here, fix the #hidden part of the "Expand this list (N items / X hidden)" message.

commit 5f0c06c21cb8be3136d7562e12033d39d963d8b3
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 12:59:35 2019 +0000

    Improve links to validation URLS
    
    * move to https
    * show source on HTML link
    * add referer to CSS link

commit 20c2f1d6619e968db56f42b6632d4ddf6a8d00bb (tag: 1.35)
Author: Dan Langille <dan@langille.org>
Date:   Tue Aug 20 16:19:47 2019 +0000

    Under 'This port is required by:' format deleted ports better
    
    Fixes #125

commit cc188d6ecde7a19c7317ca5477495e1618d70fe9
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 16 19:04:09 2019 +0000

    Add more constants:
    
    * FRESHPORTS_LOG_CACHE_ACTIVITY - log all caching activity
    * PKG_MESSAGE_UCL               - process pkg-message as UCL content

commit 309b10946785ce4254e71b9ebbf116c98095fa53 (tag: 1.34.2)
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 16 18:32:59 2019 +0000

    Comment out some debug stuff.
    Remove fseek, not required.
...
...

The issue: If the last commit processed by FreshPorts is 5f0c06c21cb8be3136d7562e12033d39d963d8b3, how can I get a list of all commits since then?

Google tells me:

[dan@dev-nginx01:~/www] $ git log 5f0c06c21cb8be3136d7562e12033d39d963d8b3..
commit 6e21a5fd3a7eeea3ada9896b1b5657a6ba121fd8 (HEAD -> master, origin/master, origin/HEAD)
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:24:51 2019 +0000

    Simplify the deleted ports section of "This port is required by"
    
    Remove the <dl><dd><dt> stuff and keep it straight forward.
    
    Move the "Collapse this list" into the final <li> of the list.

commit 11950339914066ea9298db4fbccc421a1d414108
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:12:29 2019 +0000

    Fix display of long lists
    
    Fixes #126
    
    While here, fix the #hidden part of the "Expand this list (N items / X hidden)" message.
[dan@dev-nginx01:~/www] $ 

Work is required

Regardless of when git arrives, there will be work to be done. How much work, I don’t know yet.

Jul 13, 2019
 

Today I updated the test website with two changes:

  1. use of dd, dt, and dl tags in the details section of the ports page
  2. Three new graphs:
    1. doc
    2. ports
    3. src

The tags part was all the result of me reading up on them and concluding they could be useful.

The graphs were swills’ fault. They took about an hour to do, and most of that was figuring out the required changes.

I started with www/graphs2.php

Also involved are www/generate_content.php and www/graphs.js, but you can see the whole commit if you want.

Not included in the code are some SQL queries, which were saved in the issue.

Enjoy.