Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full-time systems administration and now work for a very well-known company that has been a big force in the security industry.

Jan 09 2020
 

XML can be tricky. XML plays a heavy role in FreshPorts. Since the early days, converting incoming data to XML and then loading that XML into the database has been the approach. This choice means the database-loading code doesn’t need to know much about the source of the data. It also means we can change the data source without modifying the database-loading code.

You might ask: why aren’t you using JSON?

FreshPorts predates JSON by several years.

Today I found an issue caused by some characters in a commit. I am not blaming the committer. There is no fault there. This is about the code and how it can be improved to not fall over.

The commit email

Looking at the email file on disk, I found this:

Author: pkubaj
Date: Wed Jan  8 21:36:57 2020
New Revision: 522460
URL: https://svnweb.freebsd.org/changeset/ports/522460

Log:
  multimedia/obs-studio: fix build on powerpc64E
  

That E after 64, that’s it.

The commit in the repo

The commit was on multimedia/obs-studio at Wed Jan 8 21:36:57 2020 UTC. Viewing that commit within the repo, nothing seems amiss.

Not seeing it there makes me question: is the website front end hiding the issue?

svn log

Let’s try viewing the svn log:

[dan@pkg01:~/ports/head] $ svn log -r 522460 
------------------------------------------------------------------------
r522460 | pkubaj | 2020-01-08 21:36:57 +0000 (Wed, 08 Jan 2020) | 10 lines

multimedia/obs-studio: fix build on powerpc64

Merge upstream commit to use GCC's SSE->AltiVec translation. Since it depends on compiling with GCC, it only works on ELFv1. Hopefully it will be possible to build it on ELFv2 in the future.

Also use luajit only where it's actually available. Since it's optional, the port builds anyway.

PR:		243199
Approved by:	yuri (maintainer)
MFH:		2020Q1 (fix build blanket)

------------------------------------------------------------------------
[dan@pkg01:~/ports/head] $ 

OK, we don’t see it there, but if I pipe the output through less, I see it:

[dan@pkg01:~/ports/head] $ svn log -r 522460 | less
------------------------------------------------------------------------
r522460 | pkubaj | 2020-01-08 21:36:57 +0000 (Wed, 08 Jan 2020) | 10 lines

multimedia/obs-studio: fix build on powerpc64^E

Merge upstream commit to use GCC's SSE->AltiVec translation. Since it depends on compiling with GCC, it only works on ELFv1. Hopefully it will be possible to build it on ELFv2 in the future.

Also use luajit only where it's actually available. Since it's optional, the port builds anyway.

PR:             243199
Approved by:    yuri (maintainer)
MFH:            2020Q1 (fix build blanket)

------------------------------------------------------------------------
(END)

That’s interesting. Piping the output through more gave similar results.
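One way to make the byte visible without a pager is cat -v, which prints control characters in caret notation. A contrived check (the printf stands in for the suspect line from the email):

$ printf 'powerpc64\005\n' | cat -v
powerpc64^E

^E is control-E, byte 0x05: the same \u0005 that turns up in the XML error below.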

The error

The error I see is:


Code point \u0005 is not a valid character in XML at /usr/local/lib/perl5/site_perl/FreshPorts/process_svn_mail.pm line 183.

The generated XML is cut short: it terminates at the LOG element, whose text contains the non-printing character.

What’s next?

I’m going to consult with others and see how to fix this. The code in question is Perl and I’m not the best at that.

Solved with this patch

--- ingress/modules/trunk/process_svn_mail.pm	2019/01/02 21:26:27	5200
+++ ingress/modules/trunk/process_svn_mail.pm	2020/01/09 17:16:10	5244
@@ -52,6 +52,11 @@
 
 	$Log = &GetLog($message);
 
+	# re https://news.freshports.org/2020/01/09/code-point-u0005-is-not-a-valid-character-in-xml/
+	#    https://twitter.com/FreshPorts/status/1215286202691211264
+	#	
+	$Log =~ s/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//go;
+
 #print "log: '$Log'\n";
 
 	if ($Log eq '') {

The most relevant hint came from Garrett Wollman: “Just look at the XML spec, it says exactly which control characters are allowed.” Based on that, I found the suggestion for the code I used above.

Very nice. Very simple.
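That character class is exactly the Char production from the XML 1.0 spec: TAB, LF, CR, plus the three legal Unicode ranges; anything else is stripped. A quick way to see it work from the command line (a contrived sample string, not the actual ingest code):

$ perl -e 'my $log = "powerpc64\x05"; $log =~ s/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g; print "$log\n"'
powerpc64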

I implemented it on the dev server. It worked. Progressing through test, stage, and onto prod, it kept working.

Thank you.

Dec 21 2019
 

The website has been slow for some time. I think it is because of increased disk I/O.

In this post:

  • FreeBSD 12.0
  • PostgreSQL 11.5
  • LibreNMS does the graphs via SNMP

The diskio overall

By increased, I mean it has gone from about 0.1k to about 1.6k, peaking at 140-150. This is steady and persistent.

This graph shows the jump which occurred earlier this year.

FreshPorts disk io

My eyes can’t tell for sure which two drives are involved, but going to the graphs for the individual drives, it is clearly ada2 and ada3.

ada2

This is for ada2:

ada diskio

ada3

And the same period for ada3:

ada3 diskio

zroot

zpool status confirms this is the zroot pool.

$ zpool status
  pool: main_tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 02:43:45 with 0 errors on Wed Dec 18 06:32:53 2019
config:

	NAME        STATE     READ WRITE CKSUM
	main_tank   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada0p1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 00:04:40 with 0 errors on Thu Dec 19 03:10:19 2019
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada2p3  ONLINE       0     0     0
	    ada3p3  ONLINE       0     0     0

errors: No known data errors

The filesystems

What’s on there?

[dan@x8dtu:~] $ zfs list -r zroot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
zroot                                  26.8G  78.8G    96K  /zroot
zroot/ROOT                             5.92G  78.8G    96K  none
zroot/ROOT/default                     5.92G  78.8G  4.19G  /
zroot/TESTING                           616K  78.8G    88K  /zroot/TESTING
zroot/TESTING/top                       528K  78.8G    88K  /zroot/TESTING/top
zroot/TESTING/top/test1                  88K  78.8G    88K  /zroot/TESTING/top/test1
zroot/TESTING/top/test2                  88K  78.8G    88K  /zroot/TESTING/top/test2
zroot/TESTING/top/test3                  88K  78.8G    88K  /zroot/TESTING/top/test3
zroot/TESTING/top/test4                  88K  78.8G    88K  /zroot/TESTING/top/test4
zroot/TESTING/top/test5                  88K  78.8G    88K  /zroot/TESTING/top/test5
zroot/data                             20.8G  78.8G    88K  /zroot/data
zroot/data/dan-pg                        88K  78.8G    88K  /usr/home/dan/pg
zroot/data/freshports                  4.51G  78.8G    88K  none
zroot/data/freshports/repo             4.51G  78.8G    88K  none
zroot/data/freshports/repo/PORTS-head  4.51G  78.8G  4.51G  /iocage/jails/x8dtu-ingress01/root/var/db/freshports/ports-jail/var/db/repos/PORTS-head
zroot/data/ingress01-testing            675M  78.8G   675M  /iocage/jails/x8dtu-ingress01/root/usr/home/dan/tmp-fast
zroot/data/pg01-postgres               15.6G  78.8G  15.6G  /iocage/jails/x8dtu-pg01/root/var/db/postgres
zroot/data/pg02-postgres               9.25M  78.8G  9.25M  /iocage/jails/x8dtu-pg02/root/var/db/postgres
zroot/tmp                               120K  78.8G   120K  /tmp
zroot/usr                               192K  78.8G    96K  /usr
zroot/usr/src                            96K  78.8G    96K  /usr/src
zroot/var                              9.77M  78.8G    96K  /var
zroot/var/audit                          96K  78.8G    96K  /var/audit
zroot/var/crash                          96K  78.8G    96K  /var/crash
zroot/var/log                          8.48M  78.8G  8.48M  /var/log
zroot/var/mail                          120K  78.8G   120K  /var/mail
zroot/var/tmp                           912K  78.8G   912K  /var/tmp
[dan@x8dtu:~] $ 

What is likely to be generating the traffic?

  • zroot/data/freshports/repo/PORTS-head – that’s the main FreeBSD ports repo
  • zroot/data/pg01-postgres – the PostgreSQL database server (pg02 is another database server, but not used)

PostgreSQL backends

The average number of PostgreSQL backends has jumped from about 1 to 3. This is the PostgreSQL pg01 jail.

PostgreSQL backend pg01 jail

What changed?

Two things come to mind:

  1. caching – I added caching on disk for some of the web pages. If anything, this should reduce the disk io.
  2. recordsize – I changed some parameters on both zroot/data/pg01-postgres and zroot/data/pg02-postgres

Looking back at command history, I see:

[dan@x8dtu:~] $ history | grep recordsize
    7  zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres postgres                                  11.6M  63.2G  9.18M  /iocage/jails/x8dtu-pg02/root/var/db/postgres
    8  zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres zroot/data/pg02-postgres
    9  sudo zfs set recordsize=8k              redundant_metadata=most              primarycache=metadata              logbias=throughput zroot/data/pg01-postgres zroot/data/pg02-postgres
   10  zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres zroot/data/pg02-postgres
  496  history | grep recordsize

These are the current settings for pg01, the current production database server:

[dan@x8dtu:~] $ zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data/pg01-postgres
NAME                      PROPERTY            VALUE         SOURCE
zroot/data/pg01-postgres  recordsize          8K            local
zroot/data/pg01-postgres  redundant_metadata  most          local
zroot/data/pg01-postgres  primarycache        metadata      local
zroot/data/pg01-postgres  logbias             throughput    local
[dan@x8dtu:~] $ 

Looking at the parent, these are the likely settings before the change:

[dan@x8dtu:~] $ zfs get recordsize,redundant_metadata,primarycache,logbias zroot/data
NAME        PROPERTY            VALUE         SOURCE
zroot/data  recordsize          128K          default
zroot/data  redundant_metadata  all           default
zroot/data  primarycache        all           default
zroot/data  logbias             latency       default
[dan@x8dtu:~] $ 

What changed?

  • recordsize – from 128K to 8K
  • redundant_metadata – from all to most
  • primarycache – from all to metadata
  • logbias – from latency to throughput

top

An extract from top shows:

last pid: 43619;  load averages:  1.01,  1.39,  1.48           up 34+18:05:39  15:19:57
195 processes: 9 running, 186 sleeping
CPU:  1.9% user,  0.0% nice,  6.5% system,  0.0% interrupt, 91.6% idle
Mem: 338M Active, 720M Inact, 102G Wired, 85G Free
ARC: 62G Total, 48G MFU, 8620M MRU, 3603K Anon, 759M Header, 5407M Other
     47G Compressed, 117G Uncompressed, 2.47:1 Ratio
Swap: 4096M Total, 4096M Free

Lots of free RAM (85G).

Plenty of ARC in use (62G).

Theories

Could the change in recordsize cause this issue? Some records are now 128K, some are 8K.

If yes, then shutting down the database and doing a zfs send | zfs recv of the entire dataset to a new dataset would make everything 8K.
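If it comes to that, here is a hedged sketch (the -new dataset name is hypothetical, and PostgreSQL must be shut down first so the copy is consistent). One caveat I would verify first: a plain zfs send | zfs recv preserves the existing block layout, so copying at the file level into a fresh 8K dataset is the surer way to rewrite every block:

# stop PostgreSQL in the pg01 jail, then:
$ sudo zfs create -o recordsize=8k zroot/data/pg01-postgres-new
$ sudo rsync -a /iocage/jails/x8dtu-pg01/root/var/db/postgres/ /zroot/data/pg01-postgres-new/
# swap the mountpoint over to the new dataset and restart PostgreSQL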

The jump in the number of backend workers: could that be because queries now take longer and requests from the websites now overlap?

If the cause is one or a combination of redundant_metadata, primarycache, and logbias, changing them back and observing might show a difference.

Setting changes

The following commentary is based on an IRC chat with Allan Jude.

primarycache=metadata means that ZFS will cache only the ZFS metadata, not any PostgreSQL database data. The idea behind this configuration is to have no ZFS data in the ARC, thereby avoiding both PostgreSQL and ZFS caching the same data twice. However, we might be better off letting ZFS cache this data, since the dataset is only 15.6GB and it will all fit into the ARC compressed.

logbias=throughput tells ZFS to optimize for throughput and not do synchronous writes to the ZIL. This will make commits to the database take longer. Setting this back to latency is probably better for a database.
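If we decide to revert logbias as well, that is a one-liner (not run at this point; shown for reference):

$ sudo zfs set logbias=latency zroot/data/pg01-postgres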

At about Fri Dec 20 15:38 UTC 2019, I ran: zfs set primarycache=all zroot/data/pg01-postgres

To speed up the caching, I ran these commands:

[dan@x8dtu:~] $ sudo zfs snapshot zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ sudo zfs send zroot/data/pg01-postgres@for.caching > /dev/null
[dan@x8dtu:~] $ sudo zfs destroy zroot/data/pg01-postgres@for.caching
[dan@x8dtu:~] $ 

That will read the entire dataset into the ARC.

PostgreSQL database settings

This is what PostgreSQL is working with:

shared_buffers = 128MB

That is spread rather thinly for a 15.6GB database.

As a wise person recently said: the key to database performance with ZFS is you need exactly one of [database|filesystem] caching the database. 0 of them caching or both of them caching is bad. In years past I’ve had better luck letting the database cache than ZFS caching. But that was before compressed ARC.

20 minutes later

Now the graph is trending down. This is the past 6 hours.

pg01 disk io going down

Nearly two hours later

Yes, I declare this fixed. The overall diskio has dropped back to what it was before the change.

The number of PostgreSQL backends has also dropped.

PostgreSQL Backend back to normal

28 hours later

I waited until the next day to include this graph. It clearly shows how the IO has dropped back to the same level as before. This graph covers a two-month period.

two month graph

Thanks

Sorry it took so long to get to this. Glad I had Allan and Josh to help out.

Sep 27 2019
 

I was just creating a new jail for working on git & FreshPorts. I was intrigued to see that iocage uses zfs send | zfs receive to create the new jail:

[dan@slocum:~] $ ps auwwx | grep iocage
root      64166    3.7  0.0   12788    4036  1  D+   21:16         0:06.10 zfs send system/iocage/releases/12.0-RELEASE/root@git-dev
root      64167    2.8  0.0   12752    4036  1  S+   21:16         0:03.60 zfs receive -F system/iocage/jails/git-dev/root
root      63910    0.0  0.0   16480    7384  1  I+   21:16         0:00.01 sudo iocage create -r 12.0-RELEASE --thickjail --name git-dev
root      63911    0.0  0.0   53344   42484  1  I+   21:16         0:01.01 /usr/local/bin/python3.6 /usr/local/bin/iocage create -r 12.0-RELEASE --thickjail --name git-dev
dan       67954    0.0  0.0   11288    2732  3  S+   21:18         0:00.00 grep iocage
[dan@slocum:~] $ 

More later, after I get this jail configured.

Edit: 2019-09-28

From Twitter:

Something is being copied, is that a cached version of the jail template?

The answer is a local copy of FreeBSD 12.0-RELEASE:

[dan@slocum:~] $ zfs list -r system/iocage/releases
NAME                                       USED  AVAIL  REFER  MOUNTPOINT
system/iocage/releases                    3.15G  15.9T   176K  /iocage/releases
system/iocage/releases/11.2-RELEASE       1.44G  15.9T   176K  /iocage/releases/11.2-RELEASE
system/iocage/releases/11.2-RELEASE/root  1.44G  15.9T  1.44G  /iocage/releases/11.2-RELEASE/root
system/iocage/releases/12.0-RELEASE       1.71G  15.9T   176K  /iocage/releases/12.0-RELEASE
system/iocage/releases/12.0-RELEASE/root  1.71G  15.9T  1.71G  /iocage/releases/12.0-RELEASE/root
[dan@slocum:~] $ 

What’s in there?

[dan@slocum:~] $ ls /iocage/releases/12.0-RELEASE/root
COPYRIGHT boot      etc       libexec   mnt       proc      root      sys       usr
bin       dev       lib       media     net       rescue    sbin      tmp       var
[dan@slocum:~] $ 
Sep 22 2019
 

We have the first commit processed via git into FreshPorts. Details are in this git comment.

Work remaining:

  1. check out that commit into the working copy of the files
  2. run make -V on the working copy to get the refreshed values for the port[s] affected by this commit

The 2nd part – very little code change.

The 1st part is just playing with git.

My thanks to Sergey Kozlov for his code which creates the XML FreshPorts needs for commit processing. That has been a great time saver for me.

Sep 18 2019
 

I want to move FreshPorts towards using commit hooks and away from depending upon incoming emails for processing new commits.

Much of the following came from a recent Twitter post.

You might think: why are we using emails? Why? Because we can. Email was the easiest and simplest approach. It is a time-proven solution. Look at https://docs.freshports.org/ and you can see the original ideas from 2001. That is over 18 years of providing data.

If email is so good, why stop?

Because we can.

And we won’t stop using email.

Email will stay around as a fall-back position. Commit hooks are a tighter dependency upon a third party and require close cooperation. Should that relationship sour, the cooperation may terminate.

If web-hooks proceed, email processing will be modified to introduce an N-minute delay. After leaving the N-minute queue, the mail will be (a sketch of this check follows the list):

  • ignored if the commit has already been processed
  • processed if the commit is not in the database
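A minimal sketch of that duplicate check, assuming a shell wrapper and the commit_log.svn_revision column discussed below (process_mail, $MAILFILE, and $REVISION are hypothetical names):

# after the N-minute delay, skip mail whose revision is already in the database
if [ -n "$(psql -tA -c "SELECT 1 FROM commit_log WHERE svn_revision = '$REVISION'" freshports.org)" ]; then
  echo "revision $REVISION already processed; ignoring mail"
else
  process_mail "$MAILFILE"
fi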

How is a commit identified?

Email processing is keyed on the Message-Id, which is stored within the database. Duplicates are ignored.

I am not sure if we also check the subversion revision number. That might be wise. There is an index, but it is not unique.

If we move to commit-hooks, message-id will not be available. We will have to change to relying upon the revision number or, in the case of git, the commit hash.

ACTIONS (sketched in SQL below):

  • add a unique index on commit_log.svn_revision
  • remove the not null constraint on commit_log.message_id
  • add commit_log.commit_hash with a unique index
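In SQL, those actions might look something like this (a sketch only; the text type and the constraint names are assumptions):

-- the existing index on svn_revision is not unique; enforce uniqueness
ALTER TABLE commit_log ADD CONSTRAINT commit_log_svn_revision_uq UNIQUE (svn_revision);

-- message-id will not be available for hook-driven commits
ALTER TABLE commit_log ALTER COLUMN message_id DROP NOT NULL;

-- for git commits; NULLs do not collide in a unique index
ALTER TABLE commit_log ADD COLUMN commit_hash text;
CREATE UNIQUE INDEX commit_log_commit_hash_uq ON commit_log (commit_hash);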

Commit processing

Regardless of how we get notified of a new commit, we must be able to put our local copy of the repo into the state as of a given commit.

For subversion, we do this:

svn up -r REVISION

After this, various commands, such as make -V, are run to extract the necessary values from the ports tree (as of the commit). This information includes PORTVERSION, PORTREVISION, etc. You can see why it is vital to have everything in our ports tree reflect the repo as of that particular commit.

For git, it is similar:

git checkout HASH

The same scripts, as described above, would be run.

Commit hooks

These are the assumptions for a commit hook:

  1. the hook gets triggered exactly once per commit
  2. the hook is fast, so as not to slow down commits

In order to be fast, the hook passes the basic information along to another daemon, which puts it into a queue; that queue is then processed by yet another daemon. This queue must be persistent.

I am using hare and hared here as examples only because I am familiar with them. They won’t actually do what I need, but if I were to fork them and modify them for this specific task, I think they would do the job rather well.

My initial thoughts are:

  1. The hook invokes something like hare (see also sysutils/hare) which sends a udp packet to something else; a minimal hook sketch appears after this list. The packet contains the commit revision number (if subversion) or hash (if git).
  2. The udp is received by something like hared (same link as above for hare, but available via sysutils/py-hared).
  3. hared then adds the data to a queue. What type of queue and where it is located is for later design.
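For step 1, note that Subversion hands a post-commit hook the repository path and revision as arguments, so the hook itself can stay tiny. A hedged sketch (the host and port are made up, and nc stands in for whatever the hare-like tool would actually do):

#!/bin/sh
# post-commit hook: fire the revision off via UDP and return immediately
REPOS="$1"
REV="$2"
printf 'commit %s %s\n' "$REPOS" "$REV" | nc -u -w 1 queue-host.example.org 5150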

Commit iteration

When processing email, looping through the emails is your iteration. When you have no email, you need something else to iterate through.

git commit iteration

I think this is the command we want to use when iterating through git commits:

git rev-list HASH..HEAD

Where HASH is the hash of our most recently processed commit. Most recent is not necessarily the last one we processed; it is the commit with the most recent timestamp. Here is an example:

$ git rev-list ee38cccad8f76b807206165324e7bf771aa981dc..HEAD
0ca46774ac207517724eb48338c04a4dbde0728a
a3361806ab49043fca46f81a0edc2357b7d3947c

Using the above, perhaps the logic for processing commits will be:

detect a new commit
git pull
use git rev-list to get list of new commits
for i = oldest new commit to newest new commit {
  git checkout a commit
  magic
}
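In shell, that loop might come out like this (a sketch; note --reverse, because git rev-list prints newest first and we want to process oldest first):

# LAST is the hash of the most recently processed commit (from the example above)
LAST=ee38cccad8f76b807206165324e7bf771aa981dc
git pull
for c in $(git rev-list --reverse "$LAST"..HEAD); do
  git checkout "$c"
  # magic: run the extraction scripts against the tree as of this commit
done
git checkout master    # return to the branch tip when done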

subversion commit iteration

With subversion we have a revision id, which is an integer.

The repo can be queried for its highest commit via:

svn log -r head

With that revision number, the code to process the commits is:

for i = LastCommitProcess + 1; i <= LatestCommit; i++ {
  svn up -r $i
  process that commit
}
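Concretely, something like this (a sketch; LAST is the last revision we processed, and svn info -r HEAD asks the repository for its youngest revision):

LAST=522459    # hypothetical: the last revision already processed
LATEST=$(svn info -r HEAD --show-item revision)
i=$((LAST + 1))
while [ "$i" -le "$LATEST" ]; do
  svn up -r "$i"
  # process that commit
  i=$((i + 1))
done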

How do we handle gaps in the subversion revision sequence? If we have commits, 5, 7, and 8, where is commit 6? How do we note that commit 6 does not exist and that we need to get it? What if the repo has no commit 6?

Current status

Extracting the required data from the repo instead of the email should be straightforward. It must still be tested and verified.

Iterating the commits still needs to be proven to work. Hopefully that can start soon.

Sep 03 2019
 

I’m trying to think of a list of things that FreshPorts can do which might be useful.

I can think of these:

  • provides example dependency line. e.g. p5-XML-RSS>0:textproc/p5-XML-RSS
  • list of dependencies for a port
  • list of ports depending upon this port
  • Default configuration options
  • what packages install a given file (e.g. bin/unzip)
  • what ports does this person maintain?
  • which Makefiles contain a reference to bunzip?
  • search results can be plain-text consisting of a list of foo/bar ports
  • The Maximum Effort checkbox on the search page does nothing.
  • Committers can be notified of sanity test failures after the commit
  • Find a commit, any commit, based on SVN revision number, e.g. : https://www.freshports.org/commit.php?revision=352332

Any more ideas?

Sep 02 2019
 

When the time comes, and the FreeBSD project is using git, there will be work to be done on FreshPorts. If the commit emails are similar to those under cvs and svn, it should be straightforward to parse the email and convert it to XML.

Once the data is in XML, the commit can be loaded into FreshPorts. The commit is the basis for most other data.

I am not sure of the work process after that. I think it will be as simple as:

  1. git pull
  2. git checkout HASH

where HASH is the hash value associated with the commit in question. I’m assuming the commit hash will be in the commit email.

Processing commits in order

One longstanding FreshPorts issue (which I notice is not recorded): if commits are processed out of order, things can go wrong.

FreshPorts depends upon the emails arriving in the order in which the commits occurred. There is no guarantee of this. FreshPorts processes the emails in the order they arrive. It achieves this by putting each email into a queue and processing the queue in order.

This is the ideal workflow:

  1. FreshPorts gets a notice: Hey, there’s been a commit
  2. FreshPorts looks to see how many new commits there are and processes each one in order

Step 1 can be as easy as querying the repo manually every minute, or a hook on the repo which taps FreshPorts.

Step 2 might be challenging, but I am not sure. I don’t know how to say: list me all commits after X. I don’t know how to detect missed commits.

List git commits

Here is a partial list of git commits:

[dan@dev-nginx01:~/www] $ git log
commit 6e21a5fd3a7eeea3ada9896b1b5657a6ba121fd8 (HEAD -> master, origin/master, origin/HEAD)
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:24:51 2019 +0000

    Simplify the deleted ports section of "This port is required by"
    
    Remove the <dl><dd><dt> stuff and keep it straight forward.
    
    Move the "Collapse this list" into the final <li> of the list.

commit 11950339914066ea9298db4fbccc421a1d414108
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:12:29 2019 +0000

    Fix display of long lists
    
    Fixes #126
    
    While here, fix the #hidden part of the "Expand this list (N items / X hidden)" message.

commit 5f0c06c21cb8be3136d7562e12033d39d963d8b3
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 12:59:35 2019 +0000

    Improve links to validation URLS
    
    * move to https
    * show source on HTML link
    * add referer to CSS link

commit 20c2f1d6619e968db56f42b6632d4ddf6a8d00bb (tag: 1.35)
Author: Dan Langille <dan@langille.org>
Date:   Tue Aug 20 16:19:47 2019 +0000

    Under 'This port is required by:' format deleted ports better
    
    Fixes #125

commit cc188d6ecde7a19c7317ca5477495e1618d70fe9
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 16 19:04:09 2019 +0000

    Add more constants:
    
    * FRESHPORTS_LOG_CACHE_ACTIVITY - log all caching activity
    * PKG_MESSAGE_UCL               - process pkg-message as UCL content

commit 309b10946785ce4254e71b9ebbf116c98095fa53 (tag: 1.34.2)
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 16 18:32:59 2019 +0000

    Comment out some debug stuff.
    Remove fseek, not required.
...
...

The issue: if the last commit processed by FreshPorts is 5f0c06c21cb8be3136d7562e12033d39d963d8b3, how can I get a list of all commits since then?

Google tells me:

[dan@dev-nginx01:~/www] $ git log 5f0c06c21cb8be3136d7562e12033d39d963d8b3..
commit 6e21a5fd3a7eeea3ada9896b1b5657a6ba121fd8 (HEAD -> master, origin/master, origin/HEAD)
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:24:51 2019 +0000

    Simplify the deleted ports section of "This port is required by"
    
    Remove the <dl><dd><dt> stuff and keep it straight forward.
    
    Move the "Collapse this list" into the final <li> of the list.

commit 11950339914066ea9298db4fbccc421a1d414108
Author: Dan Langille <dan@langille.org>
Date:   Fri Aug 23 15:12:29 2019 +0000

    Fix display of long lists
    
    Fixes #126
    
    While here, fix the #hidden part of the "Expand this list (N items / X hidden)" message.
[dan@dev-nginx01:~/www] $ 

Work is required

Regardless of when git arrives, there will be work to be done. How much work, I don’t know yet.

Jul 13 2019
 

Today I updated the test website with two changes:

  1. use of dd, dt, and dl tags in the details section of the ports page
  2. Three new graphs:
    1. doc
    2. ports
    3. src

The tags part was all the result of me reading up on them and concluding they could be useful.

The graphs were swills’ fault. They took about an hour to do, and most of that was figuring out the required changes.

I started with www/graphs2.php

Also involved are www/generate_content.php and www/graphs.js, but you can see the whole commit if you want.

Not included in the code are some SQL queries, which were saved in the issue.

Enjoy.

May 25 2019
 

I’m writing this post just to keep things straight in my head so I can decide how best to resolve this issue.

FreshPorts uses /var/db/freshports/cache/spooling on both the ingress jail and the nginx jail.

The nginx jail uses it for caching content. Page details are first spooled into /var/db/freshports/cache/spooling before being moved to /var/db/freshports/cache/ports.

The ingress jail uses this for refreshing various cached items.

This directory is configured by the FreshPorts-Scripts package, which is installed in both jails.

The problem: this directory is created chown freshports:freshports but it needs to be chown www:freshports in the jail.

My first question is: why does the nginx jail need the FreshPorts-Scripts package? It contains ingress-related scripts. By that, I mean scripts related to incoming commits and the code to get them into the FreshPorts database.

How does it get into the jail?

[dan@x8dtu-nginx01:~] $ sudo pkg delete FreshPorts-Scripts
Checking integrity... done (0 conflicting)
Deinstallation has been requested for the following 3 packages (of 0 packages in the universe):

Installed packages to be REMOVED:
	FreshPorts-Scripts-1.1.15
	py27-freshports-fp-listen-1.0.10_3
	freshports-www-1.2.6

Number of packages to be removed: 3

The operation will free 4 MiB.

Proceed with deinstalling packages? [y/N]: n

Two other ports require it.

Ahh, yes, the fp-listen daemon needs the scripts:

[dan@x8dtu-nginx01:~] $ ps auwwx | grep fp-listen
root       35775  0.0  0.0   4244  1944  -  IJ   17:58   0:00.00 supervise fp-listen
freshports 35777  0.0  0.0  21076 16392  -  SJ   17:58   0:00.43 /usr/local/bin/python2.7 /usr/local/lib/python2.7/site-packages/fp-listen/fp-listen.pyc
dan        74034  0.0  0.0   6660  2532  2  S+J  18:57   0:00.00 grep fp-listen
[dan@x8dtu-nginx01:~] $ 

That’s going to be running on nginx regardless. That daemon listens to the PostgreSQL database for updates and clears the relevant portions of the on-disk cache.

At first, I was trying to figure out what was installing the www user on the nginx jail. Then I realized, with help, that the www user is installed by default; it was originally added back in 2001.

I see a solution:

  • chown www:freshports
  • chmod 775

That translates to this entry in the pkg-plist file:

@dir(www,freshports,775) %%FP_DATADIR%%/cache/spooling

That seems to fix the rename errors I was seeing:

2019/05/25 18:32:33 [error] 35875#100912: *4277 FastCGI sent in stderr: "PHP message: PHP Warning:  
rename(/tmp/ports.dns.odsclient.Detail.head.PageSize100.PageNum1.html.tmpmuB0Ah,/var/db/freshports/cache/ports/dns/odsclient/Detail.h
ead.PageSize100.PageNum1.html): Operation not permitted in /usr/local/www/freshports/classes/cache.php on line 83" while reading 
response header from upstream, client: 64.233.172.83, server: www.freshports.org, request: "GET /dns/odsclient HTTP/1.1", upstream: 
"fastcgi://unix:/var/run/php-fpm.sock:", host: "www.freshports.org"

Thanks for coming to my TED talk.

Jan 27 2019
 

Yesterday I copied data from the old production server to the new production server. One thing I missed, but did think about at the time, was updating the sequence used by the table in question. Looking at the table definition:

freshports.org=# \d report_log
                                          Table "public.report_log"
    Column    |           Type           | Collation | Nullable |                  Default                   
--------------+--------------------------+-----------+----------+--------------------------------------------
 id           | integer                  |           | not null | nextval('report_log_id_seq'::regclass)
 report_id    | integer                  |           | not null | 
 frequency_id | integer                  |           |          | 
 report_date  | timestamp with time zone |           | not null | ('now'::text)::timestamp(6) with time zone
 email_count  | integer                  |           | not null | 
 commit_count | integer                  |           | not null | 
 port_count   | integer                  |           | not null | 
Indexes:
    "report_log_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "$1" FOREIGN KEY (frequency_id) REFERENCES report_frequency(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (report_id) REFERENCES reports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=# 

The report_log_id_seq value will be wrong. When the reports run, they will use values for id which are already present in the table. To confirm, I ran this test:

freshports.org=# BEGIN;
BEGIN
freshports.org=# INSERT INTO report_log (report_id, frequency_id, email_count, commit_count, port_count) VALUES (2, 4, 0, 0, 0);
ERROR:  duplicate key value violates unique constraint "report_log_pkey"
DETAIL:  Key (id)=(19074) already exists.
freshports.org=# ROLLBACK;
ROLLBACK
freshports.org=# SELECT max(id) FROM report_log;
  max  
-------
 20144
(1 row)

freshports.org=# 

Historically, I have done this with setval, but today I will try ALTER SEQUENCE.
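For reference, the setval approach computes the value itself instead of my having to look it up first:

freshports.org=# SELECT setval('report_log_id_seq', (SELECT max(id) FROM report_log));
 setval 
--------
  20144
(1 row)

After that, nextval() returns 20145, the same place the ALTER SEQUENCE below lands.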

freshports.org=# BEGIN; ALTER SEQUENCE report_log_id_seq RESTART WITH 20145;
BEGIN
ALTER SEQUENCE
freshports.org=# INSERT INTO report_log (report_id, frequency_id, email_count, commit_count, port_count) VALUES (2, 4, 0, 0, 0);
INSERT 0 1
freshports.org=# ROLLBACK;
ROLLBACK

That worked, so I rolled it back. This time I’ll save the changes without inserting data:

freshports.org=# BEGIN; ALTER SEQUENCE report_log_id_seq RESTART WITH 20145;
BEGIN
ALTER SEQUENCE
freshports.org=# COMMIT;
COMMIT
freshports.org=# 

I remembered this issue while sorting out a configuration & code error this morning.