Dan Langille

I've been playing with Open Source software, starting with FreeBSD, since New Zealand Post installed DSL on my street in 1998. From there, I started writing at The FreeBSD Diary, moving my work here after I discovered WordPress. Along the way, I started the BSDCan and PGCon conferences. I slowly moved from software development into full time systems administration and now work for very-well known company who has been a big force in the security industry.

Sep 052020
 

I have no write up of the jails used by FreshPorts. The following originated in an email I sent this morning.

FreshPorts runs on a FreeBSD host, hosting four jails.

  1. db – the PostgreSQL database server – the source for all content. connections via TCP/IP
  2. mxPostfix takes incoming emails from subversion mailing list via FreeBSD MX – passes them to ingress jails (dev, test, stage, prod – those different environments are not described here)
  3. ingress – Mostly perl, python, shell. Receives email from mx, converts contents into XML, loads XML into the database. All backend reports, notifications, etc are done here. Some data is generated here for the webserver. See note below. Talks to db.
  4. webnginx, php, shell, python. runs the website – talks to db – constructs HTML, caches most of it on local disk. Uses cache when it exists – runs the FreshPorts fp-listen daemon which connects to db and waits for cache-clearing notices.

NOTE: the ingress and web jails share data: ingress creates some HTML for web and supplies it via a nullfs mounted filesystem.

I used jails because:

  • I could
  • kept logical things together
  • forced better design
  • I like jails
  • meant I could put them different servers if I wanted to
Aug 262020
 

The FreshPorts server died on Friday night. I drove up on Sunday morning to retrieve it. I have not yet investigated the breakage. It might be the M/B.

It was suggested on IRC that replace instead of repair might be a better approach. The existing X8DTU m/b has some unresolved IPMI vulnerability issues. A newer single CPU would perform better than the pair of E5620 CPUs.

I’ve also considered going virtual, if someone would donate that.

Supermicro chassis

Supermicro Model 815-5

Supermicro Model 815-5

Supermicro Model 815-5

The dead system had 196GB of RAM, but I think 128GB would be fine. Keeping the database and the main cache files all in RAM (via ZFS ARC) helps throughput.

If you want to see graphs of the dead system, this recent tweet has them.

Aug 252020
 

This is more for my own sanity for the next time I deploy a new FreshPorts host. There is no useful information here for anyone else.

Base this upon the existing host:

svn cp host_vars/foo.example.org host_vars/NEWHOST
svn cp roles/postgresql-server/templates/hosts/foo.example.org roles/postgresql-server/templates/hosts/NEWHOST

Edit these files:

  • host_vars/NEWHOST
  • roles/postgresql-server/templates/hosts/NEWHOST/pg_hba.conf.j2
  • roles/postgresql-server/templates/hosts/NEWHOST/postgresql.conf.j2

Create ssl certs for:

  • NEWHOST.freshports.org
  • NEWHOST.freshsource.org
  • NEWHOST
Aug 162020
 

One of the lesser known FreshPorts features is searching pkg-plist. You want nslookup and dig but you can’t find the package which installs that. The FreshPorts search can find it for you.

When a new commit occurs for a given port, FreshPorts extracts and store the pkg-plist information. This allows you to search.

How does FreshPorts get that data?

make generate-plist

This is the script which runs make generate-plist:

[dan@dev-ingress01:/var/db/freshports/ports-jail] $ cat make-generate-plist.sh
#!/bin/sh
#
# This extracts the config options for a given port.
#
# sudo /usr/sbin/chroot -u USER JAIL /make-generate-plist.sh REPO_PATH PORT
#
# where USER      - user as which to execute the commands.  e.g. dan
#       JAIL      - path to the jail created with the create-jail-directories.sh command. e.g. /usr/jail/FreshPorts
#       REPO_PATH - path to the SVN repository e.g. /usr/local/PORTS-RELENG_9_1_0
#       PORT      - sysutils/bacula-server
#

. ./vars.sh

REPO_PATH=$1
PORT=$2

cd ${REPO_PATH}/${PORT}

${MAKE} generate-plist PORTSDIR=${REPO_PATH} OPTIONSFILE=${LOCALBASE} -f ${REPO_PATH}/${PORT}/Makefile > /dev/null
if [ $? == 0 ]
then
  cat work/.PLIST.mktmp
  ${MAKE} clean > /dev/null
fi
[dan@dev-ingress01:/var/db/freshports/ports-jail] $ 

Let’s try running that from the command line and see what is generated:

$ cd :/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools
$ sudo make generate-plist -f ./Makefile PORTSDIR=../../ 
===>   Generating temporary packing list

Next, let’s look at the file we just created:

$ make -V WRKDIR
/var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work
$ cat /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/.PLIST.mktmp
/usr/local/share/licenses/p5-Mail-Tools-2.21/catalog.mk
/usr/local/share/licenses/p5-Mail-Tools-2.21/LICENSE
/usr/local/share/licenses/p5-Mail-Tools-2.21/ART10
/usr/local/share/licenses/p5-Mail-Tools-2.21/GPLv1+
lib/perl5/site_perl/Mail/Address.pm
lib/perl5/site_perl/Mail/Address.pod
lib/perl5/site_perl/Mail/Cap.pm
lib/perl5/site_perl/Mail/Cap.pod
lib/perl5/site_perl/Mail/Field.pm
lib/perl5/site_perl/Mail/Field.pod
lib/perl5/site_perl/Mail/Field/AddrList.pm
lib/perl5/site_perl/Mail/Field/AddrList.pod
lib/perl5/site_perl/Mail/Field/Date.pm
lib/perl5/site_perl/Mail/Field/Date.pod
lib/perl5/site_perl/Mail/Field/Generic.pm
lib/perl5/site_perl/Mail/Field/Generic.pod
lib/perl5/site_perl/Mail/Filter.pm
lib/perl5/site_perl/Mail/Filter.pod
lib/perl5/site_perl/Mail/Header.pm
lib/perl5/site_perl/Mail/Header.pod
lib/perl5/site_perl/Mail/Internet.pm
lib/perl5/site_perl/Mail/Internet.pod
lib/perl5/site_perl/Mail/Mailer.pm
lib/perl5/site_perl/Mail/Mailer.pod
lib/perl5/site_perl/Mail/Mailer/qmail.pm
lib/perl5/site_perl/Mail/Mailer/rfc822.pm
lib/perl5/site_perl/Mail/Mailer/sendmail.pm
lib/perl5/site_perl/Mail/Mailer/smtp.pm
lib/perl5/site_perl/Mail/Mailer/smtps.pm
lib/perl5/site_perl/Mail/Mailer/testfile.pm
lib/perl5/site_perl/Mail/Send.pm
lib/perl5/site_perl/Mail/Send.pod
lib/perl5/site_perl/Mail/Util.pm
lib/perl5/site_perl/Mail/Util.pod
lib/perl5/site_perl/MailTools.pm
lib/perl5/site_perl/MailTools.pod
lib/perl5/site_perl/man/man3/Mail::Address.3.gz
lib/perl5/site_perl/man/man3/Mail::Cap.3.gz
lib/perl5/site_perl/man/man3/Mail::Field.3.gz
lib/perl5/site_perl/man/man3/Mail::Field::AddrList.3.gz
lib/perl5/site_perl/man/man3/Mail::Field::Date.3.gz
lib/perl5/site_perl/man/man3/Mail::Field::Generic.3.gz
lib/perl5/site_perl/man/man3/Mail::Filter.3.gz
lib/perl5/site_perl/man/man3/Mail::Header.3.gz
lib/perl5/site_perl/man/man3/Mail::Internet.3.gz
lib/perl5/site_perl/man/man3/Mail::Mailer.3.gz
lib/perl5/site_perl/man/man3/Mail::Send.3.gz
lib/perl5/site_perl/man/man3/Mail::Util.3.gz
lib/perl5/site_perl/man/man3/MailTools.3.gz
@owner
@group
@mode

That matches what I see (today) on the webpage for that port.

Clean up:

$ make clean
===>  Cleaning for p5-Mail-Tools-2.21

make makeplist

Let’s try make makeplist:

[freshports@dev-ingress01 ~/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools]$ make makeplist -f ./Makefile PORTSDIR=../../
===>  License ART10 GPLv1+ accepted by the user
===>   p5-Mail-Tools-2.21 depends on file: /usr/local/sbin/pkg - found
=> MailTools-2.21.tar.gz doesn't seem to exist in /var/ports/distfiles/.
=> /var/ports/distfiles/ is not writable by you; cannot fetch.
*** Error code 1

Stop.
make: stopped in /var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools

This required fetching the DISTFILES.

From another ssh session, as another user, I ran this:

$ ls -ld /var/ports/distfiles/
drwxr-xr-x  2 root  wheel  4 Jun 10  2019 /var/ports/distfiles/
$ sudo chown -R freshports:freshports /var/ports/distfiles/

Let’s try that again:

$ make makeplist -f ./Makefile PORTSDIR=../../
===>  License ART10 GPLv1+ accepted by the user
===>   p5-Mail-Tools-2.21 depends on file: /usr/local/sbin/pkg - found
=> MailTools-2.21.tar.gz doesn't seem to exist in /var/ports/distfiles/.
=> Attempting to fetch https://cpan.metacpan.org/modules/by-module/Mail/MailTools-2.21.tar.gz
MailTools-2.21.tar.gz                                   56 kB 2943 kBps    00s
===> Fetching all distfiles required by p5-Mail-Tools-2.21 for building
===>  Extracting for p5-Mail-Tools-2.21
=> SHA256 Checksum OK for MailTools-2.21.tar.gz.
===>  Patching for p5-Mail-Tools-2.21
===>   p5-Mail-Tools-2.21 depends on package: p5-TimeDate>=0 - found
===>   p5-Mail-Tools-2.21 depends on package: perl5>=5.32.r0<5.33 - found
===>   p5-Mail-Tools-2.21 depends on package: perl5>=5.32.r0<5.33 - found
===>  Configuring for p5-Mail-Tools-2.21
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for MailTools
Writing MYMETA.yml and MYMETA.json
===>  Building for p5-Mail-Tools-2.21
--- blib/lib/.exists ---
--- blib/arch/.exists ---
--- blib/lib/auto/MailTools/.exists ---
--- blib/arch/auto/MailTools/.exists ---
--- blib/bin/.exists ---
--- blib/script/.exists ---
--- blib/man1/.exists ---
--- blib/man3/.exists ---
--- config ---
--- subdirs ---
--- dynamic ---
--- ppd ---
--- pm_to_blib ---
--- blibdirs ---
--- config ---
--- dynamic ---
--- linkext ---
--- pm_to_blib ---
cp lib/Mail/Cap.pm blib/lib/Mail/Cap.pm
cp lib/Mail/Send.pm blib/lib/Mail/Send.pm
cp lib/Mail/Filter.pm blib/lib/Mail/Filter.pm
cp lib/Mail/Field/AddrList.pod blib/lib/Mail/Field/AddrList.pod
cp lib/Mail/Cap.pod blib/lib/Mail/Cap.pod
cp lib/Mail/Field/Generic.pod blib/lib/Mail/Field/Generic.pod
cp lib/MailTools.pm blib/lib/MailTools.pm
cp lib/Mail/Mailer/qmail.pm blib/lib/Mail/Mailer/qmail.pm
cp lib/Mail/Address.pod blib/lib/Mail/Address.pod
cp lib/Mail/Util.pod blib/lib/Mail/Util.pod
cp lib/Mail/Mailer/smtp.pm blib/lib/Mail/Mailer/smtp.pm
cp lib/Mail/Header.pm blib/lib/Mail/Header.pm
cp lib/Mail/Field.pm blib/lib/Mail/Field.pm
cp lib/Mail/Filter.pod blib/lib/Mail/Filter.pod
cp lib/Mail/Field.pod blib/lib/Mail/Field.pod
cp lib/Mail/Mailer/sendmail.pm blib/lib/Mail/Mailer/sendmail.pm
cp lib/Mail/Mailer/rfc822.pm blib/lib/Mail/Mailer/rfc822.pm
cp lib/Mail/Field/Date.pm blib/lib/Mail/Field/Date.pm
cp lib/Mail/Header.pod blib/lib/Mail/Header.pod
cp lib/Mail/Field/Date.pod blib/lib/Mail/Field/Date.pod
cp lib/Mail/Internet.pod blib/lib/Mail/Internet.pod
cp lib/Mail/Mailer.pod blib/lib/Mail/Mailer.pod
cp lib/Mail/Field/AddrList.pm blib/lib/Mail/Field/AddrList.pm
cp lib/Mail/Mailer/smtps.pm blib/lib/Mail/Mailer/smtps.pm
cp lib/Mail/Field/Generic.pm blib/lib/Mail/Field/Generic.pm
cp lib/Mail/Internet.pm blib/lib/Mail/Internet.pm
cp lib/Mail/Address.pm blib/lib/Mail/Address.pm
cp lib/Mail/Mailer/testfile.pm blib/lib/Mail/Mailer/testfile.pm
cp lib/Mail/Util.pm blib/lib/Mail/Util.pm
cp lib/Mail/Mailer.pm blib/lib/Mail/Mailer.pm
cp lib/Mail/Send.pod blib/lib/Mail/Send.pod
cp lib/MailTools.pod blib/lib/MailTools.pod
--- pure_all ---
--- manifypods ---
Manifying 13 pod documents
--- all ---
===>  Staging for p5-Mail-Tools-2.21
===>   p5-Mail-Tools-2.21 depends on package: p5-TimeDate>=0 - found
===>   p5-Mail-Tools-2.21 depends on package: perl5>=5.32.r0<5.33 - found
===>   Generating temporary packing list
Manifying 13 pod documents
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/MailTools.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/MailTools.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Address.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Filter.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Cap.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Internet.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Util.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Internet.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Send.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Header.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Header.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Cap.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Send.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Address.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Filter.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Util.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field/Date.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field/Generic.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field/Generic.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field/AddrList.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field/Date.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Field/AddrList.pod
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer/qmail.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer/smtps.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer/sendmail.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer/testfile.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer/rfc822.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/Mail/Mailer/smtp.pm
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Send.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Field::Generic.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Util.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Field.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Address.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Field::Date.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Mailer.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Cap.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Internet.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Header.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/MailTools.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Filter.3
Installing /var/ports/var/db/freshports/ports-jail/var/db/repos/PORTS-head/mail/p5-Mail-Tools/work/stage/usr/local/lib/perl5/site_perl/man/man3/Mail::Field::AddrList.3
====> Compressing man pages (compress-man)
/you/have/to/check/what/makeplist/gives/you
%%SITE_PERL%%/Mail/Address.pm
%%SITE_PERL%%/Mail/Address.pod
%%SITE_PERL%%/Mail/Cap.pm
%%SITE_PERL%%/Mail/Cap.pod
%%SITE_PERL%%/Mail/Field.pm
%%SITE_PERL%%/Mail/Field.pod
%%SITE_PERL%%/Mail/Field/AddrList.pm
%%SITE_PERL%%/Mail/Field/AddrList.pod
%%SITE_PERL%%/Mail/Field/Date.pm
%%SITE_PERL%%/Mail/Field/Date.pod
%%SITE_PERL%%/Mail/Field/Generic.pm
%%SITE_PERL%%/Mail/Field/Generic.pod
%%SITE_PERL%%/Mail/Filter.pm
%%SITE_PERL%%/Mail/Filter.pod
%%SITE_PERL%%/Mail/Header.pm
%%SITE_PERL%%/Mail/Header.pod
%%SITE_PERL%%/Mail/Internet.pm
%%SITE_PERL%%/Mail/Internet.pod
%%SITE_PERL%%/Mail/Mailer.pm
%%SITE_PERL%%/Mail/Mailer.pod
%%SITE_PERL%%/Mail/Mailer/qmail.pm
%%SITE_PERL%%/Mail/Mailer/rfc822.pm
%%SITE_PERL%%/Mail/Mailer/sendmail.pm
%%SITE_PERL%%/Mail/Mailer/smtp.pm
%%SITE_PERL%%/Mail/Mailer/smtps.pm
%%SITE_PERL%%/Mail/Mailer/testfile.pm
%%SITE_PERL%%/Mail/Send.pm
%%SITE_PERL%%/Mail/Send.pod
%%SITE_PERL%%/Mail/Util.pm
%%SITE_PERL%%/Mail/Util.pod
%%SITE_PERL%%/MailTools.pm
%%SITE_PERL%%/MailTools.pod
%%SITE_ARCH%%/auto/MailTools/.packlist
%%PERL5_MAN3%%/Mail::Address.3.gz
%%PERL5_MAN3%%/Mail::Cap.3.gz
%%PERL5_MAN3%%/Mail::Field.3.gz
%%PERL5_MAN3%%/Mail::Field::AddrList.3.gz
%%PERL5_MAN3%%/Mail::Field::Date.3.gz
%%PERL5_MAN3%%/Mail::Field::Generic.3.gz
%%PERL5_MAN3%%/Mail::Filter.3.gz
%%PERL5_MAN3%%/Mail::Header.3.gz
%%PERL5_MAN3%%/Mail::Internet.3.gz
%%PERL5_MAN3%%/Mail::Mailer.3.gz
%%PERL5_MAN3%%/Mail::Send.3.gz
%%PERL5_MAN3%%/Mail::Util.3.gz
%%PERL5_MAN3%%/MailTools.3.gz
$ 

That list is arguably better than the first list. Here are the issues I see:

  1. Variables (e.g. %%SITE_PERL%%) are better than actual pathnames because that shows you where things might go, as opposed to the current default configuration.
  2. Full pathnames are better than variables because you’re looking for an actual file – non developers will want this.
  3. It is easier to get the output from make generate-plist than make makeplist.
  4. The output of make generate-plist is more comprehensive (e.g. it includes catalog.mk
  5. make makeplist requires the DISTFILES

Hmm.

Aug 152020
 

A few days ago, I did the database work to ignore case when browsing to a cat/port. Tonight, I did the website side.

database access

First, I altered the database function (in classes/element_record.php) to use the new stored procedure:

-       function FetchByName($Name) {
+       function FetchByName($Name, $caseSensitive = true) {
+               $Debug = 0;
+
+               if ($Debug) echo "looking for '$Name' and caseSensitive is '$caseSensitive'<br>";
                if (IsSet($Name)) {
                        $this->element_pathname = $Name;
                        $this->id = '';
                }
-               $sql = "select * from elementGet('" . pg_escape_string($Name) . "')";
 
-               $result = pg_exec($this->dbh, $sql);
+               if ($caseSensitive) {
+                       $sql = "select * from elementGet('" . pg_escape_string($Name) . "')";
+                       if ($Debug) echo "invoking $sql<br>";
+                       $result = pg_exec($this->dbh, $sql);
+               } else {
+                       $result = pg_query_params($this->dbh, 'select * from elementGetCaseInsensitive($1)', array($Name));
+               }
+
                if ($result) {
+                       if ($Debug) echo "we got a result<br>";
+
                        $numrows = pg_numrows($result);
+                       if ($Debug) echo "we have '$numrows' rows<br>";
                        if ($numrows == 1) {
                                $myrow = pg_fetch_array ($result, 0);
                                $this->PopulateValues($myrow);

On line one, I modify the function to pass in an optional parameter which preserves the original behavior.

Line 10 is the original code.

Lines 13-20 invokes the elementGetCaseInsensitive() stored procedure to work on a case insensitive solution.

Redirecting when finding a different match

When getting a result, we check to see if the path differs. If we get a result, we know it matches, but we need to know if it’s a different case. Here is that code (from rewrite/functions.php):

-       if ($ElementRecord->FetchByName('/ports/head/' . $pathname)) {
+       if ($ElementRecord->FetchByName(FRESHPORTS_PORTS_TREE_PREFIX . PATH_NAME, 0)) {
                $IsElement = true;
                if ($Debug) echo 'we found an element for that<br>';
+               if ($Debug) echo "we have: '$ElementRecord->element_pathname'<br>";
+               if ($Debug) echo " we had: '" . FRESHPORTS_PORTS_TREE_PREFIX . PATH_NAME . "'<br>";
+               if (PathnameDiffers($ElementRecord->element_pathname . '/', FRESHPORTS_PORTS_TREE_PREFIX . PATH_NAME)) {
+                       # in a case insensitive search, we want to redirect if the case was wrong
+                       if ($Debug) echo "we are redirecting to '" . $ElementRecord->element_pathname . "/'<br>";
+                       if ($Debug) echo 'which normalizes to ' . str_replace(FRESHPORTS_PORTS_TREE_PREFIX, '/', $ElementRecord->element_pathname . '/<br>');
+                       $https = ((!empty($_SERVER['HTTPS'])) && ($_SERVER['HTTPS'] != 'off'));
+                       if ($https) {
+                               $protocol = "https";
+                       } else {
+                               $protocol = "http";
+                       }
+
+                       header("HTTP/1.1 301 Moved Permanently");
+                       header('Location: ' . $protocol . '://' . $_SERVER['HTTP_HOST'] . str_replace(FRESHPORTS_PORTS_TREE_PREFIX, '/', $ElementRecord->element_pathname . '/'));
+                       exit;
+               }
+       }
+

Line 7 invokes PathDiffers to do a comparison between the two strings. This is not a simple A != B operation. sysutils/bacula9-server is not the same as sysutils/bacula9-server/ despite them showing the same content on the website. The paths, or URIs, must first be ‘normalized’ (that’s my chosen terminology, not anything official).

If the user has included a trailing slash (/) on their URL, the code will strip that before looking for a matching entry in the element table.

The debugging information shows some of this:

the URI is
'/sysutils/bacula9-SERVER/'

the url parts are
array(1) {
  ["path"]=>
  string(25) "/sysutils/bacula9-SERVER/"
}

The pathname is '/sysutils/bacula9-SERVER/'
we found an element for that
we have: '/ports/head/sysutils/bacula9-server'
we had: '/ports/head/sysutils/bacula9-SERVER'
we are redirecting to '/ports/head/sysutils/bacula9-server/'
which normalizes to /sysutils/bacula9-server/

The function in question looks like this:

function PathnameDiffers($Path1, $Path2) {
    # if the two paths are different, we might want to redirect
    # if one path ends in a / and the other does not, adjust them
    if (substr($Path1, -1) == '/') {
        if (substr($Path2, -1) != '/') {
            $Path2 .= '/';
        }
    } else {
        if (substr($Path2, -1) == '/') {
            $Path1 .= '/';
        }
    }

    return $Path1 != $Path2;
}

That redirect happens early enough in the code to handle all elements, not just ports (e.g. cat/ports/Makefile or distinfo or pkg-descr).

Aug 112020
 

This idea came to me today from swills:

stupid question, would it be possible to make the urls on freshports case insensitive?

so that like https://www.freshports.org/x11/libx11 would go to https://www.freshports.org/x11/libX11

After confirming this was for manually typed URLs (as opposed to generated links), I started looking into it. This is what I found.

In this post, we’re going to look at some old PHP database code and the PostgreSQL stored procedures behind them. I’ll show you the multi-line stored procedure now in use and how a single query will now do the same thing. This change isn’t some clever coding or another solution. The database has evolved and the procedures did not have to. Until now.

Going from sysutils/py-mqttwarn to a database entry

I will base all my links to code on the 1.50.5 release.

If the Nginx webserver cannot follow the URL to a file on disk, it invokes www/–/index.php via this bit of declartion:

  location / {
    try_files $uri $uri/ @missing;
  }

  location @missing {
    rewrite ^ /--/index.php break;

    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass unix:/var/run/php-fpm.sock;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $request_filename;
    include fastcgi_params;
  }

Line 2 says: try finding the URL using files on disk and if you file, go do what @missing points to.

Line 6 say: invoke /–/index.php for things you don’t find in a file.

Why /–/index.php? I wanted to put this file somewhere that it would be highly unlikely to collide with anything in the FreeBSD repo (with regards to directories and file names).

The code checks for other special stuff ((badges, scripts, API, etc) before invoking freshports_Parse404URI() on line 148. I will discuss that soon, but first: if nothing else is picked up by /–/index.php, then it’s time to invoke rewrite/missing.php

freshports_Parse404URI() is declared in rewrite/functions.php and the code relevant to our journey is on line 70:

if ($ElementRecord->FetchByName('/ports/head/' . $pathname)) {

Ignoring that hardcoded pathname for now, this function gets passed something like this:

/ports/head/sysutils/py-mqttwarn

There might be more on the path: branch name, commit number, etc.

FetchByName() is located on line 38 of classes/element_record

Now we are getting into the oldest code of FreshPorts. I’m sure this stuff hasn’t changed in over 15 years.

The database function elementGet() is invoked which looks like this:

CREATE OR REPLACE FUNCTION elementGet (text) RETURNS SETOF element_type AS $$
   SELECT id,
          name::text,
          directory_file_flag::text,
          status::text,
          case when IsCategory(Pathname_ID($1)) IS NULL THEN FALSE ELSE TRUE END,
          case when IsPort(    Pathname_ID($1)) IS NULL THEN FALSE ELSE TRUE END,
          element_pathname(id)
     FROM element
    WHERE id = PathName_ID($1);
$$ LANGUAGE SQL STABLE;

The key in this function is PathName_ID() which is used three times and uniquely identifies a single row in the element table.

Element table

The element table is an implementation of a filesystem using a self-referential solution (i.e. it is recursive).

The table (without constraints, references, and triggers) looks like this:

freshports.dev=# \d element
                                     Table "public.element"
       Column        |     Type     | Collation | Nullable |               Default               
---------------------+--------------+-----------+----------+-------------------------------------
 id                  | integer      |           | not null | nextval('element_id_seq'::regclass)
 name                | text         |           | not null | 
 parent_id           | integer      |           |          | 
 directory_file_flag | character(1) |           | not null | 
 status              | character(1) |           | not null | 
Indexes:
    "element_pkey" PRIMARY KEY, btree (id)
    "element_delete" btree (status) WHERE status = 'D'::bpchar
    "element_name" btree (name)
    "element_parent_id" btree (parent_id)

An entry in the top level directory will have parent_id = NULL.

Pathname_ID function

Pathname_ID() is a function which progresses along a supplied pathname from left to right following the each entry from the element table.

CREATE OR REPLACE FUNCTION Pathname_ID(text) returns int4 AS $$

   DECLARE
      in_element_pathname ALIAS FOR $1;

      debug_str         text;
      element_id        int4;
      element_parent_id int4;
      element_name      text;
      element_pathname  text;
      position          int4;
      slash             text;

begin
  slash            := '/';
  element_pathname := in_element_pathname;
  element_id       := -1;

  IF substr(element_pathname,1,1) = slash THEN
    element_pathname := substr(element_pathname, 2);
  END IF;

  position := strpos(element_pathname, slash);
  IF position = 0 THEN
    element_name     := element_pathname;
    element_pathname := '';
  ELSE
    element_name     := substr(element_pathname, 1, position-1);
    element_pathname := substr(element_pathname, position+1);
  END IF;

  select id 
    into element_parent_id
    from element 
   where element.name = element_name
     and parent_id is null;

  debug_str := debug_str || '1 = ' || element_name;

  WHILE(char_length(element_pathname) > 0) LOOP
    position := strpos(element_pathname, slash);
    if position = 0 then
       element_name     := element_pathname;
       element_pathname := '';
    else
       element_name     := substr(element_pathname, 1, position-1);
       element_pathname := substr(element_pathname, position+1);
    end if;

    debug_str := debug_str || '1 = ' || element_name;
    select id 
      into element_parent_id
      from element 
     where element.name = element_name 
       and parent_id    = element_parent_id;

    debug_str := debug_str || ' 2 = ' || element_name;
    debug_str := debug_str || ' 3 = ' || element_parent_id;
  end loop;

  return element_parent_id;
END;
$$  LANGUAGE 'plpgsql' STABLE;

The key lines are 35 and 54. To meet the goal of a case insensitive match, those operators would need to be changed from = to a case insensitive comparison, such as:

where lower(element.name) = lower(element_name)

I do not want to alter this code which is so far down the tree. Instead, I will create a new function which will take a different approach. It will also take advantage of new database features which did not exist 8 years ago, let along 15+ years ago.

Testing new queries

I did some playing around with queries today, looking to find the element_id for a given port, in our example sysutils/py-mqttwarn.

I started off with this, show me things named x11:

freshports.stage=# select *, element_pathname(id) from element where lower(name) = lower('X11');
   id    | name | parent_id | directory_file_flag | status |                    element_pathname                    
---------+------+-----------+---------------------+--------+--------------------------------------------------------
   43426 | x11  |     33542 | F                   | A      | /ports/head/misc/kdeutils11/x11
   52955 | x11  |      4760 | F                   | A      | /ports/head/misc/kdeutils2/x11
   58887 | x11  |     50378 | F                   | A      | /ports/head/textproc/xml-i18n-tools/x11
   58962 | x11  |      3582 | F                   | A      | /ports/head/www/gtkhtml/x11
   83338 | x11  |     77344 | D                   | A      | /doc/head/en_US.ISO8859-1/books/handbook/x11
  104943 | x11  |    104814 | D                   | A      | /doc/head/fr_FR.ISO8859-1/books/handbook/x11
   99879 | x11  |     86040 | D                   | A      | /doc/head/ja_JP.eucJP/books/handbook/x11
  200645 | x11  |    200550 | D                   | A      | /doc/head/nl_NL.ISO8859-1/books/handbook/x11
  110618 | x11  |     77691 | D                   | A      | /doc/head/de_DE.ISO8859-1/books/handbook/x11
  112293 | x11  |    112148 | D                   | A      | /doc/head/it_IT.ISO8859-15/books/handbook/x11
  113146 | X11  |     77283 | D                   | A      | /base/head/release/scripts/X11
  200746 | x11  |    200655 | D                   | A      | /doc/head/zh_CN.GB2312/books/handbook/x11
  171234 | x11  |    110131 | D                   | A      | /doc/head/ru_RU.KOI8-R/books/handbook/x11
  271292 | x11  |    271177 | D                   | A      | /doc/head/mn_MN.UTF-8/books/handbook/x11
  200939 | x11  |    110120 | D                   | A      | /doc/head/es_ES.ISO8859-1/books/handbook/x11
  234867 | x11  |    234757 | D                   | A      | /doc/head/zh_TW.Big5/books/handbook/x11
  255132 | x11  |    255022 | D                   | A      | /doc/head/pl_PL.ISO8859-2/books/handbook/x11
  288692 | x11  |    288608 | D                   | A      | /doc/head/el_GR.ISO8859-7/books/handbook/x11
  300947 | x11  |    300813 | D                   | A      | /doc/head/hu_HU.ISO8859-2/books/handbook/x11
  433144 | x11  |    433041 | D                   | A      | /ru_RU.KOI8-R/books/handbook/x11
  433549 | x11  |    433544 | D                   | A      | /mn_MN.UTF-8/books/handbook/x11
  434309 | x11  |    433663 | D                   | A      | /ja_JP.eucJP/books/handbook/x11
  439412 | x11  |    432543 | D                   | A      | /en_US.ISO8859-1/books/handbook/x11
  439459 | x11  |    433298 | D                   | A      | /nl_NL.ISO8859-1/books/handbook/x11
  457198 | x11  |    111733 | D                   | A      | /doc/head/pt_BR.ISO8859-1/books/handbook/x11
     231 | x11  |    464087 | D                   | A      | /ports/head/x11
  465674 | x11  |    465603 | D                   | A      | /doc/translations/nl_NL.ISO8859-1/books/handbook/x11
  501886 | x11  |    465481 | D                   | A      | /doc/translations/en_US.ISO8859-1/books/handbook/x11
  537516 | x11  |    537439 | D                   | A      | /doc/head/zh_CN.UTF-8/books/handbook/x11
  663947 | x11  |    663946 | D                   | A      | /doc/release/10.2.0/en_US.ISO8859-1/books/handbook/x11
  587542 | x11  |    587471 | D                   | D      | /doc/head/zh_TW.UTF-8/books/handbook/x11
  745067 | x11  |    743628 | D                   | A      | /ports/branches/2016Q4/x11
  770464 | x11  |    763349 | D                   | A      | /ports/branches/2017Q1/x11
  792524 | x11  |    790218 | D                   | A      | /ports/branches/2017Q2/x11
  812565 | x11  |    812271 | D                   | A      | /ports/branches/2017Q3/x11
  829995 | x11  |    829562 | D                   | A      | /ports/branches/2017Q4/x11
  870434 | x11  |    867105 | D                   | A      | /ports/branches/2018Q2/x11
  882372 | x11  |    881009 | D                   | A      | /ports/branches/2018Q3/x11
  901665 | x11  |    901631 | D                   | A      | /ports/branches/2018Q4/x11
  914724 | x11  |    914354 | D                   | A      | /ports/branches/2019Q1/x11
  936002 | x11  |    935538 | D                   | A      | /ports/branches/2019Q2/x11
  954616 | x11  |    949697 | D                   | A      | /ports/branches/2019Q3/x11
  971674 | x11  |    961357 | D                   | A      | /ports/branches/2019Q4/x11
  978479 | x11  |    978429 | D                   | A      | /ports/branches/2020Q1/x11
 1014222 | x11  |   1012899 | D                   | A      | /ports/branches/2020Q2/x11
 1055981 | x11  |   1053714 | D                   | A      | /ports/branches/2020Q3/x11
(46 rows)

Based on line 29, show me a child of that element named libX11:

freshports.stage=# select *, element_pathname(id) from element where lower(name) = lower('libX11') and parent_id = 231;
   id   |  name  | parent_id | directory_file_flag | status |    element_pathname    
--------+--------+-----------+---------------------+--------+------------------------
 163623 | libX11 |       231 | D                   | A      | /ports/head/x11/libX11
(1 row)

freshports.stage=# 

That’s the one.

Using this approach, I came up with few queries, which I later abandoned for another solution.

The abandoned queries

I tried this join of four tables:

freshports.stage=# explain analyse
freshports.stage-# SELECT E4.id, element_pathname(E4.id)
freshports.stage-#   FROM element E1, element E2, element E3, element E4
freshports.stage-#  WHERE E1.parent_id IS NULL
freshports.stage-#    AND E2.parent_id = E1.id
freshports.stage-#    AND E3.parent_id = E2.id
freshports.stage-#    AND E4.parent_id = E3.id
freshports.stage-#    AND E2.name = 'head'
freshports.stage-#    AND E1.name = 'ports'
freshports.stage-#    AND lower(E3.name) = lower('sysutils')
freshports.stage-#    AND lower(E4.name) = lower('py-mqttwarn');
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=20.05..74.95 rows=1 width=36) (actual time=1.833..2.299 rows=1 loops=1)
   ->  Nested Loop  (cost=19.62..67.62 rows=1 width=4) (actual time=0.126..0.138 rows=1 loops=1)
         ->  Nested Loop  (cost=19.19..27.23 rows=1 width=4) (actual time=0.061..0.063 rows=1 loops=1)
               ->  Bitmap Heap Scan on element e1  (cost=9.76..13.77 rows=1 width=4) (actual time=0.034..0.035 rows=1 loops=1)
                     Recheck Cond: ((name = 'ports'::text) AND (parent_id IS NULL))
                     Heap Blocks: exact=1
                     ->  BitmapAnd  (cost=9.76..9.76 rows=1 width=0) (actual time=0.030..0.030 rows=0 loops=1)
                           ->  Bitmap Index Scan on element_name  (cost=0.00..4.56 rows=17 width=0) (actual time=0.016..0.016 rows=38 loops=1)
                                 Index Cond: (name = 'ports'::text)
                           ->  Bitmap Index Scan on element_parent_id  (cost=0.00..4.95 rows=70 width=0) (actual time=0.010..0.010 rows=100 loops=1)
                                 Index Cond: (parent_id IS NULL)
               ->  Bitmap Heap Scan on element e2  (cost=9.43..13.45 rows=1 width=8) (actual time=0.023..0.024 rows=1 loops=1)
                     Recheck Cond: ((name = 'head'::text) AND (parent_id = e1.id))
                     Heap Blocks: exact=1
                     ->  BitmapAnd  (cost=9.43..9.43 rows=1 width=0) (actual time=0.020..0.020 rows=0 loops=1)
                           ->  Bitmap Index Scan on element_name  (cost=0.00..4.56 rows=17 width=0) (actual time=0.009..0.009 rows=15 loops=1)
                                 Index Cond: (name = 'head'::text)
                           ->  Bitmap Index Scan on element_parent_id  (cost=0.00..4.62 rows=26 width=0) (actual time=0.008..0.008 rows=5 loops=1)
                                 Index Cond: (parent_id = e1.id)
         ->  Index Scan using element_parent_id on element e3  (cost=0.43..40.38 rows=1 width=8) (actual time=0.064..0.074 rows=1 loops=1)
               Index Cond: (parent_id = e2.id)
               Filter: (lower(name) = 'sysutils'::text)
               Rows Removed by Filter: 88
   ->  Index Scan using element_parent_id on element e4  (cost=0.43..7.07 rows=1 width=8) (actual time=1.540..1.994 rows=1 loops=1)
         Index Cond: (parent_id = e3.id)
         Filter: (lower(name) = 'py-mqttwarn'::text)
         Rows Removed by Filter: 2380
 Planning Time: 0.689 ms
 Execution Time: 2.352 ms
(29 rows)

freshports.stage=# 

I tried rewriting with a JOIN JOIN approach, but that’s really just the previous query:

freshports.stage=# -- find a port which matches this
freshports.stage=# -- case insensitive
freshports.stage=# 
freshports.stage=# 
freshports.stage=# explain analyse
freshports.stage-# SELECT E4.id, element_pathname(E4.id)
freshports.stage-#   FROM element E1
freshports.stage-#   JOIN element E2 ON E1.parent_id IS NULL AND E1.name = 'ports'
freshports.stage-#                  AND E2.parent_id = E1.id AND E2.name = 'head'
freshports.stage-#   JOIN element E3 ON E3.parent_id = E2.id AND lower(E3.name) = lower('sysutils')
freshports.stage-#   JOIN element E4 ON E4.parent_id = E3.id AND lower(E4.name) = lower('bacula9-server');
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=20.05..74.95 rows=1 width=36) (actual time=3.729..4.788 rows=1 loops=1)
   ->  Nested Loop  (cost=19.62..67.62 rows=1 width=4) (actual time=0.278..0.309 rows=1 loops=1)
         ->  Nested Loop  (cost=19.19..27.23 rows=1 width=4) (actual time=0.129..0.132 rows=1 loops=1)
               ->  Bitmap Heap Scan on element e1  (cost=9.76..13.77 rows=1 width=4) (actual time=0.072..0.073 rows=1 loops=1)
                     Recheck Cond: ((name = 'ports'::text) AND (parent_id IS NULL))
                     Heap Blocks: exact=1
                     ->  BitmapAnd  (cost=9.76..9.76 rows=1 width=0) (actual time=0.063..0.063 rows=0 loops=1)
                           ->  Bitmap Index Scan on element_name  (cost=0.00..4.56 rows=17 width=0) (actual time=0.032..0.033 rows=38 loops=1)
                                 Index Cond: (name = 'ports'::text)
                           ->  Bitmap Index Scan on element_parent_id  (cost=0.00..4.95 rows=70 width=0) (actual time=0.023..0.023 rows=100 loops=1)
                                 Index Cond: (parent_id IS NULL)
               ->  Bitmap Heap Scan on element e2  (cost=9.43..13.45 rows=1 width=8) (actual time=0.047..0.049 rows=1 loops=1)
                     Recheck Cond: ((name = 'head'::text) AND (parent_id = e1.id))
                     Heap Blocks: exact=1
                     ->  BitmapAnd  (cost=9.43..9.43 rows=1 width=0) (actual time=0.041..0.041 rows=0 loops=1)
                           ->  Bitmap Index Scan on element_name  (cost=0.00..4.56 rows=17 width=0) (actual time=0.019..0.019 rows=15 loops=1)
                                 Index Cond: (name = 'head'::text)
                           ->  Bitmap Index Scan on element_parent_id  (cost=0.00..4.62 rows=26 width=0) (actual time=0.016..0.016 rows=5 loops=1)
                                 Index Cond: (parent_id = e1.id)
         ->  Index Scan using element_parent_id on element e3  (cost=0.43..40.38 rows=1 width=8) (actual time=0.147..0.173 rows=1 loops=1)
               Index Cond: (parent_id = e2.id)
               Filter: (lower(name) = 'sysutils'::text)
               Rows Removed by Filter: 88
   ->  Index Scan using element_parent_id on element e4  (cost=0.43..7.07 rows=1 width=8) (actual time=3.097..4.125 rows=1 loops=1)
         Index Cond: (parent_id = e3.id)
         Filter: (lower(name) = 'bacula9-server'::text)
         Rows Removed by Filter: 2380
 Planning Time: 1.608 ms
 Execution Time: 4.898 ms
(29 rows)

The next query involves passing the parent id down to the next level:

freshports.stage=# -- find a port which matches this
freshports.stage=# -- case insensitive
freshports.stage=# 
freshports.stage=# explain analyse
freshports.stage-# SELECT E4.ID, element_pathname(E4.id)
freshports.stage-#   FROM element E4
freshports.stage-#  WHERE lower(E4.name) = lower('py37-mqttwarn') AND E4.parent_id = (
freshports.stage(# SELECT E3.ID
freshports.stage(#   FROM element E3
freshports.stage(#  WHERE lower(E3.name) = lower('sysutils') AND E3.parent_id = (
freshports.stage(# SELECT E2.ID
freshports.stage(#   FROM element E2
freshports.stage(#  WHERE E2.name = 'head' AND E2.parent_id = (
freshports.stage(# SELECT E1.id
freshports.stage(#   FROM element E1 
freshports.stage(#  WHERE E1.parent_id IS NULL AND E1.name = 'ports')));
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using element_parent_id on element e4  (cost=68.93..110.05 rows=1 width=36) (actual time=1.902..1.902 rows=0 loops=1)
   Index Cond: (parent_id = $2)
   Filter: (lower(name) = 'py37-mqttwarn'::text)
   Rows Removed by Filter: 2381
   InitPlan 3 (returns $2)
     ->  Index Scan using element_parent_id on element e3  (cost=27.64..68.51 rows=1 width=4) (actual time=0.121..0.134 rows=1 loops=1)
           Index Cond: (parent_id = $1)
           Filter: (lower(name) = 'sysutils'::text)
           Rows Removed by Filter: 88
           InitPlan 2 (returns $1)
             ->  Bitmap Heap Scan on element e2  (cost=23.20..27.21 rows=1 width=4) (actual time=0.061..0.061 rows=1 loops=1)
                   Recheck Cond: ((name = 'head'::text) AND (parent_id = $0))
                   Heap Blocks: exact=1
                   InitPlan 1 (returns $0)
                     ->  Bitmap Heap Scan on element e1  (cost=9.76..13.77 rows=1 width=4) (actual time=0.032..0.032 rows=1 loops=1)
                           Recheck Cond: ((name = 'ports'::text) AND (parent_id IS NULL))
                           Heap Blocks: exact=1
                           ->  BitmapAnd  (cost=9.76..9.76 rows=1 width=0) (actual time=0.027..0.027 rows=0 loops=1)
                                 ->  Bitmap Index Scan on element_name  (cost=0.00..4.56 rows=17 width=0) (actual time=0.011..0.011 rows=38 loops=1)
                                       Index Cond: (name = 'ports'::text)
                                 ->  Bitmap Index Scan on element_parent_id  (cost=0.00..4.95 rows=70 width=0) (actual time=0.013..0.013 rows=100 loops=1)
                                       Index Cond: (parent_id IS NULL)
                   ->  BitmapAnd  (cost=9.43..9.43 rows=1 width=0) (actual time=0.058..0.058 rows=0 loops=1)
                         ->  Bitmap Index Scan on element_name  (cost=0.00..4.56 rows=17 width=0) (actual time=0.016..0.016 rows=15 loops=1)
                               Index Cond: (name = 'head'::text)
                         ->  Bitmap Index Scan on element_parent_id  (cost=0.00..4.62 rows=26 width=0) (actual time=0.040..0.040 rows=5 loops=1)
                               Index Cond: (parent_id = $0)
 Planning Time: 0.235 ms
 Execution Time: 1.945 ms

The chosen query

The chosen query uses a relatively new table: element_pathname

freshports.stage=# \d element_pathname
            Table "public.element_pathname"
   Column   |  Type   | Collation | Nullable | Default 
------------+---------+-----------+----------+---------
 element_id | integer |           | not null | 
 pathname   | text    |           | not null | 
Indexes:
    "element_pathname_pathname" UNIQUE, btree (pathname)
    "element_pathname_element_id" btree (element_id)
Foreign-key constraints:
    "element_pathname_element_id_fkey" FOREIGN KEY (element_id) REFERENCES element(id) ON DELETE CASCADE

freshports.stage=# 

This table is maintained via triggers on the element table.

The same query here is:

freshports.stage=# explain analyse select *
FROM element_pathname where lower(pathname) = '/ports/head/sysutils/py-mqttwarn';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..19555.74 rows=5361 width=54) (actual time=158.979..160.785 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on element_pathname  (cost=0.00..18019.64 rows=2234 width=54) (actual time=145.503..156.289 rows=0 loops=3)
         Filter: (lower(pathname) = '/ports/head/sysutils/py-mqttwarn'::text)
         Rows Removed by Filter: 351214
 Planning Time: 0.483 ms
 Execution Time: 160.830 ms
(8 rows)

freshports.stage=#

Our query will always be on lower case, so let’s create an index on lower case. I am curious about the size of this index, and it added only 77MB to the database.

freshports.stage=# select pg_size_pretty(pg_indexes_size('element_pathname'));
 pg_size_pretty 
----------------
 107 MB
(1 row)

freshports.stage=# create index element_pathname_pathname_lc on element_pathname ((lower(pathname)));
CREATE INDEX
freshports.stage=# select pg_size_pretty(pg_indexes_size('element_pathname'));
 pg_size_pretty 
----------------
 184 MB
(1 row)

freshports.stage=# select count(*) from element_pathname;
  count  
---------
 1053643
(1 row)

freshports.stage=# 

Now let’s try that query again:

freshports.stage=# explain analyse select *                                                                          
FROM element_pathname where lower(pathname) = '/ports/head/sysutils/py-mqttwarn';
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on element_pathname  (cost=241.38..9537.09 rows=5268 width=54) (actual time=0.133..0.133 rows=1 loops=1)
   Recheck Cond: (lower(pathname) = '/ports/head/sysutils/py-mqttwarn'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on element_pathname_pathname_lc  (cost=0.00..240.06 rows=5268 width=0) (actual time=0.108..0.108 rows=1 loops=1)
         Index Cond: (lower(pathname) = '/ports/head/sysutils/py-mqttwarn'::text)
 Planning Time: 0.087 ms
 Execution Time: 0.155 ms
(7 rows)

freshports.stage=# 

Now we’re using an index scan on that newly created lower case index.

Let’s do a few more queries to see how they go:

freshports.stage=# explain analyse select *
FROM element_pathname where lower(pathname) = '/ports/head/sysutils/bacula-server';
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on element_pathname  (cost=241.38..9537.09 rows=5268 width=54) (actual time=0.050..0.051 rows=1 loops=1)
   Recheck Cond: (lower(pathname) = '/ports/head/sysutils/bacula-server'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on element_pathname_pathname_lc  (cost=0.00..240.06 rows=5268 width=0) (actual time=0.037..0.037 rows=1 loops=1)
         Index Cond: (lower(pathname) = '/ports/head/sysutils/bacula-server'::text)
 Planning Time: 0.073 ms
 Execution Time: 0.067 ms
(7 rows)
freshports.stage=# explain analyse select *
FROM element_pathname where lower(pathname) = '/ports/head/sysutils/bacula9-server';
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on element_pathname  (cost=241.38..9537.09 rows=5268 width=54) (actual time=0.065..0.066 rows=1 loops=1)
   Recheck Cond: (lower(pathname) = '/ports/head/sysutils/bacula9-server'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on element_pathname_pathname_lc  (cost=0.00..240.06 rows=5268 width=0) (actual time=0.049..0.050 rows=1 loops=1)
         Index Cond: (lower(pathname) = '/ports/head/sysutils/bacula9-server'::text)
 Planning Time: 0.086 ms
 Execution Time: 0.088 ms
(7 rows)

freshports.stage=# 
freshports.stage=# explain analyse select *
FROM element_pathname where lower(pathname) = '/ports/head/lang/gcc10-devel';
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on element_pathname  (cost=241.38..9537.09 rows=5268 width=54) (actual time=0.132..0.132 rows=1 loops=1)
   Recheck Cond: (lower(pathname) = '/ports/head/lang/gcc10-devel'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on element_pathname_pathname_lc  (cost=0.00..240.06 rows=5268 width=0) (actual time=0.116..0.117 rows=1 loops=1)
         Index Cond: (lower(pathname) = '/ports/head/lang/gcc10-devel'::text)
 Planning Time: 0.090 ms
 Execution Time: 0.155 ms
(7 rows)

freshports.stage=# 
freshports.stage=# explain analyse select *
FROM element_pathname where lower(pathname) = '/ports/head/lang/perl5.32';
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on element_pathname  (cost=241.38..9537.09 rows=5268 width=54) (actual time=0.059..0.060 rows=1 loops=1)
   Recheck Cond: (lower(pathname) = '/ports/head/lang/perl5.32'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on element_pathname_pathname_lc  (cost=0.00..240.06 rows=5268 width=0) (actual time=0.044..0.044 rows=1 loops=1)
         Index Cond: (lower(pathname) = '/ports/head/lang/perl5.32'::text)
 Planning Time: 0.085 ms
 Execution Time: 0.081 ms
(7 rows)

Yes, I think this will do nicely.

Things change

There is code and tables in here which date back to late 1999 or early 2000. Since then, the database has grown from 15 tables in Nov 2001 to 80 tables today.

I’m not sure when this svg was created or how many tables it represents, but I know its a few years old now.

Database and code evolve over time and it is sometimes interesteing to see the paths they take.

Jul 302020
 

In my recent post, my conclusion preferred Rename over Delete & Add. Now I’ve changed my mind.

I looked at the code, and the database. Now I prefer Delete & Add.

The database

In the database, there is no way to represent the Rename action without modifying the structure of the commit_log_elements table:

freshports.devgit=# \d commit_log_elements
                                  Table "public.commit_log_elements"
    Column     |     Type     | Collation | Nullable |                     Default                     
---------------+--------------+-----------+----------+-------------------------------------------------
 id            | integer      |           | not null | nextval('commit_log_elements_id_seq'::regclass)
 commit_log_id | integer      |           | not null | 
 element_id    | integer      |           | not null | 
 revision_name | text         |           |          | 
 change_type   | character(1) |           | not null | 

The change_type field is controlled by this constraint:

Check constraints:
    "commit_log_elements_change_type" CHECK (change_type = 'A'::bpchar OR change_type = 'M'::bpchar OR change_type = 'R'::bpchar OR change_type = 'r'::bpchar)

Let’s look at the comments I set up for that constraint:

freshports.devgit=# \dd commit_log_elements_change_type
                                          Object descriptions
 Schema |              Name               |      Object      |               Description                
--------+---------------------------------+------------------+------------------------------------------
 public | commit_log_elements_change_type | table constraint |                                         +
        |                                 |                  | A - add                                 +
        |                                 |                  | M - modify                              +
        |                                 |                  | R - remove for subersion, delete for git+
        |                                 |                  | r - rename (added for git)
(1 row)

Each entry in this table documents an action (in the change_type column) and the file (element_id) upon which that action occurs. All for a given commit (commit_log_id). The revision_name column refers to the subversion revisions of a file in a given commit. For git commits, that field contains the commit hash.

Except for r (Rename). For that, we get only the origin, not the destination, of the rename.

An Add and Remove includes all the data.

A Rename does not. It omits the destination. I don’t see an easy solution for
Rename.

How much data?

We have 51 entries for r:

freshports.devgit=# select count(*) from commit_log_elements where change_type = 'r';
 count 
-------
    51
(1 row)

We can redo all those commits easily enough.

That’s why

That explains is why I’m going with Add and Remove.

But first, let’s run a test.

The test with new XML

This is the new XML I’m going to test with:

[dan@devgit-ingress01:/var/db/ingress/message-queues/testing-new] $ cat 2020.07.27.01.58.33.000078.fd2bb768b5a39d29758b8f74bc15c321d6b4f980.xml
<?xml version='1.0' encoding='UTF-8'?>
<UPDATES Version="1.4.0.0">
  <UPDATE>
    <DATE Year="2020" Month="7" Day="27"/>
    <TIME Timezone="UTC" Hour="1" Minute="58" Second="33"/>
    <OS Repo="ports" Id="FreeBSD" Branch="master"/>
    <LOG>editors/nvi2-port: Rename to editors/nvi2

Grrr... my addport notes failed me.</LOG>
    <PEOPLE>
      <UPDATER Handle="leres &lt;leres@FreeBSD.org&gt;"/>
    </PEOPLE>
    <COMMIT Hash="fd2bb768b5a39d29758b8f74bc15c321d6b4f980" HashShort="fd2bb76" Subject="editors/nvi2-port: Rename to editors/nvi2" EncodingLoses="false" Repository="ports"/>
    <FILES>
      <FILE Action="Modify" Path="editors/Makefile"/>
      <FILE Action="Delete" Path="editors/nvi2-port/Makefile"/>
      <FILE Action="Delete" Path="editors/nvi2-port/distinfo"/>
      <FILE Action="Delete" Path="editors/nvi2-port/pkg-descr"/>
      <FILE Action="Add" Path="editors/nvi2/Makefile"/>
      <FILE Action="Add" Path="editors/nvi2/distinfo"/>
      <FILE Action="Add" Path="editors/nvi2/pkg-descr"/>
    </FILES>
  </UPDATE>
</UPDATES>

I ran it through the processing:

$ sudo mv -i 2020.07.27.01.58.33.000078.fd2bb768b5a39d29758b8f74bc15c321d6b4f980.xml ~ingress/message-queues/incoming/

Looking in the logs I found this:

the size of @Files is 7
FILE ==: Modify, /ports/head/editors/Makefile, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/Makefile, 4106650
FILE ==: Delete, /ports/head/editors/nvi2-port/Makefile, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/nvi2-port/Makefile, 4106651
FILE ==: Delete, /ports/head/editors/nvi2-port/distinfo, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/nvi2-port/distinfo, 4106652
FILE ==: Delete, /ports/head/editors/nvi2-port/pkg-descr, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/nvi2-port/pkg-descr, 4106653
FILE ==: Add, /ports/head/editors/nvi2/Makefile, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/nvi2/Makefile, 4106654
FILE ==: Add, /ports/head/editors/nvi2/distinfo, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/nvi2/distinfo, 4106655
FILE ==: Add, /ports/head/editors/nvi2/pkg-descr, fd2bb768b5a39d29758b8f74bc15c321d6b4f980, , ports, head, editors/nvi2/pkg-descr, 4106656
# # # # Deleting deleted ports # # # #

deleting : port = editors/nvi2-port, port_id = '59458', ' element_id = 1057410'
# # # # Finished deleting deleted ports # # # #

Line 11 is the key. That’s the port being deleted this this little bit of code from verifyport.pm:

#
# $port now contains the port for this file.
# let's adjust the needs_refresh value.
#
#
# if we just deleted the Makefile for this port, there's no sense in refreshing the port.
# because it's been deleted.
#
if ($extra eq $FreshPorts::Constants::FILE_MAKEFILE && ($action eq $FreshPorts::Constants::REMOVE || $action eq $FreshPorts::Constants::DELETE)) {
    #
    # we are deleted (local value, never actually saved to db)
    #
    $port->{deleted} = 1;
    print "THIS PORT HAS BEEN DELETED\n";
}

In the end, the port was properly deleted without any code changes.

Another test

Let’s also try the databases/sqliteodbc-sqlite2 commit from the previous post.

The XML is:

[dan@devgit-ingress01:/var/db/ingress/message-queues/testing-new] $ cat 2020.07.27.18.50.49.000000.c4f1589e8f8c02859c3a8893bd7a0da15b6df8b9.xml
<?xml version='1.0' encoding='UTF-8'?>
<UPDATES Version="1.4.0.0">
  <UPDATE>
    <DATE Year="2020" Month="7" Day="27"/>
    <TIME Timezone="UTC" Hour="18" Minute="50" Second="49"/>
    <OS Repo="ports" Id="FreeBSD" Branch="master"/>
    <LOG>Remove expired port:

2020-07-27 databases/sqliteodbc-sqlite2: SQLite2 has stopped development in 2005, please move to SQLite3</LOG>
    <PEOPLE>
      <UPDATER Handle="rene &lt;rene@FreeBSD.org&gt;"/>
    </PEOPLE>
    <COMMIT Hash="c4f1589e8f8c02859c3a8893bd7a0da15b6df8b9" HashShort="c4f1589" Subject="Remove expired port:" EncodingLoses="false" Repository="ports"/>
    <FILES>
      <FILE Action="Modify" Path="MOVED"/>
      <FILE Action="Modify" Path="databases/Makefile"/>
      <FILE Action="Delete" Path="databases/sqliteodbc-sqlite2/Makefile"/>
    </FILES>
  </UPDATE>
</UPDATES>

The log shows:

That message is all done under Commit ID = '810547'
the size of @Files is 3
FILE ==: Modify, /ports/head/MOVED, c4f1589e8f8c02859c3a8893bd7a0da15b6df8b9, , ports, head, MOVED, 4107056
FILE ==: Modify, /ports/head/databases/Makefile, c4f1589e8f8c02859c3a8893bd7a0da15b6df8b9, , ports, head, databases/Makefile, 4107057
FILE ==: Delete, /ports/head/databases/sqliteodbc-sqlite2/Makefile, c4f1589e8f8c02859c3a8893bd7a0da15b6df8b9, , ports, head, databases/sqliteodbc-sqlite2/Makefile, 4107058
# # # # Deleting deleted ports # # # #

deleting : port = databases/sqliteodbc-sqlite2, port_id = '38622', ' element_id = 723652'
# # # # Finished deleting deleted ports # # # #

Good!

I think we’re good to go here.

Running the old commits

Let’s get the list of old commits to run:

freshports.devgit=# select distinct revision_name from commit_log_elements where change_type = 'r';
              revision_name               
------------------------------------------
 481991bdc9aeb780e44dae0e235b47034335375a
 59a1c626e040d1eb3ef9fe05b67d0d448296f01f
 75f3786fe43cd66ac10614047b0373f4d1b64696
 7b86ce7d8dfa46481bb92ce3dd5cf4dc7464d9f0
 8240d44f53adaf5d66bfcd4b44879cc0894d49a2
 8b92b78775cfd4174e4e765d6023b195fee0bbfa
 a2810a57b46b7e367c0b1aabb18a03f9d5971be6
 d016c9140a6e4b7627d26485926a659f2686185d
 d38f3fa0fc45e73bdb17bbc139d846f784f3d1e8
 d5c503dd5c31a156c7cc5b78afbf98cb2643d716
 dbbc6bd62c9b904b7237df4344f55cdc2bbcce1f
 e792298a3b48d74277e293cc864a113958a39bd1
 e9f48a1168c84f84424d95816af645895e338daf
 f2bfe60090b840b6d99a3288c0b745843cefcfe1
(14 rows)

That’s an easy list to run.

Here’s my script to delete the existing commits:

freshports.devgit=# begin;
BEGIN
freshports.devgit=# 
freshports.devgit=# delete from commit_log where svn_revision in
freshports.devgit-# (
freshports.devgit(#  '481991bdc9aeb780e44dae0e235b47034335375a',
freshports.devgit(#  '59a1c626e040d1eb3ef9fe05b67d0d448296f01f',
freshports.devgit(#  '75f3786fe43cd66ac10614047b0373f4d1b64696',
freshports.devgit(#  '7b86ce7d8dfa46481bb92ce3dd5cf4dc7464d9f0',
freshports.devgit(#  '8240d44f53adaf5d66bfcd4b44879cc0894d49a2',
freshports.devgit(#  '8b92b78775cfd4174e4e765d6023b195fee0bbfa',
freshports.devgit(#  'a2810a57b46b7e367c0b1aabb18a03f9d5971be6',
freshports.devgit(#  'd016c9140a6e4b7627d26485926a659f2686185d',
freshports.devgit(#  'd38f3fa0fc45e73bdb17bbc139d846f784f3d1e8',
freshports.devgit(#  'd5c503dd5c31a156c7cc5b78afbf98cb2643d716',
freshports.devgit(#  'dbbc6bd62c9b904b7237df4344f55cdc2bbcce1f',
freshports.devgit(#  'e792298a3b48d74277e293cc864a113958a39bd1',
freshports.devgit(#  'e9f48a1168c84f84424d95816af645895e338daf',
freshports.devgit(#  'f2bfe60090b840b6d99a3288c0b745843cefcfe1');
DELETE 14
freshports.devgit=# commit;
COMMIT
freshports.devgit=# 
#!/bin/sh

commits="481991bdc9aeb780e44dae0e235b47034335375a
 59a1c626e040d1eb3ef9fe05b67d0d448296f01f
 75f3786fe43cd66ac10614047b0373f4d1b64696
 7b86ce7d8dfa46481bb92ce3dd5cf4dc7464d9f0
 8240d44f53adaf5d66bfcd4b44879cc0894d49a2
 8b92b78775cfd4174e4e765d6023b195fee0bbfa
 a2810a57b46b7e367c0b1aabb18a03f9d5971be6
 d016c9140a6e4b7627d26485926a659f2686185d
 d38f3fa0fc45e73bdb17bbc139d846f784f3d1e8
 d5c503dd5c31a156c7cc5b78afbf98cb2643d716
 dbbc6bd62c9b904b7237df4344f55cdc2bbcce1f
 e792298a3b48d74277e293cc864a113958a39bd1
 e9f48a1168c84f84424d95816af645895e338daf
 f2bfe60090b840b6d99a3288c0b745843cefcfe1"
 
 for commit in $commits
 do
   echo /usr/local/libexec/freshports/test-new-xml-code-single.sh ports $commit | sudo su -fm ingress
 done

That will create the new and old versions of the XML in two different directories. I take the new stuff and toss it into the incoming queue.

[dan@devgit-ingress01:/var/db/ingress/message-queues/testing-new] $ ls -l
total 195
-rw-rw-r--  1 ingress  ingress    887 Jul 30 22:03 2019.09.18.17.37.59.000000.f2bfe60090b840b6d99a3288c0b745843cefcfe1.xml
-rw-rw-r--  1 ingress  ingress   1291 Jul 30 22:03 2020.07.11.18.08.50.000000.e792298a3b48d74277e293cc864a113958a39bd1.xml
-rw-rw-r--  1 ingress  ingress   1246 Jul 30 22:02 2020.07.11.20.45.27.000000.59a1c626e040d1eb3ef9fe05b67d0d448296f01f.xml
-rw-rw-r--  1 ingress  ingress   1316 Jul 30 22:02 2020.07.15.15.47.41.000000.7b86ce7d8dfa46481bb92ce3dd5cf4dc7464d9f0.xml
-rw-rw-r--  1 ingress  ingress   2492 Jul 30 22:03 2020.07.17.08.21.24.000000.d38f3fa0fc45e73bdb17bbc139d846f784f3d1e8.xml
-rw-rw-r--  1 ingress  ingress   2492 Jul 30 22:02 2020.07.17.08.21.54.000000.481991bdc9aeb780e44dae0e235b47034335375a.xml
-rw-rw-r--  1 ingress  ingress   7161 Jul 30 22:03 2020.07.21.17.15.38.000000.d016c9140a6e4b7627d26485926a659f2686185d.xml
-rw-rw-r--  1 ingress  ingress   1316 Jul 30 22:02 2020.07.22.10.41.53.000000.8b92b78775cfd4174e4e765d6023b195fee0bbfa.xml
-rw-rw-r--  1 ingress  ingress    982 Jul 30 22:03 2020.07.22.14.00.59.000000.e9f48a1168c84f84424d95816af645895e338daf.xml
-rw-rw-r--  1 ingress  ingress  82325 Jul 30 22:02 2020.07.23.16.31.18.000000.75f3786fe43cd66ac10614047b0373f4d1b64696.xml
-rw-rw-r--  1 ingress  ingress    908 Jul 30 22:03 2020.07.26.08.01.07.000000.d5c503dd5c31a156c7cc5b78afbf98cb2643d716.xml
-rw-rw-r--  1 ingress  ingress   1168 Jul 30 22:03 2020.07.28.18.52.11.000000.a2810a57b46b7e367c0b1aabb18a03f9d5971be6.xml
-rw-rw-r--  1 ingress  ingress   1490 Jul 30 22:03 2020.07.30.12.20.24.000000.dbbc6bd62c9b904b7237df4344f55cdc2bbcce1f.xml
-rw-rw-r--  1 ingress  ingress  37100 Jul 30 22:02 2020.07.30.20.39.39.000000.8240d44f53adaf5d66bfcd4b44879cc0894d49a2.xml
[dan@devgit-ingress01:/var/db/ingress/message-queues/testing-new] $ sudo mv * ~ingress/message-queues/incoming/
[dan@devgit-ingress01:/var/db/ingress/message-queues/testing-new] $ 

After waiting a bit, I ran this query again:

freshports.devgit=# select distinct revision_name from commit_log_elements where change_type = 'r';
              revision_name               
------------------------------------------
 f2bfe60090b840b6d99a3288c0b745843cefcfe1
(1 row)

freshports.devgit=# 

So why didn’t that one run. Let’s see the XML:

<?xml version='1.0' encoding='UTF-8'?>
<UPDATES Version="1.4.0.0">
  <UPDATE>
    <DATE Year="2019" Month="9" Day="18"/>
    <TIME Timezone="UTC" Hour="17" Minute="37" Second="59"/>
    <OS Repo="ports" Id="FreeBSD" Branch="master"/>
    <LOG>net-mgmt/unifi5: Update to 5.11.46

Also pull in port improvements from glewis@ to enable customizing which Java
that Unifi runs with

PR:             240016</LOG>
    <PEOPLE>
      <UPDATER Handle="feld &lt;feld@FreeBSD.org&gt;"/>
    </PEOPLE>
    <COMMIT Hash="f2bfe60090b840b6d99a3288c0b745843cefcfe1" HashShort="f2bfe60" Subject="net-mgmt/unifi5: Update to 5.11.46" EncodingLoses="false" Repository="ports"/>
    <FILES>
      <FILE Action="Modify" Path="net-mgmt/unifi5/Makefile"/>
      <FILE Action="Modify" Path="net-mgmt/unifi5/distinfo"/>
      <FILE Action="Modify" Path="net-mgmt/unifi5/files/unifi.in"/>
    </FILES>
  </UPDATE>
</UPDATES>

No renames. What’s up with that? Only modifies. Let’s check the database.

I found a lot of them…

freshports.devgit=# select count(*) from commit_log_elements where revision_name = 'f2bfe60090b840b6d99a3288c0b745843cefcfe1';
 count 
-------
  3189
(1 row)

Over a lot of commits:

freshports.devgit=# select count(distinct commit_log_id) from commit_log_elements where revision_name = 'f2bfe60090b840b6d99a3288c0b745843cefcfe1';
 count 
-------
   813
(1 row)

This brings to mind a hardcode commit_id I recall in the code….

Searching my blog, I found subversion vs git – in the database.

Searching the code, I found:

$ grep -r f2bfe60090b840b6d99a3288c0b745843cefcfe1 ~/scripts/ ~/modules/
modules/xml_munge_git.pm:#	$Updates{commit_hash}  = 'f2bfe60090b840b6d99a3288c0b745843cefcfe1';

So I did have it hardcoded for a while. That explains it. I can ignore that commit.

I think this merge can be accepted with some slight changes.

Jul 292020
 

I spent an evening working through the delete issue with git repos.

A Twitter thread was the result.

Sergey Kozlov came up with some code changes which changed the XML output. Now I think we have a plan.

The commit I’m using for a working example was to editors/nvi2 on 27 Jul 2020 01:58:34 – the details of the git and svn commits are outlined below.

The svn commit

  1. The email
  2. The files part of the XML:
    <FILE Action="Add" Revision="543502" Path="head/editors/nvi2/"></FILE>
    <FILE Path="head/editors/nvi2-port/" Action="Remove" Revision="543502"></FILE>
    <FILE Path="head/editors/Makefile" Action="Modify" Revision="543502"></FILE>
    

The git commit

  1. There is no email
  2. The files part of the git XML:
    <FILE Action="Modify" Path="editors/Makefile"/>
    <FILE Action="Rename" Path="editors/nvi2-port/Makefile" Destination="editors/nvi2/Makefile"/>
    <FILE Action="Rename" Path="editors/nvi2-port/distinfo" Destination="editors/nvi2/distinfo"/>
    <FILE Action="Rename" Path="editors/nvi2-port/pkg-descr" Destination="editors/nvi2/pkg-descr"/>
    

The crucial difference

The svn XML shows head/editors/nvi2-port being Removed. As demonstrated via Twitter, this Removed action causes this SQL:

insert into commit_log_elements(id, commit_log_id, element_id, revision_name, change_type) values (4113357, 809907, 1063362, '543502', 'R')

That element_id value represents:

freshports.dev=# select element_pathname(1063362);
       element_pathname        
-------------------------------
/ports/head/editors/nvi2-port

That ‘R‘ on the insert to the commit_log_element table invoked this trigger action:

   if (NEW.change_type = 'R') then
      UPDATE element
         SET status = 'D'
       WHERE id = NEW.element_id;
   end if;

That marks the file (actually, a directory /ports/head/editors/nvi2-port) in the database. That’s the element table. What about the ports table? How does that happen.

How?

The above trigger deletes the element. The element table implements a directory and file using a recursive structure. In part, it looks like this:

freshports.dev=# \d element
                                     Table "public.element"
       Column        |     Type     | Collation | Nullable |               Default               
---------------------+--------------+-----------+----------+-------------------------------------
 id                  | integer      |           | not null | nextval('element_id_seq'::regclass)
 name                | text         |           | not null | 
 parent_id           | integer      |           |          | 
 directory_file_flag | character(1) |           | not null | 
 status              | character(1) |           | not null | 

As the commit is processed, the element (/ports/head/editors/nvi2-port) is deleted because of line 1 of the svn XML (head/editors/nvi2). In general terms, a port is an abstraction of the file system. As that part of the file system is deleted, the port must also be deleted, but that’s a separate table.

So how does the port get deleted? I’m now writing this in hindsight. I didn’t know that when I was posting to Twitter, but tonight I re-discovered it.

It is a trigger on the element table:

element_ports_status AFTER UPDATE ON element FOR EACH ROW EXECUTE FUNCTION element_ports_status()

It’s been many years since I wrote this. Checking the repo, I find it was committed on Mon Sep 11 02:27:26 2006 – I recall this was how it was done, but I’d completely forgotten about it.

-- keep ports.status in sync with element.status
-- this is a performance boosting exercise

CREATE OR REPLACE FUNCTION element_ports_status() RETURNS TRIGGER AS $$
   BEGIN
      IF (NEW.status <> OLD.status) then
         UPDATE ports
            SET status = NEW.status
          WHERE ports.element_id = NEW.id;
      END IF;
      RETURN NEW;
   END
$$ LANGUAGE 'plpgsql';

There it is. So long as I delete the directory which pertains to this port, the system will automagically, via triggers, delete the port.

What needs to change

With the git XML, we don’t get to see the directory being deleted (or removed, as the case may be).

I think we will have to make due with the port‘s Makefile being Remove‘d (in the svn XML) (or Rename‘d as shown in the git XML).

Rename or remove?

We have two options with the git library:

  1. <FILE Action="Rename" Path="editors/nvi2-port/Makefile" Destination="editors/nvi2/Makefile"/>
    <FILE Action="Rename" Path="editors/nvi2-port/distinfo" Destination="editors/nvi2/distinfo"/>
    <FILE Action="Rename" Path="editors/nvi2-port/pkg-descr" Destination="editors/nvi2/pkg-descr"/>/li>
    
  2. <FILE Action="Delete" Path="editors/nvi2-port/Makefile"/>
    <FILE Action="Delete" Path="editors/nvi2-port/distinfo"/>
    <FILE Action="Delete" Path="editors/nvi2-port/pkg-descr"/>
    <FILE Action="Add" Path="editors/nvi2/Makefile"/>
    <FILE Action="Add" Path="editors/nvi2/distinfo"/>
    <FILE Action="Add" Path="editors/nvi2/pkg-descr"/>
    

Both sets of XML amount to the same results. I am not yet sure which one I like better.

Logistically, the first option shows what is happening: you can see where A is becoming B.

In option 2, that is not so clear.

I think option 1 is the way I’m going to go. This blog post serves to document the above and to help me when the time comes to alter the code.

Things to code

The code has to cater for the Makefile Rename and for the Makefile Delete. The former happens here in this example. The latter would happen when a port is deleted.

Let’s look at databases/sqliteodbc-sqlite2 which was removed on 27 Jul 2020 18:50:49 with this commit:

  • the email
  • The XML:
                <FILE Action="Remove" Revision="543544" Path="head/databases/sqliteodbc-sqlite2/"></FILE>
                <FILE Path="head/MOVED" Action="Modify" Revision="543544"></FILE>
                <FILE Path="head/databases/Makefile" Revision="543544" Action="Modify"></FILE>
    

Here, we have the directory head/databases/sqliteodbc-sqlite2 being deleted. This SVN works well.

Let’s look at this in terms of the git XML:

<FILE Action="Modify" Path="MOVED"/>
<FILE Action="Modify" Path="databases/Makefile"/>
<FILE Action="Delete" Path="databases/sqliteodbc-sqlite2/Makefile"/>

It seems the Delete under git is the same as Remove under svn.

The git code needs to be modified to recognize that a Makefile is being deleted and when that happens, also delete the element and the port.

Jul 092020
 

I want to document some of the not-visible differences between a git commit and a subversion commit when in the database.

subversion git
id 808224 808303
message_id 202007091812.069IC6QK044004@repo.freebsd.org a1d2d5b40fba4186daf29b2b2e9ff2dd8b7d8039
message_date 2020-07-09 18:12:06+00 2020-07-09 18:12:06+00
message_subject svn commit: r541803 – in head/devel: . py-absl-py [null]
date_added 2020-07-09 18:39:53.510006+00 2020-07-09 18:59:49.863092+00
commit_date 2020-07-09 18:12:06+00 2020-07-09 18:12:06+00
committer sunpoet sunpoet
description Add py-absl-py 0.9.0

Abseil Python Common Libraries is a collection of Python library code for
building Python applications. The code is collected from Google’s own Python
code base, and has been extensively tested and used in production.

Features:
– Simple application startup
– Distributed commandline flags system
– Custom logging module with additional features
– Testing utilities

WWW: https://github.com/abseil/abseil-py

Add py-absl-py 0.9.0

Abseil Python Common Libraries is a collection of Python library code for
building Python applications. The code is collected from Google’s own Python
code base, and has been extensively tested and used in production.

Features:
– Simple application startup
– Distributed commandline flags system
– Custom logging module with additional features
– Testing utilities

WWW: https://github.com/abseil/abseil-py

system_id 1 1
encoding_losses f f
svn_revision 541803 f2bfe60090b840b6d99a3288c0b745843cefcfe1
repo_id 1 [null]

Of note:

  1. message_id – In subversion, I’ve used the email message_id of the incoming commit. In git, it is the commit hash
  2. message_subject – The message subject is taken from the subversion email commit message. There is no such field for git because we are extracting directly from git.
  3. svn_revision – These are the subversion revision number and the git commit hash, respectively.
  4. repo_id – This is null for git, and I am not sure why. This may need to be fixed.