The new process_vuxml.pl

This post is a work-in-progress. This paragraph will be removed when it is completed.

This post is mostly for my own benefit. It will track the redesign of process_vuxml.pl which follows on from How FreshPorts processes vuxml entries. In that previous post, defects were identified and a proposed change put forth. In this post, I’ll document how I intend to proceed.

Temp table

I’m going to create a temp table, consisting of:

  • vid
  • hash (I will also refer to that as a checksum)

The data from security/vuxml/vuln.xml will be imported into that table.

freshports.dvl=# \d vuxml_import
           Table "public.vuxml_import"
  Column  | Type | Collation | Nullable | Default 
----------+------+-----------+----------+---------
 vid      | text |           | not null | 
 checksum | text |           | not null | 
Indexes:
    "vuxml_import_vid" PRIMARY KEY, btree (vid)

freshports.dvl=# 

It took me most of an hour to get to this point. First, I had to grant access to from this laptop to the database server via port 5432 (pf firewall rules). Second, I had to modify the PostgreSQL pg_hba.conf rules to all this laptop to connect. Both of those items involved Ansible configuration changes.

2025-08-02

I tried to use the PostgreSQL copy command from within Perl, but kept getting

Aug  3 01:35:29 pg01 postgres[98711]: [11-1] 2025-08-03 01:35:29.923 UTC [98711] LOG:  statement: COPY vuxml_import FROM '/tmp/W_PTHFuxT7' WITH (FORMAT TEXT, DELIMITE
R '     ', HEADER false)
Aug  3 01:35:29 pg01 postgres[98711]: [12-1] 2025-08-03 01:35:29.924 UTC [98711] ERROR:  could not open file "/tmp/W_PTHFuxT7" for reading: No such file or directory
Aug  3 01:35:29 pg01 postgres[98711]: [12-2] 2025-08-03 01:35:29.924 UTC [98711] HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
Aug  3 01:35:29 pg01 postgres[98711]: [12-3] 2025-08-03 01:35:29.924 UTC [98711] STATEMENT:  COPY vuxml_import FROM '/tmp/W_PTHFuxT7' WITH (FORMAT TEXT, DELIMITER '          ', HEADER false)

I tried psql, and it just worked:

freshports.dvl=> \copy vuxml_import from '/tmp/W_PTHFuxT7' WITH (FORMAT TEXT, DELIMITER '       ', HEADER false)
;
COPY 6020
freshports.dvl=> select * from vuxml_import;
                 vid                  |                             checksum                             
--------------------------------------+------------------------------------------------------------------
 b945ce3f-6f9b-11f0-bd96-b42e991fc52e | e3772ac353d275f309f2a10e7688d48e894f771e83b29ad041e05e6b298b1160
 95480188-6ebc-11f0-8a78-bf201f293bce | 67af8f935b1f35da8a8393df00ebfdf2d90871526f7adbe3a815916b99c73d65
 f51077bd-6dd7-11f0-9d62-b42e991fc52e | 28fa973c0debcbc915d9b901e6f8a7e234ab026a23dc0e4e188e82c08db7a83c
 cd7f969e-6cb4-11f0-97c4-40b034429ecf | 226b7557d10dde55b5e75b49339bcb41d27d6a1c19d3941d9e9a204951c19a45
 c37f29ba-6ae3-11f0-b4bf-ecf4bbefc954 | f6933743541a61306a034b57fedc8d8537b7029fd694bcc0d91beb621bb60aa0
...

Let’s see if I can get the SQL going next time I tackle this.

Trying a copy from perl

I’m already doing a copy from within python to load data into the database. I’m going to do the same thing from within perl.

Here’s the code I grabbed from examples I found:

    # UNLINK = false is for debugging purposes
    my $fh_vid    = File::Temp->new(UNLINK => 0, PERMS => 0644);
    my $fname_vid = $fh_vid->filename;

    # stuff here where I write to $fh_vid...

    $fh_vid->close();

    my $file_path    = "$fname_vid";
    my $table_name   = "vuxml_import";
    my $truncate_sql = "TRUNCATE vuxml_import";
    my $copy_sql     = "COPY $table_name FROM '$file_path' WITH (FORMAT TEXT, HEADER false)";

    print "\$copy_sql='$copy_sql\n";
    eval {
        $dbh->do($truncate_sql);
        $dbh->do($copy_sql);
        $dbh->do($);
        };
    if ($@) {
        warn "Error during COPY: $@";
        $dbh->rollback(); # Rollback on error
    }
    $dbh->commit(); # Commit the transaction if successful

I’m typing this after-the-fact, so there may be errors in the above, but that’s the general approach.

What is important is the error I encountered (and posted):

DBD::Pg::db do failed: ERROR:  could not open file "/tmp/vnhvGB2yN0" for reading: No such file or directory
HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility 
such as psql's \copy. at ./process_vuxml_new.pl line 134.

What? I’m sure it’s there, see:

[11:46 dvl-ingress01 dvl ~/scripts] % ls -l /tmp/vnhvGB2yN0
-rw-r--r--  1 freshports wheel 614040 2025.08.03 11:46 /tmp/vnhvGB2yN0

As I typed that post, I realized. It’s a server side copy. The code expects to file in question to be on the PostgreSQL database server. Yes, it says so right there in the docs:

“Files named in a COPY command are read or written directly by the server, not by the client application.

I think I’ve been spoilt by my past use of the copy_from function in https://github.com/psycopg/psycopg2 – that’s rather distinct from using COPY direct from within an SQL command.

Now I have to decide:

  1. Do I switch this script to Python?
  2. Or do I invoke psql from within perl?

I think I’ll do the latter.

The latter

I try this from within psql:

freshports.dvl=> \copy vuxml_import from '/tmp/vnhvGB2yN0' WITH (FORMAT TEXT, HEADER false);
COPY 6020

It works.

Next step, tell the freshports user to do the same thing. Let’s echo that in:

I try this from the command line: silently does nothing. No errors. No clues.

[12:15 dvl-ingress01 dvl ~/scripts] % echo "\copy vuxml_import from '/tmp/vnhvGB2yN0' WITH (FORMAT TEXT, 
HEADER false);" | psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl" 

Back to basics: simple testing.

[12:17 dvl-ingress01 dvl ~/scripts] % echo "copy"
copy
[12:17 dvl-ingress01 dvl ~/scripts] % echo "\copy"
[12:17 dvl-ingress01 dvl ~/scripts] % echo "\\copy"
[12:17 dvl-ingress01 dvl ~/scripts] % echo "\\\copy"
\copy

Oh, I need multiple escapes there.

This works:

[12:17 dvl-ingress01 dvl ~/scripts] % echo "\\\copy vuxml_import from '/tmp/vnhvGB2yN0' WITH (FORMAT TEXT, 
HEADER false);" | psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl"

COPY 6020
[12:18 dvl-ingress01 dvl ~/scripts] % 

And over here, we have the data:

freshports.dvl=# select count(*) from vuxml_import;
 count 
-------
  6020
(1 row)

freshports.dvl=#

Let’s do that from within perl

Here’s what I tried:

        my $truncate_sql = "TRUNCATE vuxml_import";

        # our goal is this:
        # echo "\\\copy vuxml_import from '/tmp/vnhvGB2yN0' WITH (FORMAT TEXT, HEADER false);" | psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl"

        # sslcertmode=disable avoids could not open certificate file “/root/.postgresql/postgresql.crt”: Permission denied
        my $psql_command = "$PSQL \"sslmode=$FreshPorts::Config::ssl_mode host=$FreshPorts::Config::host user=$FreshPorts::Config::user dbname=$FreshPorts::Config::dbname sslcertmode=disable\"";
        my $copy_command = "\\copy vuxml_import from '$file_path' WITH (FORMAT TEXT, HEADER false);";


        print "\$copy_command='$copy_command\n";
        print "\$psql_command='$psql_command\n";
        eval {
            $dbh->do($truncate_sql);
            print "committing truncate\n";
            $dbh->commit();


            print "echo \"$copy_command\" | $psql_command";
            print "\n";

            system("echo \"$copy_command\" | $psql_command");
            print "copy command has finished\n";
        };
        if ($@) {
            warn "Error during COPY: $@";
            $dbh->rollback(); # Rollback on error
        }

Let’s run that, and remember, most things run as the freshports user.

[18:28 dvl-ingress01 dvl ~/scripts] % echo ./process_vuxml_new.pl --filename=/jails/freshports/usr/ports/security/vuxml/vuln.xml | sudo su -fm freshports
process_vuxml.pl starts
(there is usually a delay before further output)
There, the parsefile has completed
calculating the sha256 for each vuln
all sha256 saved for each vuln
finished with eval()
$copy_command='\copy vuxml_import from '/tmp/3YT08khXh4' WITH (FORMAT TEXT, HEADER false);
$psql_command='/usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
committing truncate
echo "\copy vuxml_import from '/tmp/3YT08khXh4' WITH (FORMAT TEXT, HEADER false);" | /usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
zsh: done       echo ./process_vuxml_new.pl  | 
zsh: hangup     sudo su -fm freshports

This took me a while to figure out. I think the double pipe got to it. So I tried running it as the freshports user instead of piping it to the user.

Let’s run it directly.

[18:29 dvl-ingress01 dvl ~/scripts] % sudo su -l freshports                                                                                              
$ cd /usr/local/libexec/freshports
$ ./process_vuxml_new.pl --filename=/jails/freshports/usr/ports/security/vuxml/vuln.xml
process_vuxml.pl starts
(there is usually a delay before further output)
There, the parsefile has completed
calculating the sha256 for each vuln
all sha256 saved for each vuln
finished with eval()
$copy_command='\copy vuxml_import from '/tmp/mn8wKvOJB1' WITH (FORMAT TEXT, HEADER false);
$psql_command='/usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
committing truncate
echo "\copy vuxml_import from '/tmp/mn8wKvOJB1' WITH (FORMAT TEXT, HEADER false);" | /usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
COPY 6020
copy command has finished
Total time: 21 seconds
Number of updates: 0
process_vuxml.pl finishes
$ 

Good. Still making progress.

So, while I can’t launch it as myself, it does run as the required user.

The SQL

Now that I’ve uploaded a copy of the current vulns (into the vuxml_import table), now I can determine:

  • Which vuln are new (present in vuxml_import table but not in vuxml table)
  • Which vuln have changed (checksums differs between the two tables)
  • Which vuln have been deleted (found in vuxml table but not in the vuxml_import table)

This should be very straight forward.

New entries

This will pull back all the vuxml_import rows which do not exist in vuxml.

-- find new vids

SELECT vi.vid, vi.checksum
  FROM vuxml_import vi
LEFT OUTER JOIN vuxml
ON vuxml.vid = vi.vid
WHERE vuxml.vid IS NULL;

Hmm, that brought back way more than expected:

freshports.dvl=# SELECT vi.vid, vi.checksum
  FROM vuxml_import vi
LEFT OUTER JOIN vuxml
ON vuxml.vid = vi.vid
WHERE vuxml.vid IS NULL;
                 vid                  |                             checksum                             
--------------------------------------+------------------------------------------------------------------
 44101b31-8ffd-11e7-b5af-a4badb2f4699 | 741a95e0d1f9bfac6d4514b6758bded53fd8f7d448de535a1bbb1d392f8b3e1b
 c7e8e955-6c61-11e7-9b01-2047478f2f70 | e1b83ea4453814902fa8e4a56b374ecbd5249c2bf85d5b69f6cabcab17e336e8
 04bc4e23-9a70-42cb-9fec-3613632d34bc | c904c24aa6c4adc66bf91e763f94d087089eb8e0e0a8d9ea96bcaa7be5225cf2
 2aedd15f-ca8b-11e6-a9a5-b499baebfeaf | 2cc5e7ee3a244fa7cdea684486747ad2d919b64b200c60f9487a076cdfcdb241
 cf0b5668-4d1b-11e6-b2ec-b499baebfeaf | 5a688bbacca0b23bf520e30557a56c7131a5567332bc4c47ba1202c4ac756284
 61b8c359-4aab-11e6-a7bd-14dae9d210b8 | 09b36bf7b1d2c2061a7d996bd4a7ef2c08436938cfa8bb4e36c3d5b899bd99c2
 c17fe91d-4aa6-11e6-a7bd-14dae9d210b8 | d976f5a4e002c4f7b9d856c784b839859f8c47b969810067b6e85a2875ef28e4
 42ecf370-4aa4-11e6-a7bd-14dae9d210b8 | bfa6339440f3b1abc283670efe5e57092dea75c8eb11b000bd92c4fee834d254
 95564990-1138-11e6-b55e-b499baebfeaf | a796d57493c2417c53f2deeca77ab9f8ad2c16c7443e5b256eb9508724925504
 f99a4686-e694-11e3-9032-000c2980a9f3 | f8a3fded846a14c6fc31e19ae8d1e494e1527e0780af35c61da6936539721795
 080c5370-886a-11e3-9533-60a44c524f57 | 9defe99d9c46fddacf27012fcb40602ecae43d7b2b6e0e1841d50b77ced3a81c
 cd082cc6-1548-4b8d-a3aa-a007be611a29 | ee2c57c980e00ce6008062f8a7570693c90795ccb921c3c11f80f5222bed98d9
 6887828f-0229-11e0-b84d-00262d5ed8ee | 326745864d9f432d5ea112f7d221af0e8e1504b0be1661d22e4d41ba24e91393
 b2a6fc0e-070f-11e0-a6e9-00215c6a37bb | cd343481576433a6c93b9c14c89a8f2e81639b313191fbbf31af53a6cbf857a5
 f6b6beaa-4e0e-11df-83fb-0015587e2cc1 | 4852600a147b5cf07a31687ff84cb763f57369ef9d7692eef7b2473340b27ec0
 ff284bf0-3f32-11dc-a79a-0016179b2dd5 | 3cf223f0873590bab26e0d1956f7248728133940d67bd2f376749987911d38a2
 28ce7102-4039-11db-a838-00148584c7dd | 59c37c8659e39107cb7924da42075d9e6a85b46276787281eb779a5fb20c721d
 0a4cd819-0291-11db-bbf7-000c6ec775d9 | 0d62cb10756022a599fbe910f5f68e5eebd8e2a58c629a17c8bf0e7c7c48279f
 af2a60ed-da3e-11da-93e0-00123ffe8333 | 31eb8272e0fdcedfc3312c555940840cdb58388d2209332168ba89f6bec6df8f
 e8c6ade2-6bcc-11d9-8e6f-000a95bc6fae | 96c1c306b528dc30613899ee26e1e84345831b91e707e2e1594b5317cd152a03
 14e8f315-600e-11d9-a9e7-0001020eed82 | 909f2c18bf385bb050158f5beb49bfde1f61a1a41b1e8a01c4885158ddca6c68
 1f8dea68-3436-11d9-952f-000c6e8f12ef | d79b521048462deb7839d614bd75f6674a85f2be6de838495b1af8aa67b3447e
 bef4515b-eaa9-11d8-9440-000347a4fa7d | 6cc0d6b6d1920c0612dee6a1cab21bad3ea8bdb5838861539ae56ef632999c47
 4764cfd6-d630-11d8-b479-02e0185c0b53 | 9ee4350334b0f20cc499c6fbbe59d05c746bf115bb4b185afe7e4541122c8935
 c7705712-92e6-11d8-8b29-0020ed76ef5a | 9fbb9dc9b2f05b5920d49c9aeb1eb900c12b9ab66f2fc7c0850dd3512a31b196
 6fd9a1e9-efd3-11d8-9837-000c41e2cdad | a466a00766924a3c3e4bcbb9a7565c4563a10576bfe22f895aab25f1332d5170
 3362f2c1-8344-11d8-a41f-0020ed76ef5a | 0da7c9fd91e88127d39ffe66794bf98a4c220d6cde5cf156f50b1529c9e4da6d
(27 rows)

freshports.dvl=# 

Ahh, some are cancelled, and FreshPorts is not storing them. Perhaps we should.

For example, this is from /usr/ports/security/vuxml/vuln/2017.xml

  <vuln vid="44101b31-8ffd-11e7-b5af-a4badb2f4699">
    <cancelled/>
  </vuln>

The vuxml table does have a status field. All ‘A’:

freshports.dvl=# select distinct status from vuxml;
 status 
--------
 A
(1 row)

Perhaps I need to start using that field and store cancelled vid.

Detecting cancelled vid

I was trying to detect and ignore cancelled vid at the input side. Despite what I said above, I don’t want to start storing them now. I tried:

my $cancelled = $node->getAttributeNode('cancelled');

No, that’s not right. I posted and got some answers, and found a solution:

my $cancelled = $node->getElementsByTagName('cancelled');
if ($cancelled->getLength() > 0) {
    print "cancelled is $cancelled\n";
    print "skipping $cancelled $vid\n";
    next;
}

This will skip over the cancelled nodes and they will never make it into the vuxml_import table. That query for new nodes now brings back no results. For my purposes, that is correct.

Changed entries

By changed entries, we mean the checksum differs. Let’s start with this:

-- modified vids
SELECT vi.vid, vi.checksum
  FROM vuxml_import vi
LEFT OUTER JOIN vuxml
ON vuxml.vid = vi.vid
WHERE vuxml.checksum <> vi.checksum;

Deleted entries

This is similar to finding new vids.

-- find deleted vids
SELECT vi.vid, vi.checksum
  FROM vuxml_import vi
LEFT OUTER JOIN vuxml
ON vuxml.vid = vi.vid
WHERE vuxml.vid IS NULL;

Let’s make a plan

Here’s the approach I think I’ll take:

  1. open and parse the vuln.xml file
  2. traverse the XML, ignoring cancelled vulns, saving vid and checksum to a file
  3. truncate vuxml_import (delete all rows) – coded
  4. load file contents into vuxml_import – coded
  5. remove deleted vids from vuxml – simple database query
  6. insert new vids – get list from database, run through vuln.xml
  7. update changed vids – same

I’ve already inspected

vuxml.pm

and it handles insert and update well. I don’t have to code that; just call the save() function.

What I’m not sure about: I’m sure that “run through vuln.xml” means a traversal of all nodes.

Traversal of all nodes

Below, I have cobbled together some of the existing code. I have left out a bunch of if statements and error handling, just to get an idea of what is done.

This sample code needs to be amended to adjust for the above mentioned plan. Right now, it saves everything as it goes along. What I’ve done is pulled various lines from process_vuxml.pl just to get a starting point.

    $dbh = FreshPorts::Database::GetDBHandle();

    # declare the parser and parse vuln.xml
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ($filename);
    eval {
        # for each node in the doc...
        for my $node ($doc->findnodes('/vuxml/vuln')) {
            # get the vuln id / vid
            my $vid  = $node->getAttributeNode('vid')->getValue();

            # calculate the checksum, which will be stored in the database
            my $csum = sha256_hex(Encode::encode_utf8($node->toString));

            # convert the node to a string.
            my $nodeString = $node->toString();

            # get a file handle to that string and open it
            my $fh = IO::String->new();
            $fh->open($nodeString);

            # save the vuln to the database
            my $vuxml = FreshPorts::vuxml->new( $dbh );

            my $p = FreshPorts::vuxml_parsing->new(Stream        => $fh,
                                                   DBHandle      => $dbh,
                                                   UpdateInPlace => 1);

            # always pass in the checksum from our calculation
            # this will parse the xml and save the data to the database, all in one go.
            $p->parse_xml($csum);
            if ($p->database_updated()) {
                print "yes, the database was updated for $vid\n";
            } else {
                 print "no, the database was NOT updated for $vid\n";
                 next;
            }

            $fh->close;
        } # for my $node
    }; # eval

    # added after seeing it at https://metacpan.org/dist/XML-DOM/view/lib/XML/DOM/Parser.pod
    $doc->dispose;

    if ($@) {
        print "We've got a problem.";
        print "$0: $@\n";
        FreshPorts::CommitterOptIn::RecordErrorDetails("error processing vuxml", $0);
        die "$0: $@\n";
    }

    print "committing\n";
    $dbh->commit();

Next: what actually invokes vuxml->save()?

That always perplexes me, everything I look at that code.

It’s here, within update_database_vuxml

    $vuxml_id = $vuxml->save();

That function is called by update_database().

In turn, it is called by handle_end().

Ahh, I always forget how XML::Parser works.

Delete the missing vuxml entries

This will delete all entries from vuxml not found in vuxml_import:

WITH deleted_vids AS (
SELECT V.vid, V.checksum
  FROM vuxml V
LEFT OUTER JOIN vuxml_import vi
ON V.vid = vi.vid
WHERE vi.vid IS NULL
)
DELETE from vuxml V
USING deleted_vids as DV
WHERE V.vid = DV.vid;

2025-08-09 – This seems to work

I seem to have a solution.

[20:21 dvl-ingress01 dvl ~/scripts] % cat process_vuxml_new.pl
#!/usr/local/bin/perl -w
#
# $Id: process_vuxml.pl,v 1.6 2013-01-16 15:37:57 dan Exp $
#
# Copyright (c) 2001-2012 DVL Software
#
# much of this file is based on contributions from Matthew Seamon
#

# @{#} $Id: process_vuxml.pl,v 1.6 2013-01-16 15:37:57 dan Exp $
#
# Split up the vuln.xml file into sections for individual
# vulnerabilities.  Save into files using the vid guid field as name.
# Calculate SHA256 checksum for the XML snippet and write out to an
# index file.

#use 5.10.1;
use strict;
use warnings;
use XML::DOM::XPath;
use Encode;
use Digest::SHA qw(sha256_hex);
use autodie qw(:default);
use IO::String;
use Getopt::Long;
use File::Temp ();

# for debugging
use Data::Dumper;

use FreshPorts::database;
use FreshPorts::vuxml;
use FreshPorts::vuxml_parsing;
use FreshPorts::vuxml_mark_commits;

my $PSQL = "/usr/local/bin/psql";

my $filename;
my $dryrun;
my $showchecksums;
my $printnodes;
my $showreasons;

my $nodeString;
my $NumUpdates = 0;

# From https://perldoc.perl.org/perlunifaq.html#What-is-a-%22wide-character%22%3f
# to handle: Wide character in print at /usr/local/lib/perl5/site_perl/FreshPorts/vuxml_parsing.pm line 234, <> chunk 1.\n

sub populate_vuxml_import($;$)
{
    my $dbh = shift;
    my $doc = shift;

    print "There, the parsefile has completed\n";

    # UNLINK = false is for debugging purposes
    my $fh_vid    = File::Temp->new(UNLINK => 0, PERMS => 0644);
    my $fname_vid = $fh_vid->filename;
    
    my $fh = IO::String->new();
#   my $vuxml = FreshPorts::vuxml->new( $dbh );


    print "calculating the sha256 for each vuln\n";
    eval {
        for my $node ($doc->findnodes('/vuxml/vuln'))
        {
            if ($dryrun && !$showchecksums) {
                print '.';
            }
            my $vid = $node->getAttributeNode('vid')->getValue();
            my $cancelled = $node->getElementsByTagName('cancelled');
            if ($cancelled->getLength() > 0) {
                print "cancelled is $vid - skipping that one for import\n";
                next;
            }

            my $csum = sha256_hex(Encode::encode_utf8($node->toString));

            # write vid and checksum to the file
            print $fh_vid "$vid\t$csum\n";
        } # for my $node
    }; # eval

    print "all sha256 saved for each vuln\n";

    $fh_vid->close();

    print 'finished with eval()' . "\n";

    # if something went wrong in the eval, abort and don't do a commit
    if ($@) {
        print "We've got a problem.";
        print "$0: $@\n";
        FreshPorts::CommitterOptIn::RecordErrorDetails("error processing vuxml", $0);
        die "$0: $@\n";
    }

    my $file_path    = "$fname_vid";
    my $table_name   = "vuxml_import";
    my $copy_sql     = "COPY $table_name FROM '$file_path' WITH (FORMAT TEXT, DELIMITER '\t', HEADER false)";
    my $truncate_sql = "TRUNCATE vuxml_import";

    # our goal is this:
    # echo "\\\copy vuxml_import from '/tmp/vnhvGB2yN0' WITH (FORMAT TEXT, HEADER false);" | psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl"

    # sslcertmode=disable avoids could not open certificate file “/root/.postgresql/postgresql.crt”: Permission denied
    my $psql_command = "$PSQL \"sslmode=$FreshPorts::Config::ssl_mode host=$FreshPorts::Config::host user=$FreshPorts::Config::user dbname=$FreshPorts::Config::dbname sslcertmode=disable\"";
    my $copy_command = "\\copy vuxml_import from '$file_path' WITH (FORMAT TEXT, HEADER false);";

    print "\$copy_command='$copy_command\n";
    print "\$psql_command='$psql_command\n";

    # save it to the database
    eval {
        $dbh->do($truncate_sql);
        print "committing truncate\n";
        $dbh->commit();


        print "echo \"$copy_command\" | $psql_command";
        print "\n";

        system("echo \"$copy_command;\" | $psql_command");
        print "copy command has finished\n";
    };
    if ($@) {
        warn "Error during COPY: $@";
        $dbh->rollback(); # Rollback on error
    }
} # MAIN

sub remove_deleted_vids($)
{
    my $dbh = shift;

    my $DeleteMissingVuxml = "select DeleteMissingVuxml()";
    # invoke the stored procedure: DeleteMissingVuxml()
    eval {
        $dbh->do($DeleteMissingVuxml);
        print "deleting missing vuxml\n";
        $dbh->commit();

        print "missing vuxml have been deleted\n";
    };
    if ($@) {
        warn "Error during delete: $@";
        $dbh->rollback(); # Rollback on error
    }
}


sub get_list_of_modified_and_new_vids($)
{
    # invoke the stored procedure: GetListOfVulnsForUpdate() which returns a list of new vulns
    # and vulns with new checksums
    my $dbh = shift;

    my %ListOfVIDs;
    my @row;

    my $VidsToUpdate = "select * from GetListOfVulnsForUpdate()";
    # invoke the stored procedure: GetListOfVulnsForUpdate()
    eval {
        my $sth = $dbh->prepare($VidsToUpdate);
        $sth->execute || FreshPorts::Utilities::ReportError('warning', "Could not execute SQL statement\n--$VidsToUpdate--\n... maybe invalid?", 1);
        print "getting list of vulns to update\n";

        while (@row = $sth->fetchrow_array()) {
            print "found vid: $row[0]\n";

            $ListOfVIDs{$row[0]} = $row[0];
        }

        $sth->finish();
        $dbh->commit();

        print "list of vulns to update has been obtained\n";
    };
    if ($@) {
        warn "Error during list of vulns: $@";
        $dbh->rollback(); # Rollback on error
    }

    return %ListOfVIDs;
}

sub update_modified_vids($;$;$)
# update_modified_vids($dbh, $doc, \%VIDsToUpdate);
{
    # given a hash/array of vids, go through the vuln.xml files and
    # save each of them to the database.
    my $dbh     = shift;
    my $doc     = shift;
    my $VIDsRef = shift;

#    my $fh;
	my %VIDsToUpdate = %{$VIDsRef};
	my %ProcessedVIDs;
	my $nodeString;

    # this code based on the existing process_vuxml.pl script
    my $fh = IO::String->new();

    for my $node ($doc->findnodes('/vuxml/vuln')) {
        my $vid = $node->getAttributeNode('vid')->getValue();
#        print "found vid: $vid\n";
        my $cancelled = $node->getElementsByTagName('cancelled');
        if ($cancelled->getLength() > 0) {
            print "cancelled is $vid - skipping that one again\n";
            next;
        }

        # if this vid is to be updated
        if ($VIDsToUpdate{$vid}) {
            print "found vid to update: $vid\n";
            # Keep track of what we have processed.
            # We use this to be sure we've processed everything.
            $ProcessedVIDs{$vid} = $vid;

            # Yes, we already calculated this the first time around
            # but we aren't storing it. Yet.
#            my $nodeString = $node->toString();

#            my $csum = sha256_hex(Encode::encode_utf8($nodeString));
            my $csum = sha256_hex(Encode::encode_utf8($node->toString));



#           my $csum = sha256_hex(Encode::encode_utf8($node->toString));

            $nodeString = $node->toString();

            print "This is that node:\n$nodeString\n";
            print "with csum=$csum\n";

            $fh->open($nodeString) || die("FATAL: $0 could not \$fh->open" );

            print "Getting a new vuxml_parsing\n";
            my $p = FreshPorts::vuxml_parsing->new(Stream        => $fh,
                                                   DBHandle      => $dbh,
                                                   UpdateInPlace => 1);

            print "parsing the xml and setting \$csum\n";
            # always pass in the checksum from our calculation
            $p->parse_xml($csum);

            if ($p->database_updated()) {
                print "yes, the database was updated for $vid\n";
                $NumUpdates++;
            } else {
                print "no, the database was NOT updated for $vid\n";
                next;
            }

            $fh->close;

            print 'invoking vuxml_mark_commits with ' . $vid . "\n";
            my $CommitMarker = FreshPorts::vuxml_mark_commits->new(DBHandle => $dbh,
                                                                   vid      => $vid);

            print 'invoking ProcessEachRangeRecord'. "\n";
            my $i = $CommitMarker->ProcessEachRangeRecord();

            print 'invoking ClearCachedEntries' . "\n";
            $CommitMarker->ClearCachedEntries($vid);

#            # putting this in here while debugging
#            last;
        }
    }

    print "reconciling processed against our list\n";
    # delete the ones we processed from the list of what we were looking for
    foreach my $key (keys %ProcessedVIDs) {
        delete $VIDsToUpdate{$key};
    }

    print "Checking to see if we missed anything.\n";
    # what do we have left?
    if (%VIDsToUpdate) {
        foreach my $key (keys %VIDsToUpdate) {
            print "Was unable to locate $key within the vuln.xml files.\n";
        }
        print "FATAL: we did not find everything we expected.\n";
    } else {
        print "We found and updated everything we expected.\n";
    }

}


binmode STDOUT, ":encoding(UTF-8)";

#use feature qw(switch);

print 'process_vuxml.pl starts' . "\n";


# Reads vuln.xml on stdin * NOT ANY MORE
GetOptions ('filename:s'     => \$filename,
            'dryrun!'        => \$dryrun,
            'showchecksums!' => \$showchecksums,
            'printnodes!'    => \$printnodes,
            'showreasons!'   => \$showreasons);

if ($dryrun) {
  print "this is a dry run\n";
}

if ($showchecksums) {
  print "checksums will be displayed\n";
}

if ($printnodes) {
  print "nodes will be displayed\n";
}

if ($showreasons) {
  print "reasons will be displayed\n";
}

my $start = time;

print '(there is usually a delay before further output)' . "\n";

my $dbh = FreshPorts::Database::GetDBHandle() || die("FATAL: $0 could not get a db handle" );

my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ($filename);

# save the incoming vuln.xml file to a staging database table
populate_vuxml_import($dbh, $doc);

remove_deleted_vids($dbh);

#
# NOTE inserts and updates are handled the same way
# delete existing vid (if any)
# insert
#
# Instead, we should collect new and modified together and do one traversal of the vuln.xlm document
#

my %VIDsToUpdate = get_list_of_modified_and_new_vids($dbh);

print "invoking update_modified_vids()\n";
update_modified_vids($dbh, $doc, \%VIDsToUpdate);

# added after seeing it at https://metacpan.org/dist/XML-DOM/view/lib/XML/DOM/Parser.pod
$doc->dispose;

my $end = time();

print "Total time: " . ($end - $start) . " seconds\n";

print "Number of updates: $NumUpdates\n";

if ($dryrun) {
  print "this was a dry run\n";
}

print "committing\n";
$dbh->commit();

$dbh->disconnect();

undef $dbh;

#
# That's All Folks!
#

print 'process_vuxml.pl finishes' . "\n";

The database code:

[20:22 pg01 dvl ~/src/freshports-database-schema] % cat updates-2025-08-01-vuxml_import.sql
-- for the COPY from files

grant truncate on vuxml_import to commits;


-- re https://news.freshports.org/2025/07/30/how-freshports-processes-vuxml-entries/
--    https://github.com/FreshPorts/freshports/issues/633
--
-- Table: public.vuxml_import

-- DROP TABLE IF EXISTS public.vuxml_import;

CREATE TABLE IF NOT EXISTS public.vuxml_import
(
    vid text COLLATE pg_catalog."default" NOT NULL,
    checksum text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT vuxml_import_vid PRIMARY KEY (vid)
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.vuxml_import
    OWNER to postgres;

REVOKE ALL ON TABLE public.vuxml_import FROM commits;
REVOKE ALL ON TABLE public.vuxml_import FROM rsyncer;

GRANT INSERT, DELETE, SELECT, UPDATE ON TABLE public.vuxml_import TO commits;

GRANT ALL ON TABLE public.vuxml_import TO postgres;

GRANT SELECT ON TABLE public.vuxml_import TO rsyncer;

-- DROP INDEX IF EXISTS public.vuxml_import_vid;

CREATE INDEX IF NOT EXISTS vuxml_import_vid
    ON public.vuxml_import USING btree
    (vid COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;


grant truncate on vuxml_import to commits;

-- delete the deleted vids.

CREATE OR REPLACE FUNCTION DeleteMissingVuxml() returns void AS $$

-- if it's not in vuxml_import, remove it from vuxml

WITH deleted_vids AS (
SELECT V.vid, V.checksum
  FROM vuxml V
LEFT OUTER JOIN vuxml_import vi
ON V.vid = vi.vid
WHERE vi.vid IS NULL
)
DELETE from vuxml V
USING deleted_vids as DV
WHERE V.vid = DV.vid;

$$ LANGUAGE SQL VOLATILE;

CREATE OR REPLACE FUNCTION GetListOfVulnsForUpdate() returns SETOF text AS $$

-- return a list of vid for updating

SELECT vi.vid
  FROM vuxml_import vi
LEFT OUTER JOIN vuxml
ON vuxml.vid = vi.vid
WHERE vuxml.checksum <> vi.checksum
union
SELECT vi.vid
  FROM vuxml_import vi
LEFT OUTER JOIN vuxml
ON vuxml.vid = vi.vid
WHERE vuxml.vid IS NULL;

$$ LANGUAGE SQL STABLE;

What’s next?

Start using this code on my host. That could just be a cp over the existing file.

Not today. Today we’re going to a wedding. Perhaps tomorrow.

The downside

If we do an import, and the import fails to populate vuxml_import with an accurate list of vulns, we can wind up deleting all vulns from the system. That is because the “delete missing vulns” process removes any vulns not found in vuxml_import. It is easy to reload the data, yet it is time consuming – hours, not minutes, to do that.

2025-08-14: For the record, processing all vuxml looks like this at the end:

reconciling processed against our list
Checking to see if we missed anything.
We found and updated everything we expected.
Total time: 59167 seconds
Number of updates: 5997
committing
process_vuxml.pl finishes

Or about 16.5 hours…

And, about 20 minutes later, I realized none of that data has been saved. I must be missing a commit or have an extra rollback.

I just started another run. I’ll check it tomorrow morning.

2028-08-15: Success. This is what it looks like:

reconciling processed against our list
Checking to see if we missed anything.
We found and updated everything we expected.
Total time: 63769 seconds
Number of updates: 5998
committing
process_vuxml.pl finishes
$ 

The “Number of updates” differs because of a recent commit.

This looks good. Especially given:

freshports.dvl=# select count(*) from vuxml;
 count 
-------
  5999
(1 row)

freshports.dvl=# 

That number differs again because I ran a one-off (process just one vuln) yesterday to verify the database commit was occurring and saving the work.

This is the CPU load on the FreeBSD jail (dvl-ingress01):

CPU load provided by OSLV plugin via LibreNMS
CPU load provided by OSLV plugin via LibreNMS

The Full Monty

Let’s run it again, and save the output for future reference.

$ perl ./process_vuxml_new.pl --filename=/jails/freshports/usr/ports/security/vuxml/vuln.xml
process_vuxml.pl starts
(there is usually a delay before further output)
There, the parsefile has completed
Error in tempfile() using template freshports_vuxml_processing_XXXXX: Could not create temp file freshports_vuxml_processing_3wWXJ: Permission denied at ./process_vuxml_new.pl line 62.
$ perl ./process_vuxml_new.pl --filename=/jails/freshports/usr/ports/security/vuxml/vuln.xml
process_vuxml.pl starts
(there is usually a delay before further output)
There, the parsefile has completed
calculating the sha256 for each vuln
doing the findnodes() thing
findnodes() has finished, iteration has started
..............................................................................................................................................................
.......................................................................................................
44101b31-8ffd-11e7-b5af-a4badb2f4699 - cancelled: skipping that one for import
....
c7e8e955-6c61-11e7-9b01-2047478f2f70 - cancelled: skipping that one for import
........
04bc4e23-9a70-42cb-9fec-3613632d34bc - cancelled: skipping that one for import
.........
2aedd15f-ca8b-11e6-a9a5-b499baebfeaf - cancelled: skipping that one for import
.....................
cf0b5668-4d1b-11e6-b2ec-b499baebfeaf - cancelled: skipping that one for import

61b8c359-4aab-11e6-a7bd-14dae9d210b8 - cancelled: skipping that one for import

c17fe91d-4aa6-11e6-a7bd-14dae9d210b8 - cancelled: skipping that one for import
.
42ecf370-4aa4-11e6-a7bd-14dae9d210b8 - cancelled: skipping that one for import
.......
95564990-1138-11e6-b55e-b499baebfeaf - cancelled: skipping that one for import
.........................................................................
f99a4686-e694-11e3-9032-000c2980a9f3 - cancelled: skipping that one for import
.....
080c5370-886a-11e3-9533-60a44c524f57 - cancelled: skipping that one for import
....
cd082cc6-1548-4b8d-a3aa-a007be611a29 - cancelled: skipping that one for import
.........................................................
6887828f-0229-11e0-b84d-00262d5ed8ee - cancelled: skipping that one for import

b2a6fc0e-070f-11e0-a6e9-00215c6a37bb - cancelled: skipping that one for import
..........
f6b6beaa-4e0e-11df-83fb-0015587e2cc1 - cancelled: skipping that one for import
................................................
ff284bf0-3f32-11dc-a79a-0016179b2dd5 - cancelled: skipping that one for import
................
28ce7102-4039-11db-a838-00148584c7dd - cancelled: skipping that one for import
...
0a4cd819-0291-11db-bbf7-000c6ec775d9 - cancelled: skipping that one for import
...
af2a60ed-da3e-11da-93e0-00123ffe8333 - cancelled: skipping that one for import
.....................................
e8c6ade2-6bcc-11d9-8e6f-000a95bc6fae - cancelled: skipping that one for import
....
14e8f315-600e-11d9-a9e7-0001020eed82 - cancelled: skipping that one for import
......
1f8dea68-3436-11d9-952f-000c6e8f12ef - cancelled: skipping that one for import
...........
bef4515b-eaa9-11d8-9440-000347a4fa7d - cancelled: skipping that one for import
..
4764cfd6-d630-11d8-b479-02e0185c0b53 - cancelled: skipping that one for import
......
c7705712-92e6-11d8-8b29-0020ed76ef5a - cancelled: skipping that one for import
......
6fd9a1e9-efd3-11d8-9837-000c41e2cdad - cancelled: skipping that one for import

3362f2c1-8344-11d8-a41f-0020ed76ef5a - cancelled: skipping that one for import
all sha256 saved for each vuln
finished with eval()
$copy_command='\copy vuxml_import from '/tmp/freshports_vuxml_processing_ixUmG' WITH (FORMAT TEXT, HEADER false);
$psql_command='/usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
committing truncate
echo "\copy vuxml_import from '/tmp/freshports_vuxml_processing_ixUmG' WITH (FORMAT TEXT, HEADER false);" | /usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
copy command has finished
$output is 'COPY 5999'
Imported 5999 items
deleting missing vuxml
vuln not present in xml files have been deleted from the database.
getting list of vulns to update
list of vulns to update has been obtained
We have 0 vuxml entries to add/update
invoking update_modified_vids()
Nothing to update
Total time: 22 seconds
Number of updates: 0
committing
process_vuxml.pl finishes
$ 

Success. Let’s ship that and see what happens during the next commit.

[11:50 dvl-ingress01 dvl ~/scripts] % cp process_vuxml_new.pl process_vuxml.pl
[11:50 dvl-ingress01 dvl ~/scripts] % diff -ruN process_vuxml.pl process_vuxml_new.pl
[11:50 dvl-ingress01 dvl ~/scripts] % 

psql \copy is silenting failing

I just started commit processing on dvl-ingress01. Immediately, the sanity checks failed:

FATAL: we imported  items, which is less than our expected minimum
value of 5900

FATAL: we were deleting way too may rows from vuxml: 5999 items, which
is more than our expected minimum value of 10

Both of those should have terminated processing. Neither did. That’s the first problem to solve.

Next, I have to figure out why running the script from the command line works, but launching it from another script fails.

What comes to mind first is PATH.

The logs show:

$copy_command='\copy vuxml_import from '/tmp/freshports_vuxml_processing_HX1m1' WITH (FORMAT TEXT, HEADER false);
$psql_command='/usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
committing truncate
echo "\copy vuxml_import from '/tmp/freshports_vuxml_processing_HX1m1' WITH (FORMAT TEXT, HEADER false);" | /usr/local/bin/psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl sslcertmode=disable"
copy command has finished
$output is ''
Imported  items
FATAL: we imported  items, which is less than our expected minimum value of 5900

This fix may have to wait until Monday. The weekend is full.

I found time

I’ve taken a long lunch today and meeting with an architect friend. He’s doing plans, I’m looking at this.

This runs to success, so…:

$ pwd
/usr/local/libexec/freshports
$ id
uid=10001(freshports) gid=10001(freshports) groups=10001(freshports)
$ touch ~/signals/vuxml && ./process_vuxml.sh
...

process_vuxml.sh will launch process_vuxml.pl, I still suspect paths or something odd.

This works as my regular user, in a zsh shell

[18:14 dvl-ingress01 dvl ~/scripts] % echo "\\\copy vuxml_import from '/tmp/freshports_vuxml_processing_jDIF3' WITH (FORMAT TEXT, HEADER false);" | psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl"
COPY 6001

As the freshports user in a bourne shell:

$ echo "\copy vuxml_import from '/tmp/freshports_vuxml_processing_jDIF3' WITH (FORMAT TEXT, HEADER false);" | psql "sslmode=require host=pg01.int.unixathome.org user=commits_dvl dbname=freshports.dvl"
COPY 6001

and this works too:

$ touch ~/signals/vuxml && ./process_vuxml.sh
$ touch ~/signals/vuxml && perl job-waiting.pl 

Both result in

copy command has finished
$output is 'COPY 6002'
Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top