What packages are now available for this port?

This post deals with adding a new feature: A table of which FreeBSD version and architecture has a given package

The goal: parse the packagesite.yaml supplied with each repo and load the results into the FreshPorts database.

A bunch of the background in the above URL, and I don’t want to repeat it here.

This post is mostly my notes on who this is being done.

What’s in the file

The file is labelled as .yaml, which is a superset of JSON. I first started parsing the file with Python but ran into encoding errors:

Things to be determined

There are a number of items, not so urgent, which must be resolved.

Starting with Python

I started off parsing packagesite.yaml with this Python script. This produced a tab-delimited file which could then be loaded into the database.

The creation of the CSV file took about 3 minutes.

Posting for help

Last night, I posted on social media looking for help:

The next morning brought many suggestions (thank you).

The most elegant solution was the use of textproc/jq (homepage).

The new working proof-of-concepts

This is the one-liner for creating the tab-delimited file:

$ time jq -rc '[1, .origin, .name, .version] | @tsv' < ~/tmp/FreeBSD\:12\:amd64/latest/packagesite.yaml > packagesite.csv

real0m1.351s
user0m1.295s
sys0m0.055s

That is plenty fast for processing a 31491-line file.

Loading the code into the database is still plenty fast as well:

$ time ./import-via-copy-packagesite.py

real	0m2.077s
user	0m0.126s
sys	0m0.016s

The database tables

The data is imported into this table:

freshports.dev=# \d packages_raw
                           Table "public.packages_raw"
     Column      |  Type   | Collation | Nullable |           Default            
-----------------+---------+-----------+----------+------------------------------
 id              | bigint  |           | not null | generated always as identity
 abi_id          | integer |           | not null | 
 package_origin  | text    |           | not null | 
 package_name    | text    |           | not null | 
 package_version | text    |           | not null | 
Indexes:
    "packages_raw_pkey" PRIMARY KEY, btree (id)
    "fki_packages_raw_abi_id_fk" btree (abi_id)
    "packages_raw_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_raw_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID

freshports.dev=# 

That data looks like this:

freshports.dev=# SELECT * FROM packages_raw ORDER BY package_name LIMIT 5;
   id   | abi_id |      package_origin       |   package_name   | package_version 
--------+--------+---------------------------+------------------+-----------------
 125562 |      1 | games/0ad                 | 0ad              | 0.0.23b_9
 124778 |      1 | security/0d1n             | 0d1n             | 2.3
 124774 |      1 | games/0verkill            | 0verkill         | 0.16_2
 124768 |      1 | security/1password-client | 1password-client | 0.9.3
 124760 |      1 | games/2048                | 2048             | 0.9.1
(5 rows)

freshports.dev=# 

That just allows me to run queries on the data before copying into this table:

freshports.dev=# \d packages
                           Table "public.packages"
    Column    |  Type   | Collation | Nullable |           Default            
--------------+---------+-----------+----------+------------------------------
 id           | bigint  |           | not null | generated always as identity
 abi_id       | integer |           | not null | 
 port_id      | integer |           | not null | 
 version      | text    |           | not null | 
 package_name | text    |           | not null | 
Indexes:
    "packages_pkey" PRIMARY KEY, btree (id)
    "fki_packages_abi_id_fk" btree (abi_id)
    "fki_packages_port_id_fk" btree (port_id)
    "packages_package_name_idx" btree (package_name)
Foreign-key constraints:
    "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
    "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID

freshports.dev=# 

That data looks like this:

freshports.dev=# SELECT * FROM packages ORDER BY package_name LIMIT 5;
  id   | abi_id | port_id |  version  |   package_name   
-------+--------+---------+-----------+------------------
 63781 |      1 |   31186 | 0.0.23b_9 | 0ad
 63000 |      1 |   41577 | 2.3       | 0d1n
 62996 |      1 |    5241 | 0.16_2    | 0verkill
 62990 |      1 |   45139 | 0.9.3     | 1password-client
 62982 |      1 |   35082 | 0.9.1     | 2048
(5 rows)

freshports.dev=# 

Parsing the raw data into relational data

How does data get from packages_raw into packages?

Via this query:

freshports.dev=# EXPLAIN ANALYSE
freshports.dev-# INSERT INTO packages(abi_id, port_id, package_name, version)
freshports.dev-# SELECT abi_id,
freshports.dev-#        GetPort(package_origin),
freshports.dev-#        package_name,
freshports.dev-#        package_version
freshports.dev-#   FROM packages_raw
freshports.dev-#  WHERE GetPort(package_origin) IS NOT NULL;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Insert on packages  (cost=0.00..17084.83 rows=31334 width=45) (actual time=34310.269..34310.269 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..17084.83 rows=31334 width=45) (actual time=13.996..33835.542 rows=31473 loops=1)
         ->  Seq Scan on packages_raw  (cost=0.00..16693.16 rows=31334 width=29) (actual time=13.957..33788.113 rows=31473 loops=1)
               Filter: (getport(package_origin) IS NOT NULL)
               Rows Removed by Filter: 18
 Planning Time: 0.271 ms
 Trigger for constraint packages_abi_id_fk: time=354.821 calls=31473
 Trigger for constraint packages_port_id_fk: time=2243.644 calls=31473
 Execution Time: 36916.031 ms
(9 rows)

freshports.dev=# 

Some explaining:

  • GetPort(package_origin) converts package_origin (e.g. games/0ad) into a port id – if there is no matching port, that’s something to deal with later. Why no matching port? In my dev database, it can easily occur.
  • There might be a better way to do this, but so far so good. Suggestions welcome.

What did not get converted?

This query helps me find out what data is not being copied from the packages_raw table to the packages table:

freshports.dev=# SELECT PR.* 
  FROM packages_raw PR
 WHERE NOT EXISTS (SELECT * FROM packages P where PR.package_name = P.package_name);
   id   | abi_id |            package_origin            |          package_name          | package_version 
--------+--------+--------------------------------------+--------------------------------+-----------------
  96128 |      1 | sysutils/swapusage                   | swapusage                      | 1.0.0
 100602 |      1 | devel/p5-MooX-Singleton              | p5-MooX-Singleton              | 1.20
 101008 |      1 | devel/py-aspy.yaml                   | py37-aspy.yaml                 | 1.3.0
 101036 |      1 | devel/py-nodeenv                     | py37-nodeenv                   | 1.3.3
 101891 |      1 | devel/rubygem-peek-rblineprof-rails5 | rubygem-peek-rblineprof-rails5 | 0.2.0
 101953 |      1 | devel/rubygem-peek-gc-rails5         | rubygem-peek-gc-rails5         | 0.0.2
 103834 |      1 | devel/py-nodeenv                     | py27-nodeenv                   | 1.3.3
 104220 |      1 | sysutils/py-leviathan                | py37-leviathan                 | 0.1.1
 107173 |      1 | sysutils/atf-rk3399                  | atf-rk3399                     | v2.1
 117527 |      1 | security/picosha2                    | picosha2                       | 1.0.0.23
  99899 |      1 | devel/p5-File-ShareDir-Tarball       | p5-File-ShareDir-Tarball       | 0.2.2
 100063 |      1 | devel/p5-MooseX-MungeHas             | p5-MooseX-MungeHas             | 0.011
 100562 |      1 | devel/py-aspy.yaml                   | py27-aspy.yaml                 | 1.3.0
 101881 |      1 | devel/rubygem-peek-redis-rails5      | rubygem-peek-redis-rails5      | 1.2.0
 101917 |      1 | devel/rubygem-peek-rails5            | rubygem-peek-rails5            | 1.0.1
 101941 |      1 | databases/rubygem-peek-mysql2-rails5 | rubygem-peek-mysql2-rails5     | 1.2.0
 106471 |      1 | sysutils/py-leviathan                | py27-leviathan                 | 0.1.1
 107871 |      1 | www/rubygem-gon-rails5               | rubygem-gon-rails5             | 6.2.1
(18 rows)

freshports.dev=# 

That query runs quickly, very quickly. There are indexes on package_name in both tables.

What’s next?

Next up is the display of this data on each port page. That’s where port_id comes into play.

freshports.dev-#        PKG.package_name,
freshports.dev-#        PKG.version
freshports.dev-#   FROM packages PKG, ABI
freshports.dev-#  WHERE PKG.port_id = 31186
freshports.dev-#    AND PKG.abi_id  = ABI.id;
       name       | package_name |  version  
------------------+--------------+-----------
 FreeBSD:12:amd64 | 0ad          | 0.0.23b_9
(1 row)

freshports.dev=# 

Hmmm, I should rename packages.version to package.package_version for consistency with packages_raw.

packages_raw is designed to hold multiple entries for a given port, for the different ABI (application binary interface). Have a look at https://pkg.freebsd.org to see the various packages sets.

Example screen shot from security/acme.sh

acme.sh versions
acme.sh versions
Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top