Since the last past, I’ve added quarterly packages to the system.
I want to outline the changes while I remember the.
I will start by outline the tables we have.
abi
The abi table contains the usual suspects. Entries in this table are added manually, when a new release comes out.
See https://pkg.freebsd.org for examples of where these values are obtained from.
freshports.dev=# SELECT * FROM abi ORDER BY name;
id | name | active | last_checked | last_updated
----+--------------------+--------+-------------------------------+-------------------------------
7 | FreeBSD:11:aarch64 | t | 2020-04-10 21:51:28.219155+00 | 2020-04-10 21:51:28.219155+00
6 | FreeBSD:11:amd64 | t | 2020-04-10 21:51:28.107421+00 | 2020-04-10 21:51:28.107421+00
4 | FreeBSD:11:i386 | t | 2020-04-10 21:05:01.431882+00 | 2020-04-10 21:05:01.431882+00
9 | FreeBSD:12:aarch64 | t | 2020-04-10 21:51:28.361931+00 | 2020-04-10 21:51:28.361931+00
1 | FreeBSD:12:amd64 | t | 2020-04-10 18:22:10.545416+00 | 2020-04-10 18:24:56.829453+00
8 | FreeBSD:12:i386 | t | 2020-04-10 21:51:28.278548+00 | 2020-04-10 21:51:28.278548+00
2 | FreeBSD:13:aarch64 | t | 2020-04-10 18:39:12.554194+00 | 2020-04-10 18:39:12.554194+00
11 | FreeBSD:13:amd64 | t | 2020-04-10 21:51:28.479757+00 | 2020-04-10 21:51:28.479757+00
10 | FreeBSD:13:i386 | t | 2020-04-10 21:51:28.421228+00 | 2020-04-10 21:51:28.421228+00
(9 rows)
freshports.dev=# \d abi
Table "public.abi"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+------------------------------
id | integer | | not null | generated always as identity
name | text | | not null |
active | boolean | | not null | true
last_checked | timestamp with time zone | | not null | CURRENT_TIMESTAMP
last_updated | timestamp with time zone | | not null | CURRENT_TIMESTAMP
Indexes:
"abi_pkey" PRIMARY KEY, btree (id)
"abi_name_idx" UNIQUE, btree (name)
Referenced by:
TABLE "packages" CONSTRAINT "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
TABLE "packages_last_checked" CONSTRAINT "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)
TABLE "packages_raw_original" CONSTRAINT "packages_raw_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
freshports.dev=#
The last_checked and last_updated columns are now deprecated. You will see their respective replacements in the packages_last_checked table.
packages
The packages contains the latest package version available for a given ABI on a given branch (e.g. latest, quarterly, 2020Q2).
freshports.dev=# SELECT * FROM packages ORDER BY package_name LIMIT 10;
id | abi_id | port_id | package_version | package_name | branch_id
--------+--------+---------+-----------------+--------------+-----------
186113 | 4 | 31186 | 0.0.23b_9 | 0ad | 1
204653 | 6 | 31186 | 0.0.23b_9 | 0ad | 1
278713 | 8 | 31186 | 0.0.23b_9 | 0ad | 1
371391 | 10 | 31186 | 0.0.23b_9 | 0ad | 1
403274 | 11 | 31186 | 0.0.23b_9 | 0ad | 1
495736 | 1 | 31186 | 0.0.23b_9 | 0ad | 1
590825 | 1 | 31186 | 0.0.23b_9 | 0ad | 142
622854 | 4 | 31186 | 0.0.23b_9 | 0ad | 142
653983 | 6 | 31186 | 0.0.23b_9 | 0ad | 142
712933 | 8 | 31186 | 0.0.23b_9 | 0ad | 142
(10 rows)
freshports.dev=# \d packages
Table "public.packages"
Column | Type | Collation | Nullable | Default
-----------------+---------+-----------+----------+------------------------------
id | bigint | | not null | generated always as identity
abi_id | integer | | not null |
port_id | integer | | not null |
package_version | text | | not null |
package_name | text | | not null |
branch_id | integer | | not null |
Indexes:
"packages_pkey" PRIMARY KEY, btree (id)
"fki_packages_abi_id_fk" btree (abi_id)
"fki_packages_port_id_fk" btree (port_id)
"packages_package_name_idx" btree (package_name)
Foreign-key constraints:
"packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID
"packages_branch_id_fk" FOREIGN KEY (branch_id) REFERENCES system_branch(id) NOT VALID
"packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID
packages_raw
The packages_raw table is a plain text version of the packages table. The import script loads data into this table before it is loaded into the packages table.
The loading from the temporary table (packages_raw) ito the permanent table (packages) is easily done with a single database query. Doing the same work within an external program would be non-trivial, slower, and error-prone.
The conversion process
This query takes the raw data from the temporary table (packages_raw) into the permanent table (packages).
INSERT INTO packages(abi_id, branch_id, port_id, package_name, package_version)
SELECT GetABIId(abi),
GetBranchId(branch),
GetPort(package_origin),
package_name,
package_version
FROM packages_raw
WHERE GetPort(package_origin) IS NOT NULL
AND GetBranchId(branch) IS NOT NULL
AND GetABIId(abi) IS NOT NULL ;
At present, this query does all of the data and has no filtering (e.g. by ABI or branch).
packages_last_checked
The packages_last_checked table keeps track of work done or to be done.
freshports.dev=# SELECT * FROM packages_last_checked;
abi_id | branch_id | last_checked | repo_date | import_date
--------+-----------+-------------------------------+------------------------+-------------
7 | 1 | 2020-04-12 16:56:25.824973+00 | 2018-09-29 04:24:00+00 |
7 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-07 12:03:00+00 |
6 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 00:16:00+00 |
6 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:20:00+00 |
4 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 18:43:00+00 |
4 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 02:36:00+00 |
9 | 1 | 2020-04-12 16:56:25.824973+00 | 2018-10-11 05:47:00+00 |
9 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-02 05:09:00+00 |
1 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 01:18:00+00 |
1 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:56:00+00 |
8 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 19:12:00+00 |
8 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:36:00+00 |
2 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-01-04 12:39:00+00 |
2 | 142 | 2020-04-12 16:56:25.824973+00 | |
11 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 02:22:00+00 |
11 | 142 | 2020-04-12 16:56:25.824973+00 | |
10 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 23:14:00+00 |
10 | 142 | 2020-04-12 16:56:25.824973+00 | |
(18 rows)
freshports.dev=# \d packages_last_checked
Table "public.packages_last_checked"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+---------
abi_id | integer | | not null |
branch_id | integer | | not null |
last_checked | timestamp with time zone | | |
repo_date | timestamp with time zone | | |
import_date | timestamp with time zone | | |
Indexes:
"packages_last_checked_pkey" PRIMARY KEY, btree (abi_id, branch_id)
"packages_last_checked_branch_id_idx" btree (branch_id)
Foreign-key constraints:
"packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id)
freshports.dev=#
When the repo_date column is empty, no repo was found for this ABI & branch combination.
Note the import_date column is empty. The script which populates the packages table does not yet update that value.
The check_repos_for_new_stuff.py script updates the last_checked and repo_date columns.
To find repos which need to be updated, use this query:
SELECT * FROM packages_last_checked WHERE repo_date > import_date OR (repo_date IS NOT NULL AND import_date IS NULL);
What’s next?
The import-raw-abi.sh script is the basis of future work. The pseudo code is similar to this:
SELECT ABI, branch values which need updating
FOR EACH combination
DO
fetch https://pkg.freebsd.org/$abi/$branch/packagesite.txz
unxz packagesite.txz
tar -xf packagesite.tar
ls -l ./packagesite.yaml
jq -rc --arg ABI "$abi" --arg BRANCH "$branch" '[$ABI, $BRANCH, .origin, .name, .version] | @tsv' < ./packagesite.yaml > packagesite.csv
~/bin/import-via-copy-packagesite-all-raw-fields.py -i packagesite.csv
Update the packages_last_checked.import_date for this combination
DONE
NOTES on this:
- The launching of this script can be trigger based. When an update occurs to the packages_last_checked, observe if new data is detected, and set work-to-be-done accordingly.
- I now think that import and conversion from packages_raw to packages can be done separately.
- Add another column to the packages_last_checked, call it import_processed.
- When you find a column with import_date > import_processed, that indicates you need to run the conversion process (see a previous section).











