Since the last past, I’ve added quarterly packages to the system.
I want to outline the changes while I remember the.
I will start by outline the tables we have.
abi
The abi table contains the usual suspects. Entries in this table are added manually, when a new release comes out.
See https://pkg.freebsd.org for examples of where these values are obtained from.
freshports.dev=# SELECT * FROM abi ORDER BY name; id | name | active | last_checked | last_updated ----+--------------------+--------+-------------------------------+------------------------------- 7 | FreeBSD:11:aarch64 | t | 2020-04-10 21:51:28.219155+00 | 2020-04-10 21:51:28.219155+00 6 | FreeBSD:11:amd64 | t | 2020-04-10 21:51:28.107421+00 | 2020-04-10 21:51:28.107421+00 4 | FreeBSD:11:i386 | t | 2020-04-10 21:05:01.431882+00 | 2020-04-10 21:05:01.431882+00 9 | FreeBSD:12:aarch64 | t | 2020-04-10 21:51:28.361931+00 | 2020-04-10 21:51:28.361931+00 1 | FreeBSD:12:amd64 | t | 2020-04-10 18:22:10.545416+00 | 2020-04-10 18:24:56.829453+00 8 | FreeBSD:12:i386 | t | 2020-04-10 21:51:28.278548+00 | 2020-04-10 21:51:28.278548+00 2 | FreeBSD:13:aarch64 | t | 2020-04-10 18:39:12.554194+00 | 2020-04-10 18:39:12.554194+00 11 | FreeBSD:13:amd64 | t | 2020-04-10 21:51:28.479757+00 | 2020-04-10 21:51:28.479757+00 10 | FreeBSD:13:i386 | t | 2020-04-10 21:51:28.421228+00 | 2020-04-10 21:51:28.421228+00 (9 rows) freshports.dev=# \d abi Table "public.abi" Column | Type | Collation | Nullable | Default --------------+--------------------------+-----------+----------+------------------------------ id | integer | | not null | generated always as identity name | text | | not null | active | boolean | | not null | true last_checked | timestamp with time zone | | not null | CURRENT_TIMESTAMP last_updated | timestamp with time zone | | not null | CURRENT_TIMESTAMP Indexes: "abi_pkey" PRIMARY KEY, btree (id) "abi_name_idx" UNIQUE, btree (name) Referenced by: TABLE "packages" CONSTRAINT "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID TABLE "packages_last_checked" CONSTRAINT "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) TABLE "packages_raw_original" CONSTRAINT "packages_raw_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID freshports.dev=#
The last_checked and last_updated columns are now deprecated. You will see their respective replacements in the packages_last_checked table.
packages
The packages contains the latest package version available for a given ABI on a given branch (e.g. latest, quarterly, 2020Q2).
freshports.dev=# SELECT * FROM packages ORDER BY package_name LIMIT 10; id | abi_id | port_id | package_version | package_name | branch_id --------+--------+---------+-----------------+--------------+----------- 186113 | 4 | 31186 | 0.0.23b_9 | 0ad | 1 204653 | 6 | 31186 | 0.0.23b_9 | 0ad | 1 278713 | 8 | 31186 | 0.0.23b_9 | 0ad | 1 371391 | 10 | 31186 | 0.0.23b_9 | 0ad | 1 403274 | 11 | 31186 | 0.0.23b_9 | 0ad | 1 495736 | 1 | 31186 | 0.0.23b_9 | 0ad | 1 590825 | 1 | 31186 | 0.0.23b_9 | 0ad | 142 622854 | 4 | 31186 | 0.0.23b_9 | 0ad | 142 653983 | 6 | 31186 | 0.0.23b_9 | 0ad | 142 712933 | 8 | 31186 | 0.0.23b_9 | 0ad | 142 (10 rows) freshports.dev=# \d packages Table "public.packages" Column | Type | Collation | Nullable | Default -----------------+---------+-----------+----------+------------------------------ id | bigint | | not null | generated always as identity abi_id | integer | | not null | port_id | integer | | not null | package_version | text | | not null | package_name | text | | not null | branch_id | integer | | not null | Indexes: "packages_pkey" PRIMARY KEY, btree (id) "fki_packages_abi_id_fk" btree (abi_id) "fki_packages_port_id_fk" btree (port_id) "packages_package_name_idx" btree (package_name) Foreign-key constraints: "packages_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) NOT VALID "packages_branch_id_fk" FOREIGN KEY (branch_id) REFERENCES system_branch(id) NOT VALID "packages_port_id_fk" FOREIGN KEY (port_id) REFERENCES ports(id) NOT VALID
packages_raw
The packages_raw table is a plain text version of the packages table. The import script loads data into this table before it is loaded into the packages table.
The loading from the temporary table (packages_raw) ito the permanent table (packages) is easily done with a single database query. Doing the same work within an external program would be non-trivial, slower, and error-prone.
The conversion process
This query takes the raw data from the temporary table (packages_raw) into the permanent table (packages).
INSERT INTO packages(abi_id, branch_id, port_id, package_name, package_version) SELECT GetABIId(abi), GetBranchId(branch), GetPort(package_origin), package_name, package_version FROM packages_raw WHERE GetPort(package_origin) IS NOT NULL AND GetBranchId(branch) IS NOT NULL AND GetABIId(abi) IS NOT NULL ;
At present, this query does all of the data and has no filtering (e.g. by ABI or branch).
packages_last_checked
The packages_last_checked table keeps track of work done or to be done.
freshports.dev=# SELECT * FROM packages_last_checked; abi_id | branch_id | last_checked | repo_date | import_date --------+-----------+-------------------------------+------------------------+------------- 7 | 1 | 2020-04-12 16:56:25.824973+00 | 2018-09-29 04:24:00+00 | 7 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-07 12:03:00+00 | 6 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 00:16:00+00 | 6 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:20:00+00 | 4 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 18:43:00+00 | 4 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 02:36:00+00 | 9 | 1 | 2020-04-12 16:56:25.824973+00 | 2018-10-11 05:47:00+00 | 9 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-02 05:09:00+00 | 1 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 01:18:00+00 | 1 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:56:00+00 | 8 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 19:12:00+00 | 8 | 142 | 2020-04-12 16:56:25.824973+00 | 2020-04-12 03:36:00+00 | 2 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-01-04 12:39:00+00 | 2 | 142 | 2020-04-12 16:56:25.824973+00 | | 11 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-11 02:22:00+00 | 11 | 142 | 2020-04-12 16:56:25.824973+00 | | 10 | 1 | 2020-04-12 16:56:25.824973+00 | 2020-04-10 23:14:00+00 | 10 | 142 | 2020-04-12 16:56:25.824973+00 | | (18 rows) freshports.dev=# \d packages_last_checked Table "public.packages_last_checked" Column | Type | Collation | Nullable | Default --------------+--------------------------+-----------+----------+--------- abi_id | integer | | not null | branch_id | integer | | not null | last_checked | timestamp with time zone | | | repo_date | timestamp with time zone | | | import_date | timestamp with time zone | | | Indexes: "packages_last_checked_pkey" PRIMARY KEY, btree (abi_id, branch_id) "packages_last_checked_branch_id_idx" btree (branch_id) Foreign-key constraints: "packages_last_checked_abi_id_fk" FOREIGN KEY (abi_id) REFERENCES abi(id) freshports.dev=#
When the repo_date column is empty, no repo was found for this ABI & branch combination.
Note the import_date column is empty. The script which populates the packages table does not yet update that value.
The check_repos_for_new_stuff.py script updates the last_checked and repo_date columns.
To find repos which need to be updated, use this query:
SELECT * FROM packages_last_checked WHERE repo_date > import_date OR (repo_date IS NOT NULL AND import_date IS NULL);
What’s next?
The import-raw-abi.sh script is the basis of future work. The pseudo code is similar to this:
SELECT ABI, branch values which need updating FOR EACH combination DO fetch https://pkg.freebsd.org/$abi/$branch/packagesite.txz unxz packagesite.txz tar -xf packagesite.tar ls -l ./packagesite.yaml jq -rc --arg ABI "$abi" --arg BRANCH "$branch" '[$ABI, $BRANCH, .origin, .name, .version] | @tsv' < ./packagesite.yaml > packagesite.csv ~/bin/import-via-copy-packagesite-all-raw-fields.py -i packagesite.csv Update the packages_last_checked.import_date for this combination DONE
NOTES on this:
- The launching of this script can be trigger based. When an update occurs to the packages_last_checked, observe if new data is detected, and set work-to-be-done accordingly.
- I now think that import and conversion from packages_raw to packages can be done separately.
- Add another column to the packages_last_checked, call it import_processed.
- When you find a column with import_date > import_processed, that indicates you need to run the conversion process (see a previous section).