FreshPorts not only keeps track of changes to the FreeBSD ports tree, it also keeps track of when ports move around and of any special notes regarding upgrading. This information is obtained from MOVED and UPDATING respectively.
Tonight I was chatting with Edwin Groothuis about MOVED. We got to talking about how FreshPorts parses this file. In short, it does this:
EmptyMoved($dbh);
parsefile($dbh);
Yes, it deletes everything from the table and adds everything back in from the file. That’s simple to code, but not very efficient when it comes to database IO. The table in question is the ports_moved table:
freshports.org=# \d ports_moved
                               Table "public.ports_moved"
    Column    |  Type   |                          Modifiers
--------------+---------+-------------------------------------------------------------
 id           | integer | not null default nextval('public.ports_moved_id_seq'::text)
 from_port_id | integer | not null
 to_port_id   | integer |
 date         | date    | not null
 reason       | text    | not null
Indexes:
    "ports_moved_pkey" primary key, btree (id)
Foreign-key constraints:
    "$2" FOREIGN KEY (to_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$1" FOREIGN KEY (from_port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=#
As you can see, there are only four data columns in there besides the id. It’s pretty simple to read. And there are only about 2000 rows at the time of writing.
It should be fairly easy to modify the script to become more efficient. Here’s what I typed in the IRC channel with Edwin:
Should be easy enough. Read all the data first. Then, when parsing the file, look in the cache to see what’s there. If it’s already there, update… and mark it in the cache as being processed. If not there, insert. Anything not marked in the cache after processing should be deleted from the table.
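To make that idea concrete, here is a minimal sketch in Python. It is not the FreshPorts script, just an illustration under a few assumptions: that each MOVED line has the form from|to|date|reason, that (from, to, date) is enough to identify an entry, and that only the reason ever needs updating. The names parse_moved, plan_sync, and SyncPlan are hypothetical, and the sketch works with port names rather than the port ids the real table stores.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical key: (from_port, to_port, date), assumed to identify one entry.
Key = Tuple[str, Optional[str], str]


class SyncPlan(NamedTuple):
    inserts: List[Tuple[Key, str]]  # (key, reason) for entries missing from the table
    updates: List[Tuple[int, str]]  # (row id, new reason) for entries that changed
    deletes: List[int]              # row ids that no longer appear in the file


def parse_moved(lines: Iterable[str]) -> Dict[Key, str]:
    """Parse MOVED-style lines, assumed to look like 'from|to|date|reason'."""
    entries: Dict[Key, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        from_port, to_port, date, reason = line.split("|", 3)
        # An empty 'to' field is stored as None, like a NULL to_port_id.
        entries[(from_port, to_port or None, date)] = reason
    return entries


def plan_sync(cache: Dict[Key, Tuple[int, str]],
              entries: Dict[Key, str]) -> SyncPlan:
    """cache maps key -> (row id, reason), read from the table before parsing."""
    plan = SyncPlan([], [], [])
    seen = set()  # keys marked as processed
    for key, reason in entries.items():
        if key in cache:
            row_id, old_reason = cache[key]
            seen.add(key)
            if reason != old_reason:  # only touch rows that actually changed
                plan.updates.append((row_id, reason))
        else:
            plan.inserts.append((key, reason))
    # Anything in the cache that was never marked is gone from the file.
    plan.deletes.extend(
        row_id for key, (row_id, _) in cache.items() if key not in seen
    )
    return plan


# Made-up sample data, purely for illustration.
cache = {
    ("editors/foo", "editors/bar", "2005-01-10"): (1, "renamed"),
    ("www/old", None, "2004-11-02"): (2, "No longer maintained"),
}
lines = [
    "# comment lines are ignored",
    "editors/foo|editors/bar|2005-01-10|Renamed upstream",
    "mail/baz||2005-02-01|Removed",
]
plan = plan_sync(cache, parse_moved(lines))
print(plan.inserts, plan.updates, plan.deletes)
```

With the sample data, the plan holds one insert (mail/baz), one update (row 1), and one delete (row 2). On a typical run, where the file has barely changed, the plan would be nearly empty, which is where the saving in database IO would come from compared with deleting and reloading every row.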
The same thing could be applied to the ports_updating table:
freshports.org=# \d ports_updating
                              Table "public.ports_updating"
 Column  |  Type   |                           Modifiers
---------+---------+----------------------------------------------------------------
 id      | integer | not null default nextval('public.ports_updating_id_seq'::text)
 date    | date    | not null
 affects | text    | not null
 author  | text    |
 reason  | text    | not null
Indexes:
    "ports_updating_pkey" primary key, btree (id)

freshports.org=#
Hmmm, I’m glad I wrote this. One day I’ll use this to speed things up a bit.