In this post I’ll talk about how the code intially took the easy way out and how recent changes have tightened that up. In short, the code would raise N messages when just a single notification is sufficient.
Today the full vuxml database was reloaded into FreshPorts. This affects approximately 3454 ports and the marking of 60,436 commits (or revisions) as vulnerable. When you look at a port page, you can see the skulls which indicate a known vuln beside the commits in question (look under Commit History). That makes up for 565,576 unique combinations of commits, ports, and vuxml entries.
After all that is processed (over about 21.5 hours on the current server), we need to flush the cache of for each item that changed (ports, daily summaries, etc).
How cache clearing works
When a port is updated, a trigger on the ports table adds an entry to the cache_clearing_ports. A long running daemon monitors the table waiting for a LISTEN notification. When it arrives, the daemon reads the table and starts processing the entries, clearing the cache.
This is simplified though. Updates occur in many places and each invokes its own trigger. However, for this post, we will deal with only one.
current cache_clearing_ports table
This is the current cache_clearing_ports table:
freshports.org=# \d cache_clearing_ports Table "public.cache_clearing_ports" Column | Type | Collation | Nullable | Default ------------+-----------------------------+-----------+----------+-------------------------------------------------- id | integer | | not null | nextval('cache_clearing_ports_id_seq'::regclass) port_id | integer | | not null | category | text | | not null | port | text | | not null | date_added | timestamp without time zone | | not null | now() Indexes: "cache_clearing_ports_pkey" PRIMARY KEY, btree (id) Foreign-key constraints: "cache_clearing_ports_port_id_fkey" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE freshports.org=#
As you can see, duplicate entries are OK. However, over a single commit, which is how the vuxml data is processed, this can result in a large number of duplicates.
How many duplicates?
Checking here, we have just over 1 million entries.
freshports.org=# select count(*) from cache_clearing_ports; count --------- 1005407 (1 row)
How unique are they?
Only 3500 ports:
freshports.org=# select count(distinct port_id) from commit_log_ports_vuxml; count ------- 3454 (1 row)
I recall when I was writing this: duplicates are no big deal. The cache gets cleared twice. So what?
It takes time to read through the cache_clearing_ports. A long time, when you have about 1 million entries.
The trigger
This is the existing trigger:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | CREATE OR REPLACE FUNCTION ports_clear_cache() RETURNS TRIGGER AS $$ DECLARE l_port text; l_category text; BEGIN IF TG_OP = 'UPDATE' THEN SELECT category, name INTO l_category, l_port FROM ports_all WHERE id = NEW.id; INSERT INTO cache_clearing_ports (port_id, category, port) VALUES (NEW.id, l_category, l_port); NOTIFY port_updated; END IF; -- when a port changes, add an entry to the cache clearing table RETURN NEW; END $$ LANGUAGE 'plpgsql'; DROP TRIGGER IF EXISTS ports_clear_cache ON ports; CREATE TRIGGER ports_clear_cache AFTER UPDATE ON ports FOR EACH ROW EXECUTE PROCEDURE ports_clear_cache(); |
This is how I propose to change the table. First, I’m creating a new identical table to play with:
freshports.dev=# create table cache_clearing_ports_test (like cache_clearing_ports INCLUDING ALL); CREATE TABLE
Just what I need. This creates a new table, including all the constraints etc, so I could play with it without affecting the ongoing database operation in my dev server.
This is the DDL I would run:
ALTER TABLE public.cache_clearing_ports_test ADD CONSTRAINT cache_clearing_ports_test_port_id_idx UNIQUE (port_id); COMMENT ON CONSTRAINT cache_clearing_ports_port_id_idx ON public.cache_clearing_ports_test IS 'Let''s store just one instance per port_id';
The DDL for the real table would be:
ALTER TABLE public.cache_clearing_ports ADD CONSTRAINT cache_clearing_ports_port_id_idx UNIQUE (port_id); COMMENT ON CONSTRAINT cache_clearing_ports_port_id_idx ON public.cache_clearing_ports IS 'Let''s store just one instance per port_id';
This is the amended test table:
freshports.dev=# \d cache_clearing_ports_test Table "public.cache_clearing_ports_test" Column | Type | Collation | Nullable | Default ------------+-----------------------------+-----------+----------+-------------------------------------------------- id | integer | | not null | nextval('cache_clearing_ports_id_seq'::regclass) port_id | integer | | not null | category | text | | not null | port | text | | not null | date_added | timestamp without time zone | | not null | now() Indexes: "cache_clearing_ports_test_pkey" PRIMARY KEY, btree (id) "cache_clearing_ports_port_id_idx" UNIQUE CONSTRAINT, btree (port_id) freshports.dev=#
Next step, fix the trigger, which will produce duplicate errors if left as it is.
The new trigger
The INSERT command has a conflict_target clause, which is very useful for this situation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | CREATE OR REPLACE FUNCTION ports_clear_cache() RETURNS TRIGGER AS $$ DECLARE l_port text; l_category text; BEGIN IF TG_OP = 'UPDATE' THEN SELECT category, name INTO l_category, l_port FROM ports_all WHERE id = NEW.id; INSERT INTO cache_clearing_ports (port_id, category, port) VALUES (NEW.id, l_category, l_port) ON CONFLICT ON CONSTRAINT cache_clearing_ports_port_id_idx DO NOTHING; NOTIFY port_updated; END IF; -- when a port changes, add an entry to the cache clearing table RETURN NEW; END $$ LANGUAGE 'plpgsql'; |
What this says:
- On the INSERT
- IF there is a conflict on the unique port_id index
- THEN do nothing
There are many other things you can do ON CONFLICT, but in my case, doing nothing is what I want.
But wait! There’s more!
I went one step farther. Why NOTIFY if we don’t have to.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | CREATE OR REPLACE FUNCTION ports_clear_cache() RETURNS TRIGGER AS $$ DECLARE l_port text; l_category text; l_row_count bigint; BEGIN IF TG_OP = 'UPDATE' THEN SELECT category, name INTO l_category, l_port FROM ports_all WHERE id = NEW.id; INSERT INTO cache_clearing_ports (port_id, category, port) VALUES (NEW.id, l_category, l_port) ON CONFLICT ON CONSTRAINT cache_clearing_ports_port_id_idx DO NOTHING; GET DIAGNOSTICS l_row_count = ROW_COUNT; IF l_row_count > 0 THEN NOTIFY port_updated; END IF; END IF; -- when a port changes, add an entry to the cache clearing table RETURN NEW; END $$ LANGUAGE 'plpgsql'; |
In this solution, the code checks to see if anything was inserted. If so, it issues the NOTIFY.