This post documents how I’m changing the search methods for pkg-message. It uses full-text search, but instead of using a special column, I’m going to use an index. I will elaborate.
When searching pkg-message was added to the website, a different approach was taken than in the past.
In Converting a column-based expression to a row based solution, you can see an index such as “generate_plist_installed_file_gin_idx” gin (to_tsvector(‘english’::regconfig, installed_file)) being used.
However, in When tsvector was working as expected on most hosts, but not one you can see that the tsvector values are being stored in separate columns (pkgmessage_textsearchable and pkgmessage_textsearchable2) using the Generated Columns feature of PostgreSQL. Then an index over that column is created. They look like this:
"ports_pkgmessage_textsearchable2_idx" gin (pkgmessage_textsearchable2) "ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable)
The columns would be used like this in a query:
SELECT DISTINCT id as port_id FROM ports WHERE pkgmessage_textsearchable @@ websearch_to_tsquery($1) OR pkgmessage_textsearchable2 @@ websearch_to_tsquery($1)
It was suggested that I could do this instead:
- drop the two auto-generated columns
- drop the two gin indexes
- create two new indexes based on the columns
The query could then be on the original columns:
SELECT DISTINCT id as port_id FROM ports WHERE pkgmessage @@ websearch_to_tsquery('english', $1) OR pkgmessage @@ websearch_to_tsquery('simple', $1)
The rest of this post documents the changes.
Database modification
begin; drop index ports_pkgmessage_textsearchable_idx; drop index ports_pkgmessage_textsearchable2_idx; alter table ports drop column pkgmessage_textsearchable; alter table ports drop column pkgmessage_textsearchable2; create index ports_pkgmessage_english_idx on ports using gin (to_tsvector('english'::regconfig, pkgmessage)); create index ports_pkgmessage_simple_idx on ports using gin (to_tsvector('simple'::regconfig, pkgmessage)); commit;
Code changes
[dev-nginx01 dan ~/www] % git diff 21:13:21 diff --git a/classes/ports_by_pkg_message.php b/classes/ports_by_pkg_message.php index ebea978d..1681c695 100644 --- a/classes/ports_by_pkg_message.php +++ b/classes/ports_by_ .php @@ -27,8 +27,8 @@ class PortsByPkgMessage extends Port { FROM ports WHERE - pkgmessage_textsearchable @@ websearch_to_tsquery($1) OR - pkgmessage_textsearchable2 @@ websearch_to_tsquery($1) + pkgmessage @@ websearch_to_tsquery('english', $1) OR + pkgmessage @@ websearch_to_tsquery('simple', $1) )'; function __construct($dbh) {
That’s it
That was tremendously simple. Let’s try the stage database next and see how that affects table size.
freshports.stagegit=# select pg_indexes_size('ports'); pg_indexes_size ----------------- 36372480 (1 row) freshports.stagegit=# select pg_table_size('ports'); pg_table_size --------------- 275243008 (1 row) freshports.stagegit=# vacuum analyse ports; VACUUM freshports.stagegit=# select pg_table_size('ports'); pg_table_size --------------- 275243008 (1 row) freshports.stagegit=# select pg_indexes_size('ports'); pg_indexes_size ----------------- 36397056 (1 row)
Let’s do a vacuum first, because we can.
Now for the changes:
freshports.stagegit=# begin; BEGIN freshports.stagegit=# drop index ports_pkgmessage_textsearchable_idx; drop index ports_pkgmessage_textsearchable2_idx; alter table ports drop column pkgmessage_textsearchable; alter table ports drop column pkgmessage_textsearchable2; create index ports_pkgmessage_english_idx on ports using gin (to_tsvector('english'::regconfig, pkgmessage)); create index ports_pkgmessage_simple_idx on ports using gin (to_tsvector('simple'::regconfig, pkgmessage)); DROP INDEX DROP INDEX ALTER TABLE ALTER TABLE CREATE INDEX CREATE INDEX freshports.stagegit=# select pg_table_size('ports'); pg_table_size --------------- 275243008 (1 row) freshports.stagegit=# select pg_indexes_size('ports'); pg_indexes_size ----------------- 33742848 (1 row) freshports.stagegit=# commit; COMMIT
Now a vacuum:
freshports.stagegit=# vacuum analyse ports; VACUUM freshports.stagegit=# select pg_indexes_size('ports'); pg_indexes_size ----------------- 36397056 (1 row) freshports.stagegit=# select pg_table_size('ports'); pg_table_size --------------- 275243008 (1 row)
The index size was unchanged. Same with the table size. So. No big gain, no big loss.
Except that now I have two fewer columns.