This post documents how I’m changing the search methods for pkg-message. It uses full-text search, but instead of using a special column, I’m going to use an index. I will elaborate.
When searching pkg-message was added to the website, a different approach was taken than in the past.
In Converting a column-based expression to a row based solution, you can see an index such as “generate_plist_installed_file_gin_idx” gin (to_tsvector(‘english’::regconfig, installed_file)) being used.
However, in When tsvector was working as expected on most hosts, but not one you can see that the tsvector values are being stored in separate columns (pkgmessage_textsearchable and pkgmessage_textsearchable2) using the Generated Columns feature of PostgreSQL. Then an index over that column is created. They look like this:
"ports_pkgmessage_textsearchable2_idx" gin (pkgmessage_textsearchable2)
"ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable)
The columns would be used like this in a query:
SELECT DISTINCT id as port_id
FROM ports
WHERE pkgmessage_textsearchable @@ websearch_to_tsquery($1) OR
pkgmessage_textsearchable2 @@ websearch_to_tsquery($1)
It was suggested that I could do this instead:
- drop the two auto-generated columns
- drop the two gin indexes
- create two new indexes based on the columns
The query could then be on the original columns:
SELECT DISTINCT id as port_id
FROM ports
WHERE pkgmessage @@ websearch_to_tsquery('english', $1) OR
pkgmessage @@ websearch_to_tsquery('simple', $1)
The rest of this post documents the changes.
Database modification
begin;
drop index ports_pkgmessage_textsearchable_idx;
drop index ports_pkgmessage_textsearchable2_idx;
alter table ports drop column pkgmessage_textsearchable;
alter table ports drop column pkgmessage_textsearchable2;
create index ports_pkgmessage_english_idx on ports using gin (to_tsvector('english'::regconfig, pkgmessage));
create index ports_pkgmessage_simple_idx on ports using gin (to_tsvector('simple'::regconfig, pkgmessage));
commit;
Code changes
[dev-nginx01 dan ~/www] % git diff 21:13:21
diff --git a/classes/ports_by_pkg_message.php b/classes/ports_by_pkg_message.php
index ebea978d..1681c695 100644
--- a/classes/ports_by_pkg_message.php
+++ b/classes/ports_by_ .php
@@ -27,8 +27,8 @@ class PortsByPkgMessage extends Port {
FROM
ports
WHERE
- pkgmessage_textsearchable @@ websearch_to_tsquery($1) OR
- pkgmessage_textsearchable2 @@ websearch_to_tsquery($1)
+ pkgmessage @@ websearch_to_tsquery('english', $1) OR
+ pkgmessage @@ websearch_to_tsquery('simple', $1)
)';
function __construct($dbh) {
That’s it
That was tremendously simple. Let’s try the stage database next and see how that affects table size.
freshports.stagegit=# select pg_indexes_size('ports');
pg_indexes_size
-----------------
36372480
(1 row)
freshports.stagegit=# select pg_table_size('ports');
pg_table_size
---------------
275243008
(1 row)
freshports.stagegit=# vacuum analyse ports;
VACUUM
freshports.stagegit=# select pg_table_size('ports');
pg_table_size
---------------
275243008
(1 row)
freshports.stagegit=# select pg_indexes_size('ports');
pg_indexes_size
-----------------
36397056
(1 row)
Let’s do a vacuum first, because we can.
Now for the changes:
freshports.stagegit=# begin;
BEGIN
freshports.stagegit=# drop index ports_pkgmessage_textsearchable_idx;
drop index ports_pkgmessage_textsearchable2_idx;
alter table ports drop column pkgmessage_textsearchable;
alter table ports drop column pkgmessage_textsearchable2;
create index ports_pkgmessage_english_idx on ports using gin (to_tsvector('english'::regconfig, pkgmessage));
create index ports_pkgmessage_simple_idx on ports using gin (to_tsvector('simple'::regconfig, pkgmessage));
DROP INDEX
DROP INDEX
ALTER TABLE
ALTER TABLE
CREATE INDEX
CREATE INDEX
freshports.stagegit=# select pg_table_size('ports');
pg_table_size
---------------
275243008
(1 row)
freshports.stagegit=# select pg_indexes_size('ports');
pg_indexes_size
-----------------
33742848
(1 row)
freshports.stagegit=# commit;
COMMIT
Now a vacuum:
freshports.stagegit=# vacuum analyse ports;
VACUUM
freshports.stagegit=# select pg_indexes_size('ports');
pg_indexes_size
-----------------
36397056
(1 row)
freshports.stagegit=# select pg_table_size('ports');
pg_table_size
---------------
275243008
(1 row)
The index size was unchanged. Same with the table size. So. No big gain, no big loss.
Except that now I have two fewer columns.











