Moving from storing the tsvector values to a tsvector index

This post documents how I’m changing the search methods for pkg-message. It uses full-text search, but instead of using a special column, I’m going to use an index. I will elaborate.

When searching pkg-message was added to the website, a different approach was taken than in the past.
In Converting a column-based expression to a row based solution, you can see an index such as “generate_plist_installed_file_gin_idx” gin (to_tsvector(‘english’::regconfig, installed_file)) being used.

However, in When tsvector was working as expected on most hosts, but not one you can see that the tsvector values are being stored in separate columns (pkgmessage_textsearchable and pkgmessage_textsearchable2) using the Generated Columns feature of PostgreSQL. Then an index over that column is created. They look like this:

    "ports_pkgmessage_textsearchable2_idx" gin (pkgmessage_textsearchable2)
    "ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable)

The columns would be used like this in a query:

SELECT DISTINCT id as port_id
  FROM ports
 WHERE pkgmessage_textsearchable  @@ websearch_to_tsquery($1) OR
       pkgmessage_textsearchable2 @@ websearch_to_tsquery($1)

It was suggested that I could do this instead:

  1. drop the two auto-generated columns
  2. drop the two gin indexes
  3. create two new indexes based on the columns

The query could then be on the original columns:

SELECT DISTINCT id as port_id
  FROM ports
 WHERE pkgmessage @@ websearch_to_tsquery('english', $1) OR
       pkgmessage @@ websearch_to_tsquery('simple', $1)

The rest of this post documents the changes.

Database modification

begin;
drop index ports_pkgmessage_textsearchable_idx;
drop index ports_pkgmessage_textsearchable2_idx;
alter table ports drop column pkgmessage_textsearchable;
alter table ports drop column pkgmessage_textsearchable2;
create index ports_pkgmessage_english_idx on ports using gin (to_tsvector('english'::regconfig, pkgmessage));
create index ports_pkgmessage_simple_idx  on ports using gin (to_tsvector('simple'::regconfig, pkgmessage));
commit;

Code changes

[dev-nginx01 dan ~/www] % git diff                                                                                                                        21:13:21
diff --git a/classes/ports_by_pkg_message.php b/classes/ports_by_pkg_message.php
index ebea978d..1681c695 100644
--- a/classes/ports_by_pkg_message.php
+++ b/classes/ports_by_ .php
@@ -27,8 +27,8 @@ class PortsByPkgMessage extends Port {
     FROM
         ports
     WHERE
-        pkgmessage_textsearchable  @@ websearch_to_tsquery($1) OR
-        pkgmessage_textsearchable2 @@ websearch_to_tsquery($1)
+        pkgmessage @@ websearch_to_tsquery('english', $1) OR
+        pkgmessage @@ websearch_to_tsquery('simple',  $1)
 )';
 
        function __construct($dbh) {

That’s it

That was tremendously simple. Let’s try the stage database next and see how that affects table size.

freshports.stagegit=# select pg_indexes_size('ports');
 pg_indexes_size 
-----------------
        36372480
(1 row)

freshports.stagegit=# select pg_table_size('ports');
 pg_table_size 
---------------
     275243008
(1 row)

freshports.stagegit=# vacuum analyse ports;
VACUUM
freshports.stagegit=# select pg_table_size('ports');
 pg_table_size 
---------------
     275243008
(1 row)

freshports.stagegit=# select pg_indexes_size('ports');
 pg_indexes_size 
-----------------
        36397056
(1 row)

Let’s do a vacuum first, because we can.

Now for the changes:

freshports.stagegit=# begin;
BEGIN
freshports.stagegit=# drop index ports_pkgmessage_textsearchable_idx;
drop index ports_pkgmessage_textsearchable2_idx;
alter table ports drop column pkgmessage_textsearchable;
alter table ports drop column pkgmessage_textsearchable2;
create index ports_pkgmessage_english_idx on ports using gin (to_tsvector('english'::regconfig, pkgmessage));
create index ports_pkgmessage_simple_idx  on ports using gin (to_tsvector('simple'::regconfig, pkgmessage));
DROP INDEX
DROP INDEX
ALTER TABLE
ALTER TABLE
CREATE INDEX
CREATE INDEX
freshports.stagegit=# select pg_table_size('ports');
 pg_table_size 
---------------
     275243008
(1 row)

freshports.stagegit=# select pg_indexes_size('ports');
 pg_indexes_size 
-----------------
        33742848
(1 row)
freshports.stagegit=# commit;
COMMIT

Now a vacuum:

freshports.stagegit=# vacuum analyse ports;
VACUUM
freshports.stagegit=# select pg_indexes_size('ports');
 pg_indexes_size 
-----------------
        36397056
(1 row)

freshports.stagegit=# select pg_table_size('ports');
 pg_table_size 
---------------
     275243008
(1 row)

The index size was unchanged. Same with the table size. So. No big gain, no big loss.

Except that now I have two fewer columns.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top