In recent posts, I’ve spoken about a particular task taking about an hour to process. I think I have found one small improvement. A query has gone from 50ms to about 5ms with the addition of an index.
The query is:
   SELECT count(vuxml_id)
          FROM commit_log_ports_vuxml CLPV, commit_log_ports CLP, ports P
         WHERE CLP.commit_log_id = CLPV.commit_log_id
           AND CLPV.port_id      = CLP.port_id
           AND P.id              = CLP.port_id
           AND P.version         = CLP.port_version
           AND P.revision        = CLP.port_revision
           AND CLP.port_epoch    = P.portepoch
           AND P.id              = 16419
         GROUP BY P.id;
The output from explain analyze is:
 GroupAggregate  (cost=0.00..3171.43 rows=1 width=8) (actual time=59.342..59.343 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..3171.41 rows=1 width=8) (actual time=52.756..59.322 rows=1 loops=1)
         Join Filter: (clp.commit_log_id = clpv.commit_log_id)
         ->  Nested Loop  (cost=0.00..76.31 rows=1 width=12) (actual time=0.048..0.425 rows=1 loops=1)
               Join Filter: ((clp.port_version = p.version) AND (clp.port_revision = p.revision) AND (clp.port_epoch = p.portepoch))
               ->  Index Scan using ports_pkey on ports p  (cost=0.00..8.27 rows=1 width=13) (actual time=0.020..0.022 rows=1 loops=1)
                     Index Cond: (id = 16419)
               ->  Index Scan using commit_log_ports_port_id on commit_log_ports clp  (cost=0.00..67.65 rows=22 width=18) (actual time=0.016..0.255 rows=102 loops=1)
                     Index Cond: (clp.port_id = 16419)
         ->  Seq Scan on commit_log_ports_vuxml clpv  (cost=0.00..3074.39 rows=1657 width=12) (actual time=49.140..56.781 rows=1657 loops=1)
               Filter: (clpv.port_id = 16419)
 Total runtime: 59.458 ms
What stands out? That sequential scan.
What does the table look like?
                             Table "public.commit_log_ports_vuxml"
    Column     |  Type   |                              Modifiers
---------------+---------+---------------------------------------------------------------------
 id            | integer | not null default nextval('commit_log_ports_vuxml_id_seq'::regclass)
 commit_log_id | integer | not null
 port_id       | integer | not null
 vuxml_id      | integer | not null
Indexes:
    "commit_log_ports_vuxml_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "$1" FOREIGN KEY (vuxml_id) REFERENCES vuxml(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$3" FOREIGN KEY (commit_log_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE CASCADE
Ahh, no index on port_id. Let’s fix that.
freshports.org=# create index commit_log_ports_vuxml_port_id on commit_log_ports_vuxml(port_id); CREATE INDEX
Now what’s the query plan?
freshports.org=# explain analyse
freshports.org-#    SELECT count(vuxml_id)
freshports.org-#           FROM commit_log_ports_vuxml CLPV, commit_log_ports CLP, ports P
freshports.org-#          WHERE CLP.commit_log_id = CLPV.commit_log_id
freshports.org-#            AND CLPV.port_id      = CLP.port_id
freshports.org-#            AND P.id              = CLP.port_id
freshports.org-#            AND P.version         = CLP.port_version
freshports.org-#            AND P.revision        = CLP.port_revision
freshports.org-#            AND CLP.port_epoch    = P.portepoch
freshports.org-#            AND P.id              = 16419
freshports.org-#          GROUP BY P.id;
                                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=29.10..1643.72 rows=1 width=8) (actual time=4.788..4.788 rows=1 loops=1)
   ->  Nested Loop  (cost=29.10..1643.70 rows=1 width=8) (actual time=3.237..4.778 rows=1 loops=1)
         Join Filter: (clp.commit_log_id = clpv.commit_log_id)
         ->  Nested Loop  (cost=0.00..76.31 rows=1 width=12) (actual time=0.049..0.428 rows=1 loops=1)
               Join Filter: ((clp.port_version = p.version) AND (clp.port_revision = p.revision) AND (clp.port_epoch = p.portepoch))
               ->  Index Scan using ports_pkey on ports p  (cost=0.00..8.27 rows=1 width=13) (actual time=0.019..0.021 rows=1 loops=1)
                     Index Cond: (id = 16419)
               ->  Index Scan using commit_log_ports_port_id on commit_log_ports clp  (cost=0.00..67.65 rows=22 width=18) (actual time=0.017..0.241 rows=102 loops
                     Index Cond: (clp.port_id = 16419)
         ->  Bitmap Heap Scan on commit_log_ports_vuxml clpv  (cost=29.10..1546.68 rows=1657 width=12) (actual time=0.313..2.269 rows=1657 loops=1)
               Recheck Cond: (clpv.port_id = 16419)
               ->  Bitmap Index Scan on commit_log_ports_vuxml_port_id  (cost=0.00..28.69 rows=1657 width=0) (actual time=0.298..0.298 rows=1657 loops=1)
                     Index Cond: (clpv.port_id = 16419)
 Total runtime: 4.899 ms
OK. Sure, a savings of 40ms isn’t going to help me much. But I estimate this query is run about 1100 times. That saves me 44 second.
Let’s see if there are some other cost savings around…











