Jun 222012
 

In recent posts, I’ve spoken about a particular task taking about an hour to process. I think I have found one small improvement. A query has gone from 50ms to about 5ms with the addition of an index.

The query is:

   SELECT count(vuxml_id)
          FROM commit_log_ports_vuxml CLPV, commit_log_ports CLP, ports P
         WHERE CLP.commit_log_id = CLPV.commit_log_id
           AND CLPV.port_id      = CLP.port_id
           AND P.id              = CLP.port_id
           AND P.version         = CLP.port_version
           AND P.revision        = CLP.port_revision
           AND CLP.port_epoch    = P.portepoch
           AND P.id              = 16419
         GROUP BY P.id;

The output from explain analyze is:


 GroupAggregate  (cost=0.00..3171.43 rows=1 width=8) (actual time=59.342..59.343 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..3171.41 rows=1 width=8) (actual time=52.756..59.322 rows=1 loops=1)
         Join Filter: (clp.commit_log_id = clpv.commit_log_id)
         ->  Nested Loop  (cost=0.00..76.31 rows=1 width=12) (actual time=0.048..0.425 rows=1 loops=1)
               Join Filter: ((clp.port_version = p.version) AND (clp.port_revision = p.revision) AND (clp.port_epoch = p.portepoch))
               ->  Index Scan using ports_pkey on ports p  (cost=0.00..8.27 rows=1 width=13) (actual time=0.020..0.022 rows=1 loops=1)
                     Index Cond: (id = 16419)
               ->  Index Scan using commit_log_ports_port_id on commit_log_ports clp  (cost=0.00..67.65 rows=22 width=18) (actual time=0.016..0.255 rows=102 loops=1)
                     Index Cond: (clp.port_id = 16419)
         ->  Seq Scan on commit_log_ports_vuxml clpv  (cost=0.00..3074.39 rows=1657 width=12) (actual time=49.140..56.781 rows=1657 loops=1)
               Filter: (clpv.port_id = 16419)
 Total runtime: 59.458 ms

What stands out? That sequential scan.

What does the table look like?


                             Table "public.commit_log_ports_vuxml"
    Column     |  Type   |                              Modifiers
---------------+---------+---------------------------------------------------------------------
 id            | integer | not null default nextval('commit_log_ports_vuxml_id_seq'::regclass)
 commit_log_id | integer | not null
 port_id       | integer | not null
 vuxml_id      | integer | not null
Indexes:
    "commit_log_ports_vuxml_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "$1" FOREIGN KEY (vuxml_id) REFERENCES vuxml(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (port_id) REFERENCES ports(id) ON UPDATE CASCADE ON DELETE CASCADE
    "$3" FOREIGN KEY (commit_log_id) REFERENCES commit_log(id) ON UPDATE CASCADE ON DELETE CASCADE

Ahh, no index on port_id. Let’s fix that.

freshports.org=# create index commit_log_ports_vuxml_port_id on commit_log_ports_vuxml(port_id);
CREATE INDEX

Now what’s the query plan?


freshports.org=# explain analyse
freshports.org-#    SELECT count(vuxml_id)
freshports.org-#           FROM commit_log_ports_vuxml CLPV, commit_log_ports CLP, ports P
freshports.org-#          WHERE CLP.commit_log_id = CLPV.commit_log_id
freshports.org-#            AND CLPV.port_id      = CLP.port_id
freshports.org-#            AND P.id              = CLP.port_id
freshports.org-#            AND P.version         = CLP.port_version
freshports.org-#            AND P.revision        = CLP.port_revision
freshports.org-#            AND CLP.port_epoch    = P.portepoch
freshports.org-#            AND P.id              = 16419
freshports.org-#          GROUP BY P.id;
                                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=29.10..1643.72 rows=1 width=8) (actual time=4.788..4.788 rows=1 loops=1)
   ->  Nested Loop  (cost=29.10..1643.70 rows=1 width=8) (actual time=3.237..4.778 rows=1 loops=1)
         Join Filter: (clp.commit_log_id = clpv.commit_log_id)
         ->  Nested Loop  (cost=0.00..76.31 rows=1 width=12) (actual time=0.049..0.428 rows=1 loops=1)
               Join Filter: ((clp.port_version = p.version) AND (clp.port_revision = p.revision) AND (clp.port_epoch = p.portepoch))
               ->  Index Scan using ports_pkey on ports p  (cost=0.00..8.27 rows=1 width=13) (actual time=0.019..0.021 rows=1 loops=1)
                     Index Cond: (id = 16419)
               ->  Index Scan using commit_log_ports_port_id on commit_log_ports clp  (cost=0.00..67.65 rows=22 width=18) (actual time=0.017..0.241 rows=102 loops
                     Index Cond: (clp.port_id = 16419)
         ->  Bitmap Heap Scan on commit_log_ports_vuxml clpv  (cost=29.10..1546.68 rows=1657 width=12) (actual time=0.313..2.269 rows=1657 loops=1)
               Recheck Cond: (clpv.port_id = 16419)
               ->  Bitmap Index Scan on commit_log_ports_vuxml_port_id  (cost=0.00..28.69 rows=1657 width=0) (actual time=0.298..0.298 rows=1657 loops=1)
                     Index Cond: (clpv.port_id = 16419)
 Total runtime: 4.899 ms

OK. Sure, a savings of 40ms isn’t going to help me much. But I estimate this query is run about 1100 times. That saves me 44 second.

Let’s see if there are some other cost savings around…

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive