In yesterdays blog post, Replacing a column search with a full text search solution, I wanted to mention how adding a LIMIT 10 to a fast running query made it run slower. I didn’t remember that issue until this morning. Fortunately, the fix was low cost.
The query
This is the query I ran:
EXPLAIN ANALYSE WITH short_list AS ( SELECT port_id FROM generate_plist WHERE textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh') ) select CL.commit_date - SystemTimeAdjust() AS last_commit_date, P.id, E.name as port, C.name as category, ... P.uses FROM short_list, ports P LEFT OUTER JOIN ports_vulnerable PV ON PV.port_id = P.id LEFT OUTER JOIN commit_log CL ON P.last_commit_id = CL.id, element_pathname EP, categories C, element E WHERE P.id = short_list.port_id AND P.element_id = EP.element_id and EP.pathname like '/ports/head/%' AND P.category_id = C.id AND P.element_id = E.id AND E.status = 'A';
The plan (as found at https://explain.depesz.com/s/Bkkg) was fast enough for me.
Adding LIMIT 10
This was used on the FreshPorts search page (not in production, but on the dev website). It needs a limit, so I added LIMIT 10.
The plan changed to a sequential scan of a large table. Boom, 28 seconds. That is unsatisfactory. The plan is at https://explain.depesz.com/s/oJZB.
The Fix
The fix, as pointed out by Thomas Hurst in his reply to my tweet centered around a CTE (Common Table Expressions). That initially confused me, because the query was already using a CTE. That’s the WITH short_list AS part of the query.
Then I read the AS MATERIALIZED part of his reply and started reading up on that. I changed the query to:
EXPLAIN ANALYSE WITH short_list AS MATERIALIZED ( SELECT port_id FROM generate_plist WHERE textsearchable_index_col @@ to_tsquery('share/examples/acme.sh/dnsapi/dns_1984hosting.sh') ) select CL.commit_date - SystemTimeAdjust() AS last_commit_date, P.id, E.name as port, C.name as category, ... P.uses FROM short_list, ports P LEFT OUTER JOIN ports_vulnerable PV ON PV.port_id = P.id LEFT OUTER JOIN commit_log CL ON P.last_commit_id = CL.id, element_pathname EP, categories C, element E WHERE P.id = short_list.port_id AND P.element_id = EP.element_id and EP.pathname like '/ports/head/%' AND P.category_id = C.id AND P.element_id = E.id AND E.status = 'A' LIMIT 10;
You can see the change on line 2: AS MATERIALIZED.
The plan (as found at https://explain.depesz.com/s/UBps) takes 1.4s and is back to using an index scan.
Further reading:
- Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior
- A Stack Exchange discussion