
I have a table with 2,395,015 rows, where a TEXT column has one of three values and is never NULL. I have intermittent query performance issues when counting the number of rows where the value matches most (>99%) of the rows. I want to fix this performance problem. These queries must return exact counts, so I can't use approximate counts.

corpus=# \d metadata
                             Table "public.metadata"
    Column     |            Type             | Collation | Nullable |    Default
---------------+-----------------------------+-----------+----------+----------------
 id            | text                        |           | not null |
 priority      | integer                     |           | not null | 10
 media_type    | text                        |           | not null |
 modified      | timestamp without time zone |           | not null | now()
 processed     | timestamp without time zone |           |          |
 status        | text                        |           | not null | 'QUEUED'::text
 note          | text                        |           |          |
 content       | text                        |           |          |
 resolved      | text                        |           |          |
 response_time | integer                     |           |          |
 luid          | integer                     |           | not null |
 jamo_date     | timestamp without time zone |           |          |
 audit_path    | text                        |           |          |
Indexes:
    "metadata_pkey" PRIMARY KEY, btree (id)
    "metadata_id_idx" btree (id)
    "metadata_luid_idx" btree (luid)
    "metadata_modified_idx" btree (modified DESC)
    "metadata_processed_idx" btree (processed DESC)
    "metadata_status_idx" btree (status)
Check constraints:
    "media_type_ck" CHECK (media_type = ANY (ARRAY['text/json'::text, 'text/yaml'::text]))
    "status_ck" CHECK (status = ANY (ARRAY['QUEUED'::text, 'PROCESSED'::text, 'ERROR'::text]))
Foreign-key constraints:
    "metadata_luid_fkey" FOREIGN KEY (luid) REFERENCES concept(luid) ON DELETE CASCADE

corpus=#

I have some simple queries that count the number of rows matching one of the three status codes (QUEUED, PROCESSED, ERROR). There are 0 rows matching QUEUED, 9,794 matching ERROR, and 2,385,221 matching PROCESSED. When I run an identical query against each of these status codes, I usually get a result set promptly:

corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='QUEUED';
                                                                          QUERY PLAN

 Aggregate  (cost=1947.17..1947.18 rows=1 width=8) (actual time=2.935..2.936 rows=1 loops=1)
   Output: count(*)
   ->  Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..1915.97 rows=12480 width=0) (actual time=2.932..2.933 rows=0 loops=1)
         Output: status
         Index Cond: (metadata.status = 'QUEUED'::text)
         Heap Fetches: 0
 Planning Time: 0.734 ms
 Execution Time: 2.988 ms
(8 rows)

corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='ERROR';
                                                                             QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1184.19..1184.20 rows=1 width=8) (actual time=1484.763..1484.764 rows=1 loops=1)
   Output: count(*)
   ->  Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..1165.26 rows=7569 width=0) (actual time=4.235..1484.029 rows=9794 loops=1)
         Output: status
         Index Cond: (metadata.status = 'ERROR'::text)
         Heap Fetches: 9584
 Planning Time: 0.072 ms
 Execution Time: 1484.786 ms
(8 rows)

corpus=#
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
                                                                                          QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=261398.83..261398.84 rows=1 width=8) (actual time=741.319..749.026 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=261398.62..261398.83 rows=2 width=8) (actual time=741.309..749.020 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=260398.62..260398.63 rows=1 width=8) (actual time=735.099..735.100 rows=1 loops=3)
               Output: PARTIAL count(*)
               Worker 0: actual time=730.871..730.872 rows=1 loops=1
               Worker 1: actual time=733.435..733.436 rows=1 loops=1
               ->  Parallel Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..257903.37 rows=998100 width=0) (actual time=0.065..700.529 rows=795074 loops=3)
                     Output: status
                     Index Cond: (metadata.status = 'PROCESSED'::text)
                     Heap Fetches: 747048
                     Worker 0: actual time=0.060..702.980 rows=670975 loops=1
                     Worker 1: actual time=0.076..686.946 rows=1010099 loops=1
 Planning Time: 0.085 ms
 Execution Time: 749.068 ms
(18 rows)

corpus=#

But occasionally, the count of the PROCESSED rows takes an excessive amount of time (sometimes several minutes):

corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
                                                                                           QUERY PLAN

 Finalize Aggregate  (cost=261398.83..261398.84 rows=1 width=8) (actual time=30019.273..30019.336 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=261398.62..261398.83 rows=2 width=8) (actual time=30019.261..30019.326 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=260398.62..260398.63 rows=1 width=8) (actual time=29967.734..29967.735 rows=1 loops=3)
               Output: PARTIAL count(*)
               Worker 0: actual time=29939.915..29939.916 rows=1 loops=1
               Worker 1: actual time=29944.395..29944.395 rows=1 loops=1
               ->  Parallel Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..257903.37 rows=998100 width=0) (actual time=75.385..29931.795 rows=795074 loops=3)
                     Output: status
                     Index Cond: (metadata.status = 'PROCESSED'::text)
                     Heap Fetches: 747151
                     Worker 0: actual time=128.857..29899.156 rows=916461 loops=1
                     Worker 1: actual time=28.609..29905.708 rows=854439 loops=1
 Planning Time: 421.203 ms
 Execution Time: 30019.440 ms
(18 rows)

corpus=#

While the above query is running slowly, I am able to query the same table for either of the other two codes, and those queries return within 1 second. I have looked for table locks (there are none). This happens even when there are no other queries or table inserts running.

  • What are the possible causes for these intermittent slow queries?
  • What additional debugging can I try to get some more information about these slow queries?
  • Are there any relevant server settings?
  • Is there a more efficient way to index / code these columns (e.g., should I use a CHAR(1) or even a SMALLINT)? If so, what index should be used for the column?

If I use a CHAR(1), is there any difference between the following constraints:

  • ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code = ANY (ARRAY['Q'::char(1), 'P'::char(1), 'E'::char(1)]));

  • ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code IN ('Q', 'P', 'E'));

  • Could a partial index be used for this column, even though it is never NULL?

  • Should I split the PROCESSED off into a boolean column, and then use the status column only for the other codes and make it nullable with a partial index?

This is PostgreSQL 11 with default settings, running on Linux.

Other things I have tried:

  • Increased the work_mem to 100MB (via postgresql.conf). No change in performance.
  • I tried creating a partial index on the status column.

Update: I have discovered that this performance issue has nothing to do with the status column, but rather with the size of the table itself, as the following 2-minute query shows:

corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata;
                                                                                            QUERY PLAN

 Finalize Aggregate  (cost=196398.52..196398.53 rows=1 width=8) (actual time=118527.897..118554.762 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=196398.30..196398.51 rows=2 width=8) (actual time=118522.165..118554.756 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=195398.30..195398.31 rows=1 width=8) (actual time=118491.043..118491.044 rows=1 loops=3)
               Output: PARTIAL count(*)
               Worker 0: actual time=118475.143..118475.144 rows=1 loops=1
               Worker 1: actual time=118476.110..118476.111 rows=1 loops=1
               ->  Parallel Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..192876.13 rows=1008870 width=0) (actual time=71.797..118449.265 rows=809820 loops=3)
                     Output: status
                     Heap Fetches: 552630
                     Worker 0: actual time=75.877..118434.476 rows=761049 loops=1
                     Worker 1: actual time=104.872..118436.647 rows=745770 loops=1
 Planning Time: 592.040 ms
 Execution Time: 118554.839 ms
(17 rows)

corpus=#

This seems to be very similar to other questions now, so I am trying mitigation strategies from this answer:

  • VACUUM ANALYZE metadata; First COUNT(*) after this took 5 seconds, subsequent counts took 190ms.

Other thoughts:

  • Would it help if the status column were split off into its own table, with a foreign key into the metadata table?

Note: I am becoming convinced that this question is a duplicate of several other questions here, and one of their answers may hold the best solution for this problem.

As requested, here is a query plan analysis with buffers:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT COUNT(*) FROM metadata;
                                                                                       QUERY PLAN


 Finalize Aggregate  (cost=80771.95..80771.96 rows=1 width=8) (actual time=26711.481..26716.494 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=293915 read=19595 dirtied=282 written=12
   ->  Gather  (cost=80771.73..80771.94 rows=2 width=8) (actual time=26711.203..26716.488 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=293915 read=19595 dirtied=282 written=12
         ->  Partial Aggregate  (cost=79771.73..79771.74 rows=1 width=8) (actual time=26565.622..26565.623 rows=1 loops=3)
               Output: PARTIAL count(*)
               Buffers: shared hit=293915 read=19595 dirtied=282 written=12
               Worker 0: actual time=26530.890..26530.891 rows=1 loops=1
                 Buffers: shared hit=105264 read=6760 dirtied=145 written=5
               Worker 1: actual time=26530.942..26530.942 rows=1 loops=1
                 Buffers: shared hit=84675 read=7529 dirtied=46 written=2
               ->  Parallel Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..77241.05 rows=1012275 width=0) (actual time=42.254..26529.232 rows=809820 loops=3)
                     Output: status
                     Heap Fetches: 17185
                     Buffers: shared hit=293915 read=19595 dirtied=282 written=12
                     Worker 0: actual time=59.291..26494.376 rows=815113 loops=1
                       Buffers: shared hit=105264 read=6760 dirtied=145 written=5
                     Worker 1: actual time=31.165..26484.729 rows=1036972 loops=1
                       Buffers: shared hit=84675 read=7529 dirtied=46 written=2
 Planning Time: 98.400 ms
 Execution Time: 26716.529 ms
(25 rows)

Autumn Skye

4 Answers


My guess is that sometimes a lot of the heap pages are buffered, and the queries run quickly. Other times the buffer is loaded with other pages, and so you end up waiting to read the data from the disk.

There are 2 problems with this kind of querying:

  1. Even though the query is covered by the index, it has to fetch heap tuples because of MVCC: the index alone doesn't know which of its entries are visible to the current transaction. This improves when most pages are marked all-visible in the visibility map (see the sketch after this list for how to check that), but if you constantly update records on old pages, it won't hold for long.
  2. Regardless of the optimizations, this is just an expensive query pattern. The cardinality of your column is very low, and if the query needs to inspect almost all the data, it won't be fast on big tables. Maybe you should keep track of the counts in a separate table and update those counts when rows in the main table are modified.
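
A quick, hedged way to see how much of this table the visibility map currently covers (these are pg_class estimates maintained by VACUUM, not exact figures):

SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'metadata';

If pct_all_visible is low, an index-only scan has to do many heap fetches, which matches the Heap Fetches numbers in the plans above; running VACUUM refreshes the map.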

Postgres keeps per-column statistics, including estimates of how many rows of each value to expect. If approximate counts are enough for you, you can just query pg_stats. Another approach is sampling, which is also an approximation. But if you need exact numbers, then you should consider doing something more sophisticated and keep the counts in a separate table yourself.
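
For example, a hedged sketch of reading the planner's per-value estimates straight out of pg_stats (the numbers are only as fresh as the last ANALYZE, and the anyarray-to-text[] cast is a common trick, not an official API):

SELECT s.val AS status,
       round(s.freq * c.reltuples) AS estimated_rows
FROM pg_class c
JOIN pg_stats st ON st.tablename = c.relname
CROSS JOIN LATERAL unnest(st.most_common_vals::text::text[],
                          st.most_common_freqs) AS s(val, freq)
WHERE c.relname = 'metadata'
  AND st.attname = 'status';

-- Or sample roughly 1% of the pages and extrapolate (also approximate):
-- SELECT count(*) * 100 FROM metadata TABLESAMPLE SYSTEM (1) WHERE status = 'PROCESSED';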

Rework architecture to store pre-calculated stats

There's a limit to how much this approach can scale, and it's not much: every time you run this query, you effectively have to scan almost the whole table (the indexes help very little when nearly every row matches).

If this is a critical query, you can rework your architecture to keep the summary stats in a separate table:

  1. Add a column counted boolean default false.
  2. In a background job, once every N minutes, save the counts to a separate stats table:
with counted as (
  update record_status set counted=true
  where not counted
  returning 1
)
update stats set counts = counts + (select count(*) from counted);

Then, when you need the count, all that's left is to add the saved stats to the count of new rows that the background job hasn't picked up yet:

select (select counts from stats)
       +
       (select count(*) from record_status where not counted)

The counted column needs to be indexed (a partial index would be perfect); a sketch of the supporting objects follows below. This will give results instantly.
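
For completeness, a minimal sketch of the supporting objects this approach assumes (the counted column, the stats table, and the partial index; exact names and defaults are illustrative, not prescribed):

ALTER TABLE record_status ADD COLUMN counted boolean NOT NULL DEFAULT false;

CREATE TABLE stats (counts bigint NOT NULL DEFAULT 0);
INSERT INTO stats (counts) VALUES (0);

-- Partial index so "where not counted" only ever touches the small,
-- not-yet-counted tail of the table.
CREATE INDEX record_status_not_counted_idx ON record_status (counted) WHERE NOT counted;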

If you need to support deletion too, then instead of a two-valued counted flag you can use three states: JUST_INSERTED, JUST_DELETED, COUNTED. Then, instead of deleting a record right away, mark it as JUST_DELETED, and update the stats in the same background job, but this time subtracting the counts.
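
A hedged sketch of that variant, assuming the boolean counted flag is replaced by a three-state column and deletes are deferred to the same background job (names are illustrative):

-- Replace the boolean flag with a three-state marker.
ALTER TABLE record_status DROP COLUMN counted;
ALTER TABLE record_status
  ADD COLUMN count_state text NOT NULL DEFAULT 'JUST_INSERTED'
  CHECK (count_state IN ('JUST_INSERTED', 'JUST_DELETED', 'COUNTED'));

-- Background job: fold new rows in, fold deleted rows out, then remove them.
WITH ins AS (
  UPDATE record_status SET count_state = 'COUNTED'
  WHERE count_state = 'JUST_INSERTED'
  RETURNING 1
), del AS (
  DELETE FROM record_status
  WHERE count_state = 'JUST_DELETED'
  RETURNING 1
)
UPDATE stats
   SET counts = counts + (SELECT count(*) FROM ins) - (SELECT count(*) FROM del);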


Based on my testing described in my revised question and advice from answers to similar questions (see here and here), I have implemented the following changes:

  • Created a partial index on the status column. CREATE INDEX status_not_processed_idx ON metadata (status) WHERE status<>'PROCESSED';
  • Vacuumed table. VACUUM ANALYZE metadata;
  • Adjusted auto-vacuum settings on table. ALTER TABLE metadata SET (autovacuum_vacuum_scale_factor = 0, autovacuum_analyze_scale_factor = 0, autovacuum_vacuum_threshold = 10000, autovacuum_analyze_threshold = 10000);
  • Created a fast_count_rows(table_name) function based on the method described here:
CREATE OR REPLACE FUNCTION fast_count_rows(table_name text)
RETURNS bigint
AS $$
SELECT
(
  CASE WHEN c.reltuples < 0 THEN NULL
       WHEN c.relpages = 0 THEN float8 '0'
       ELSE c.reltuples / c.relpages END * (pg_catalog.pg_relation_size(c.oid) / pg_catalog.current_setting('block_size')::int)
)::bigint
  FROM pg_catalog.pg_class c
  WHERE c.oid = CONCAT('public.',table_name)::regclass
$$ LANGUAGE SQL STABLE;
corpus=> SELECT fast_count_rows('metadata');
 fast_count_rows
-----------------
         2192335
  • Avoided using COUNT(*) directly on the PROCESSED value, deriving that count from fast_count_rows() instead:
CREATE OR REPLACE VIEW metadata_queue_statistics AS
SELECT
  COUNT(*) FILTER (WHERE status = 'QUEUED'::text) AS queued,
  fast_count_rows('metadata') - COUNT(*) FILTER (WHERE status = ANY (ARRAY ['QUEUED'::text, 'ERROR'::text])) AS completed,
  COUNT(*) FILTER (WHERE status = 'ERROR'::text) AS failed,
  fast_count_rows('metadata') AS total
FROM metadata
;
corpus=> SELECT * FROM metadata_queue_statistics;
queued | failed | completed |  total
--------+--------+-----------+---------
  19258 |  11372 |   2398829 | 2429459
(1 row)

The initial performance test looked promising (422ms) but subsequent testing of this approach yielded the same performance issue:

corpus=> EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM metadata_queue_statistics;
                    QUERY PLAN


 Aggregate  (cost=127854.78..127855.29 rows=1 width=32) (actual time=76853.872..76853.872 rows=1 loops=1)
   Output: count(*) FILTER (WHERE (metadata.status = 'QUEUED'::text)), (fast_count_rows('metadata'::text) - count(*) FILTER (WHERE (metadata.status = ANY ('{QUEUED,ERROR}'::text[])))), count(*) FILTER (WHERE (metadata.status = 'ERROR'::text)), fast_count_rows('metadata'::text)
   Buffers: shared hit=295240 read=26638 dirtied=383 written=18
   ->  Index Only Scan using metadata_status_idx on public.metadata  (cost=0.43..91412.89 rows=2429459 width=9) (actual time=0.071..76432.532 rows=2429459 loops=1)
         Output: metadata.status
         Heap Fetches: 28878
         Buffers: shared hit=295234 read=26638 dirtied=383 written=18
 Planning Time: 124.091 ms
 Execution Time: 76853.911 ms
(9 rows)

I will not mark this as the answer to my question because the performance problem persisted. I'm leaving this here as an example of a failed solution.

Also attempted:

  • Increased shared_buffers in postgresql.conf from default 128MB to 2GB. (no significant change in query performance)
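
One hedged way to check whether the larger shared_buffers is actually keeping this table's pages cached is to look at the cumulative block-hit counters (totals since the last statistics reset, not per-query numbers); comparing the hit/read ratio before and after a slow run gives a rough idea of how much time went to disk reads:

SELECT heap_blks_hit, heap_blks_read,
       idx_blks_hit,  idx_blks_read
FROM pg_statio_user_tables
WHERE relname = 'metadata';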
Autumn Skye

This problem finally got to the point where I couldn't ignore it any longer.

I finally solved this performance problem by moving the status code into its own table, using the integer luid (an auto-incrementing, locally unique identifier) from the main table as the key. The status code itself is now only a single character. The database is up to 4,736,786 records, and the view (also modified, pasted below) completes the query using the new table in less than one second.

ALTER TABLE jgi_metadata ADD CONSTRAINT unique_luid UNIQUE (luid);

DROP TABLE record_status; -- earlier attempt

CREATE TABLE record_status (
    luid        integer NOT NULL REFERENCES jgi_metadata(luid),
    status_code char(1) NOT NULL DEFAULT 'Q',
    status      text,
    PRIMARY KEY (luid)
);

ALTER TABLE record_status ADD CONSTRAINT status_ck CHECK (status_code = ANY (ARRAY['Q'::char(1), 'P'::char(1), 'E'::char(1)]));

INSERT INTO record_status (luid,status) SELECT luid,status FROM jgi_metadata;

UPDATE record_status SET status_code='Q' WHERE status='QUEUED';
UPDATE record_status SET status_code='P' WHERE status='PROCESSED';
UPDATE record_status SET status_code='E' WHERE status='ERROR';

ALTER TABLE record_status DROP COLUMN status;

CREATE INDEX ON record_status(luid);
CREATE INDEX ON record_status(status_code);

ALTER TABLE jgi_metadata DROP COLUMN status;

Table "public.record_status"

   Column    |     Type     | Collation | Nullable |   Default   | Storage  | Stats target | Description
-------------+--------------+-----------+----------+-------------+----------+--------------+-------------
 luid        | integer      |           | not null |             | plain    |              |
 status_code | character(1) |           | not null | 'Q'::bpchar | extended |              |
Indexes:
    "record_status_pkey" PRIMARY KEY, btree (luid)
    "record_status_luid_idx" btree (luid)
    "record_status_status_code_idx" btree (status_code)
Check constraints:
    "status_ck" CHECK (status_code = ANY (ARRAY['Q'::character(1), 'P'::character(1), 'E'::character(1)]))
Foreign-key constraints:
    "record_status_luid_fkey" FOREIGN KEY (luid) REFERENCES metadata(luid)

I also simplified the statistics view by removing the fast_count_rows() function (which used bigint), which cut another 500ms from the query:

View "public.metadata_queue_statistics"

CREATE OR REPLACE VIEW metadata_queue_statistics AS
 SELECT count(*) FILTER (WHERE record_status.status_code = 'Q'::char(1)) AS queued,
        count(*) FILTER (WHERE record_status.status_code = 'P'::char(1)) AS completed,
        count(*) FILTER (WHERE record_status.status_code = 'E'::char(1)) AS failed,
        count(*) AS total
  FROM record_status
;

Performance before removing fast_count_rows function from view:

corpus=# EXPLAIN ANALYZE SELECT * FROM metadata_queue_statistics;
                                                          QUERY PLAN

 Aggregate  (cost=170413.05..170413.56 rows=1 width=32) (actual time=828.100..828.101 rows=1 loops=1)
   ->  Seq Scan on record_status  (cost=0.00..98880.42 rows=4768842 width=2) (actual time=0.009..337.786 rows=4736786 loops=1)
 Planning Time: 0.227 ms
 Execution Time: 828.123 ms
(4 rows)

Performance after removing fast_count_rows function from view:

corpus=# EXPLAIN ANALYZE SELECT * FROM metadata_queue_statistics;
                                                          QUERY PLAN

 Finalize Aggregate  (cost=106835.22..106835.23 rows=1 width=32) (actual time=360.787..379.916 rows=1 loops=1)
   ->  Gather  (cost=106834.99..106835.20 rows=2 width=32) (actual time=360.502..379.910 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=105834.99..105835.00 rows=1 width=32) (actual time=349.462..349.462 rows=1 loops=3)
               ->  Parallel Seq Scan on record_status  (cost=0.00..71062.18 rows=1987018 width=2) (actual time=0.181..164.633 rows=1578929 loops=3)
 Planning Time: 0.083 ms
 Execution Time: 379.953 ms
(8 rows)

So, now I have this view completing in less than half a second.

For comparison with the buffer sizes from my earlier attempts at solving this problem, here is the more detailed execution plan:

corpus=# EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM metadata_queue_statistics;
             QUERY PLAN


 Finalize Aggregate  (cost=106835.22..106835.23 rows=1 width=32) (actual time=402.090..406.696 rows=1 loops=1)
   Output: count(*) FILTER (WHERE (record_status.status_code = 'Q'::character(1))), count(*) FILTER (WHERE (record_status.status_code = 'P'::character(1))), count(*) FILTER (WHERE (record_status.status_code = 'E'::character(1))), count(*)
   Buffers: shared hit=51192
   ->  Gather  (cost=106834.99..106835.20 rows=2 width=32) (actual time=382.500..406.686 rows=3 loops=1)
         Output: (PARTIAL count(*) FILTER (WHERE (record_status.status_code = 'Q'::character(1)))), (PARTIAL count(*) FILTER (WHERE (record_status.status_code = 'P'::character(1)))), (PARTIAL count(*) FILTER (WHERE (record_status.status_code = 'E'::character(1)))), (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=51192
         ->  Partial Aggregate  (cost=105834.99..105835.00 rows=1 width=32) (actual time=378.003..378.003 rows=1 loops=3)
               Output: PARTIAL count(*) FILTER (WHERE (record_status.status_code = 'Q'::character(1))), PARTIAL count(*) FILTER (WHERE (record_status.status_code = 'P'::character(1))), PARTIAL count(*) FILTER (WHERE (record_status.status_code = 'E'::character(1))), PARTIAL count(*)
               Buffers: shared hit=51192
               Worker 0: actual time=366.127..366.128 rows=1 loops=1
                 Buffers: shared hit=6728
               Worker 1: actual time=385.566..385.566 rows=1 loops=1
                 Buffers: shared hit=7750
               ->  Parallel Seq Scan on public.record_status  (cost=0.00..71062.18 rows=1987018 width=2) (actual time=1.255..188.238 rows=1578929 loops=3)
                     Output: record_status.status_code
                     Buffers: shared hit=51192
                     Worker 0: actual time=3.727..181.228 rows=1111377 loops=1
                       Buffers: shared hit=6728
                     Worker 1: actual time=0.030..211.375 rows=1073500 loops=1
                       Buffers: shared hit=7750
 Planning Time: 0.079 ms
 Execution Time: 406.726 ms
(24 rows)

Autumn Skye

I don't know how this went, but here is what I would need to know:

  • What is your PostgreSQL version and configuration? What is your hardware, and what type of disk holds your data/index volume or partition?

There are many variables, so I would sum this up as two issues:

(1) As @Stanislav said above, the cardinality of the status column is too low for it to benefit from an index.

(2) The table has too many TEXT fields, which limits how many rows fit on each page, so producing an exact row count forces a scan of many pages; and because a TEXT column sits in the middle of the full row, performance suffers further. That explains why your VACUUM ANALYZE made it fast at first, and then the pages were pushed out of shared_buffers again.
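
To put a rough number on the rows-per-page point above, a quick sketch against pg_class (estimates only, maintained by VACUUM/ANALYZE):

SELECT relpages,
       reltuples::bigint AS approx_rows,
       round((reltuples / greatest(relpages, 1))::numeric, 1) AS rows_per_page
FROM pg_class
WHERE relname = 'metadata';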

--> For the second issue, here is what I did: create two tables, one storing the small critical columns such as media_type and status (with no TEXT fields in it), and the other holding the remaining columns, linked by a UNIQUE FOREIGN KEY to the PRIMARY KEY of the first table.

--> For INSERT, UPDATE, and DELETE, have your application use an explicit transaction containing the two INSERTs (or UPDATEs/DELETEs), then COMMIT if both succeed or ROLLBACK if not. When you want to read the content back, use a LEFT JOIN (not an INNER JOIN). The main (first) table is much smaller, so you can load it and get the COUNT over the same number of rows while scanning far fewer pages; a sketch follows.
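
A minimal sketch of that layout, using hypothetical table and column names (metadata_core holds the narrow columns you count on; metadata_content holds the wide TEXT columns):

CREATE TABLE metadata_core (
    luid       integer PRIMARY KEY,
    media_type text    NOT NULL,
    status     text    NOT NULL
);

CREATE TABLE metadata_content (
    luid       integer NOT NULL UNIQUE REFERENCES metadata_core(luid) ON DELETE CASCADE,
    content    text,
    note       text,
    audit_path text
);

-- Writes touch both tables inside one explicit transaction.
BEGIN;
INSERT INTO metadata_core    (luid, media_type, status) VALUES (42, 'text/json', 'QUEUED');
INSERT INTO metadata_content (luid, content)            VALUES (42, '{"example": true}');
COMMIT;  -- or ROLLBACK if either statement fails

-- Counts only touch the narrow table; full rows come back via LEFT JOIN.
SELECT count(*) FROM metadata_core WHERE status = 'PROCESSED';

SELECT c.*, x.content
FROM metadata_core c
LEFT JOIN metadata_content x USING (luid);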