Optimal way to get a total count of rows in a paged query in Postgres?

Question

I need to improve the performance of a paged query for customer orders in a Type2 Postgres db (always insert a new record with a new ID, the newest ID is the current version of the record). Changing away from Type2 is not an option at this time. The query I have is two queries with the same CTE in both:

WITH customer_orders AS (
    select id, order_id, customer_id,
    "name", country, state, county, source_system, 
    is_deleted, created_at, updated_at, deleted_at,
    created_by, updated_by, deleted_by, 
    rank() over (partition by order_id order by id desc) as entity_rank 
    from orders WHERE customer_id = $1 and is_deleted= $2
  )
SELECT * FROM customer_orders where entity_rank = 1 ORDER BY id DESC LIMIT $3 OFFSET $4;
WITH customer_orders AS (
    select id, order_id, customer_id,
    "name", country, state, county, source_system, 
    is_deleted, created_at, updated_at, deleted_at,
    created_by, updated_by, deleted_by, 
    rank() over (partition by order_id order by id desc) as entity_rank 
    from orders WHERE customer_id = $1 and is_deleted= $2
    )
SELECT count(id) FROM customer_orders where entity_rank = 1;

But I wonder if there's a better way to do this, can I select from the CTE twice, once for the paging (limit + offset) and once for the total number of records? I'll be running this as two separate queries from a Node process. It seems like it should be doable in one query but I can't get it.

Indexes: id (PK), order_id, customer_id, is_deleted (1 on each of those)

Erwin Brandstetter · Accepted Answer · 2023-08-29T20:07:03.397

Yes, you can reuse the same CTE multiple times, and that's typically faster than running multiple queries. (But you can only return from a query once.)

But you don't even have to for the use case. Just add a window function in the outer SELECT:

WITH customer_orders AS (
   SELECT id, order_id, customer_id
        , "name", country, state, county, source_system
        , is_deleted, created_at, updated_at, deleted_at
        , created_by, updated_by, deleted_by
        , row_number() OVER (PARTITION BY order_id ORDER BY id DESC ROWS UNBOUNDED PRECEDING) AS entity_rank
   FROM   orders
   WHERE  customer_id = $1
   AND    is_deleted = $2
   )
SELECT *, count(*) OVER () AS total_count
FROM   customer_orders
WHERE  entity_rank = 1
ORDER  BY id DESC
LIMIT  $3
OFFSET $4;

See:

Run a query with a LIMIT/OFFSET and also get the total number of rows

You don't even need a CTE at all now. A plain subquery does the job. (But doesn't make much of a difference any more since Postgres 12.)

SELECT *, count(*) OVER () AS total_count
FROM  (
   SELECT id, order_id, customer_id
        , "name", country, state, county, source_system
        , is_deleted, created_at, updated_at, deleted_at
        , created_by, updated_by, deleted_by
        , row_number() OVER (PARTITION BY order_id ORDER BY id DESC ROWS UNBOUNDED PRECEDING) AS entity_rank
   FROM   orders
   WHERE  customer_id = $1
   AND    is_deleted = $2
   ) customer_orders
WHERE  entity_rank = 1
ORDER  BY id DESC
LIMIT  $3
OFFSET $4;

While being at it, I replaced rank() with the more appropriate (and faster) row_number(). I also added the custom frame definition ROWS UNBOUNDED PRECEDING to help performance. (Obsolete, but harmless, in Postgres 16 or later.) See:

Efficient downsampling of a selected timeseries to equidistant samples

This multicolumn index can probably help performance (a lot):

CREATE INDEX foo_idx ON orders (customer_id, is_deleted, order_id, id DESC);

More performance optimization may be possible, highly likely even. But that depends on undisclosed details of your setup. You'd have to provide information for performance questions as requested in the comments.
Basics:

Optimal way to get a total count of rows in a paged query in Postgres?

1 Answers1