select
user_id,
count(id) as unread_count
from
notification
where
is_read = false
and user_id in (select unnest('{200 user IDs}'::bigint[]))
group by
user_id;
The problem is that this query runs for about a minute, sometimes slightly longer. The table is 32 GB, and there is already an index on the user_id column.
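For reference, the existing index is (as far as I can tell) just a plain b-tree on user_id, i.e. something along these lines (I don't have the exact DDL at hand, so treat this as an assumption):

create index ix_notification_user_id on notification (user_id);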
Here is the execution plan:
HashAggregate (cost=123354.81..123629.64 rows=27483 width=16) (actual time=90823.880..90823.972 rows=188 loops=1)
  Group Key: user_id
  -> Nested Loop (cost=2.32..123217.40 rows=27483 width=16) (actual time=0.184..90752.136 rows=48571 loops=1)
        -> HashAggregate (cost=1.76..2.76 rows=100 width=8) (actual time=0.146..0.577 rows=200 loops=1)
              Group Key: unnest(200 user IDs)
              -> Result (cost=0.00..0.51 rows=100 width=8) (actual time=0.021..0.073 rows=200 loops=1)
        -> Index Scan using ix_notification_user_id on notification (cost=0.56..1229.40 rows=275 width=16) (actual time=119.659..453.533 rows=243 loops=200)
              Index Cond: (200 user IDs)
              Filter: (NOT is_read)
              Rows Removed by Filter: 368
Planning time: 0.189 ms
Execution time: 90824.196 ms
I have also tried a temp-table approach: inserting the unnested IDs into a temp table and joining against it, as sketched below, but performance did not improve at all.
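Roughly what that attempt looked like (a sketch from memory; the temp table name and exact shape are illustrative, not the literal code I ran):

create temp table tmp_user_ids (user_id bigint primary key);

insert into tmp_user_ids (user_id)
select unnest('{200 user IDs}'::bigint[]);

analyze tmp_user_ids;

select n.user_id, count(n.id) as unread_count
from notification n
join tmp_user_ids t on t.user_id = n.user_id
where n.is_read = false
group by n.user_id;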
I have run this query (a well-known bloat-estimation query) to check table and index bloat:
select
  schemaname,
tablename,
reltuples::bigint,
relpages::bigint,
otta,
round(case when otta = 0 then 0.0 else sml.relpages / otta::numeric end, 1) as tbloat,
relpages::bigint - otta as wastedpages,
bs*(sml.relpages-otta)::bigint as wastedbytes,
pg_size_pretty((bs*(relpages-otta))::bigint) as wastedsize,
iname,
ituples::bigint,
ipages::bigint,
iotta,
round(case when iotta = 0 or ipages = 0 then 0.0 else ipages / iotta::numeric end, 1) as ibloat,
case
when ipages < iotta then 0
else ipages::bigint - iotta
end as wastedipages,
case
when ipages < iotta then 0
else bs*(ipages-iotta)
end as wastedibytes
--CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize
from (
select
schemaname,
tablename,
cc.reltuples,
cc.relpages,
bs,
ceil((cc.reltuples*((datahdr + ma- (case when datahdr % ma = 0 then ma else datahdr % ma end))+ nullhdr2 + 4))/(bs-20::float)) as otta,
coalesce(c2.relname, '?') as iname,
coalesce(c2.reltuples, 0) as ituples,
coalesce(c2.relpages, 0) as ipages,
coalesce(ceil((c2.reltuples*(datahdr-12))/(bs-20::float)), 0) as iotta
-- very rough approximation, assumes all cols
from (
select
ma,
bs,
schemaname,
tablename,
(datawidth +(hdr + ma-
(
case
when hdr % ma = 0 then ma
else hdr % ma
end)))::numeric as datahdr,
(maxfracsum*(nullhdr + ma-
(
case
when nullhdr % ma = 0 then ma
else nullhdr % ma
end))) as nullhdr2
from
(
select
schemaname,
tablename,
hdr,
ma,
bs,
sum((1-null_frac)* avg_width) as datawidth,
max(null_frac) as maxfracsum,
hdr +(
select
1 + count(*)/ 8
from
pg_stats s2
where
null_frac <> 0
and s2.schemaname = s.schemaname
and s2.tablename = s.tablename ) as nullhdr
from
pg_stats s,
(
select
(
select
current_setting('block_size')::numeric) as bs,
case
when substring(v, 12, 3) in ('8.0',
'8.1',
'8.2') then 27
else 23
end as hdr,
case
when v ~ 'mingw32' then 8
else 4
end as ma
from
(
select
version() as v) as foo ) as constants
group by
1,
2,
3,
4,
5 ) as foo ) as rs
join pg_class cc on
cc.relname = rs.tablename
join pg_namespace nn on
cc.relnamespace = nn.oid
and nn.nspname = rs.schemaname
left join pg_index i on
indrelid = cc.oid
left join pg_class c2 on
c2.oid = i.indexrelid ) as sml
where
sml.relpages - otta > 0
or ipages - iotta > 10
order by
wastedbytes desc,
wastedibytes desc;
Both the primary key index and the user_id index show over 5 GB of wastedsize and more than 500k wastedpages.
My question is: what is the solution here? Is it purely an index-bloat problem that a REINDEX would fix, or is there something else I am missing?
I am not allowed to change the structure of the table; I simply have to optimize the query so that it goes from over a minute to under 1 second.
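If a rebuild is the fix, I assume it would look something like this (REINDEX ... CONCURRENTLY needs PostgreSQL 12+; on older versions the usual workaround is to build a replacement index concurrently and swap it in, so this is just a sketch):

-- PostgreSQL 12+
reindex index concurrently ix_notification_user_id;

-- older versions: build a replacement without blocking writes, then swap
create index concurrently ix_notification_user_id_new on notification (user_id);
drop index concurrently ix_notification_user_id;
alter index ix_notification_user_id_new rename to ix_notification_user_id;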
After adding a partial index on user_id where is_read = false, the query time dropped by roughly 10-15 seconds, but it is obviously still far too slow.
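The partial index was created roughly like this (I believe it is the ix_test that appears in the plan further down; the exact statement is from memory):

create index concurrently ix_test on notification (user_id) where is_read = false;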
EDIT: There are 32.5 million rows in this table in total. Running this query:
SELECT t.user_id, COALESCE(unread_count, 0) AS unread_count
FROM unnest('{200 user_ids}'::bigint[]) t(user_id)
LEFT JOIN LATERAL (
SELECT count(*) AS unread_count
FROM notification n
WHERE n.user_id = t.user_id
AND n.is_read = false
) sub ON true
;
results in the following execution plan (funnily enough, yesterday this ran for over a minute; today it takes about 30 seconds or less):
Nested Loop Left Join (cost=1209.05..120908.50 rows=100 width=16) (actual time=333.088..27260.557 rows=200 loops=1)
  Buffers: shared hit=1981 read=20396 dirtied=7
  I/O Timings: read=27023.896
  -> Function Scan on unnest t (cost=0.00..1.00 rows=100 width=8) (actual time=0.022..0.360 rows=200 loops=1)
  -> Aggregate (cost=1209.04..1209.05 rows=1 width=8) (actual time=136.292..136.293 rows=1 loops=200)
        Buffers: shared hit=1981 read=20396 dirtied=7
        I/O Timings: read=27023.896
        -> Index Only Scan using ix_test on notification n (cost=0.44..1208.29 rows=300 width=0) (actual time=2.153..136.170 rows=105 loops=200)
              Index Cond: (user_id = t.user_id)
              Heap Fetches: 21088
              Buffers: shared hit=1981 read=20396 dirtied=7
              I/O Timings: read=27023.896
Planning time: 0.135 ms
Execution time: 27260.745 ms