I have this table in PostgreSQL 9.4:
CREATE TABLE user_operation (
    id SERIAL PRIMARY KEY,
    operation_id integer,
    user_id integer
);
The table covers ~1000-2000 different operations, each of which corresponds to a subset (of approximately 80000-120000 elements) of the set S of all users:
S = {1, 2, 3, ... , 122655}
Parameters:
work_mem = 128MB
table_size = 880MB
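In case it matters, this is roughly how I checked those values (pg_relation_size reports the heap without indexes, while pg_total_relation_size would also include indexes and TOAST):

SHOW work_mem;
SELECT pg_size_pretty(pg_relation_size('user_operation'));        -- heap only
SELECT pg_size_pretty(pg_total_relation_size('user_operation'));  -- heap + indexes + TOAST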
I also have an index on the operation_id column.
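For completeness, the index is essentially this (idx is the name that appears in the plan below):

CREATE INDEX idx ON user_operation (operation_id);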
QUESTION: What would be the optimal plan for selecting all distinct user_id values for a significant fraction of the operation_id range (20%-60%), for example:
SELECT DISTINCT user_id FROM user_operation WHERE operation_id < 500
It's possible to create more indexes on the table (I sketch one candidate at the end of the question). Currently, the plan for the query is:
HashAggregate  (cost=196173.56..196347.14 rows=17358 width=4) (actual time=1227.408..1359.947 rows=598336 loops=1)
  ->  Bitmap Heap Scan on user_operation  (cost=46392.24..189978.17 rows=2478155 width=4) (actual time=233.163..611.182 rows=2518122 loops=1)
        Recheck Cond: (operation_id < 500)
        ->  Bitmap Index Scan on idx  (cost=0.00..45772.70 rows=2478155 width=0) (actual time=230.432..230.432 rows=2518122 loops=1)
              Index Cond: (operation_id < 500)
Is such a query plan really optimal in these circumstances? In particular, I'm not sure whether a Bitmap Heap Scan is the right choice here. I'd appreciate any references to relevant articles.
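Regarding additional indexes: one candidate I have in mind (untested, so treat it as an assumption on my part) is a composite index on (operation_id, user_id), which, as far as I understand, could allow an index-only scan instead of visiting the heap:

CREATE INDEX idx_op_user ON user_operation (operation_id, user_id);

Would such an index change the plan significantly, or is the HashAggregate over ~2.5M rows the dominant cost anyway?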