
I have a table with about 10 million records. I want to do a simple group by, but it's using a sequential scan and is slow...

select run_id, count(*) from result group by run_id;

I have an index defined on the run_id column.

How can I speed this up?
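For context, this is roughly how I'm checking the plan (the output below is abbreviated and illustrative, not my exact numbers):

EXPLAIN ANALYZE
select run_id, count(*) from result group by run_id;

-- the plan comes back shaped something like:
--   HashAggregate
--     ->  Seq Scan on result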

FogleBird

2 Answers


So why does Postgres 9.2 still show a sequential scan? I quote the Postgres Wiki:

Is "count(*)" much faster now?

A traditional complaint made of PostgreSQL, generally when comparing it unfavourably with MySQL (at least when using the MyISAM storage engine, which doesn't use MVCC) has been "count(*) is slow". Index-only scans can be used to satisfy these queries without there being any predicate to limit the number of rows returned, and without forcing an index to be used by specifying that the tuples should be ordered by an indexed column. However, in practice that isn't particularly likely.

It is important to realise that the planner is concerned with minimising the total cost of the query. With databases, the cost of I/O typically dominates. For that reason, "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's row width is much wider than some indexes'.

Emphasis mine.
There is hardly anything to gain from an index scan here, as long as your table isn't bloated with additional (big) columns.
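If you want to check whether an index-only scan would even be attractive in your case, compare the on-disk sizes and make sure the visibility map is current (the index name below is assumed, adjust it to yours):

SELECT pg_size_pretty(pg_relation_size('result'))            AS table_size,
       pg_size_pretty(pg_relation_size('result_run_id_idx')) AS index_size;  -- index name assumed

-- Index-only scans also depend on an up-to-date visibility map,
-- which is maintained by (auto)vacuum:
VACUUM ANALYZE result;

Only if the index is a small fraction of the table, and most pages are marked all-visible, does the planner expect an index-only scan to beat the sequential scan for a query without a predicate.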

Erwin Brandstetter

I upgraded to PostgreSQL 9.2 overnight.

EXPLAIN still shows the query using a sequential scan, but the query now runs much faster... under 10 seconds. This will be sufficient for my needs.

Thanks to everyone for the help in the comments.

FogleBird