8

I am using Postgresql 9.0. I have the following fields in a table: id, name.

 id    name 
 1     John
 1     Mary
 1     Mary
 1     Mary
 1     John
 1     Mary
 3     Paul
 3     Paul
 3     George
 .     .
 .     .

For each id, I want to select the name that occurs the most. How can I do that?

I tried with the following query but it doesn't work:

select id, max(name) 
from table 
group by id;
Andriy M
  • 23,261
  • 6
  • 60
  • 103
Tudor
  • 191
  • 1
  • 4

1 Answers1

10

This isn't trivial. First, you want group by id and name and count the rows:

SELECT COUNT(*)
...
GROUP BY id, name

Then select the maximum count for every id. One way to achieve this is by window functions. The RANK() function:

RANK() OVER (PARTITION BY id ORDER BY COUNT(*) DESC)

assigns a number to every row of the result (after the grouping is done), arranging them (the rows) in partitions with the same id and ordered by COUNT(*) DESC, so for every (partition of) id, the row(s) with the maximum count are assigned a rank of 1. Thus we need to put the above in a derived table and use a WHERE condition to keep only these rows:

WHERE rnk = 1

The final query is like this:

SELECT
    id, name, cnt
FROM
    ( SELECT id, name, COUNT(*) AS cnt,
             RANK() OVER (PARTITION BY id ORDER BY COUNT(*) DESC) AS rnk
      FROM tableX
      GROUP BY id, name
    ) AS tg 
WHERE
    rnk = 1 ;

Tested at SQL-Fiddle


Note that if you have ties in the first place (two or more names with the same maximum count), all these will be returned. If you want strictly one row per id in the final results, you have to use the ROW_NUMBER() instead of the RANK() and possibly alter the ORDER BY clause to explicitly select how the ties will be resolved:

ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(*) DESC, name ASC) AS rnk

Tested: SQL-Fiddle test-2.

ypercubeᵀᴹ
  • 99,450
  • 13
  • 217
  • 306