0

I have a scenario where I need to find the top 3 distinct values ordered by date in the interviewer_uid column for each interviewee_uid in the table.

For example a query from this...

interview_id    | interviewee_uid   | interviewer_uid   | date
1               | 1                 | 11                | 2019-07-29                
2               | 1                 | 12                | 2019-07-28                
3               | 1                 | 13                | 2019-07-27                
4               | 1                 | 14                | 2019-07-26            
5               | 1                 | 12                | 2019-07-25            
6               | 2                 | 11                | 2019-07-29                
7               | 2                 | 13                | 2019-07-28                
8               | 3                 | 12                | 2019-07-29                
9               | 3                 | 12                | 2019-07-28                
10              | 3                 | 12                | 2019-07-27                
11              | 3                 | 13                | 2019-07-26                
12              | 4                 | 11                | 2019-07-29    
13              | 4                 | 11                | 2019-07-28            

... would return the rows marked with an asterisks:

interview_id    | interviewee_uid   | interviewer_uid   | date
1               | 1                 | 11                | 2019-07-29 *              
2               | 1                 | 12                | 2019-07-28 *              
3               | 1                 | 13                | 2019-07-27 *              
4               | 1                 | 14                | 2019-07-26            
5               | 1                 | 12                | 2019-07-25            
6               | 2                 | 11                | 2019-07-29 *              
7               | 2                 | 13                | 2019-07-28 *      
8               | 3                 | 12                | 2019-07-29 *          
9               | 3                 | 12                | 2019-07-28                
10              | 3                 | 12                | 2019-07-27                
11              | 3                 | 13                | 2019-07-26 *          
12              | 4                 | 11                | 2019-07-29 *  
13              | 4                 | 11                | 2019-07-28

Notice that for interviewee_id 3 despite having 4 interviews we only take id 8 and 11 since we are looking for the distinct intervier_id. How can I do this without a for each loop iterating over each interviewee_id?

CJ Edgerton
  • 103
  • 1

1 Answers1

3
WITH cte AS ( SELECT *,
                     DENSE_RANK() OVER ( PARTITION BY interviewee_uid 
                                         ORDER BY interviewer_uid) iid,
                     ROW_NUMBER() OVER ( PARTITION BY interviewee_uid, 
                                                      interviewer_uid 
                                         ORDER BY idate DESC) did
              FROM datatablename )
SELECT * 
FROM cte
WHERE did = 1
  AND iid <= 3;

fiddle


UPDATE.

Previous query contains logical error.

Edited variant:

WITH 
cte1 AS ( SELECT interview_id, interviewee_uid, interviewer_uid, idate,
                 ROW_NUMBER() OVER (PARTITION BY interviewee_uid, 
                                                 interviewer_uid 
                                    ORDER BY idate DESC) did
          FROM test ),
cte2 AS ( SELECT interview_id, interviewee_uid, interviewer_uid, idate,
                 ROW_NUMBER() OVER (PARTITION BY interviewee_uid 
                                    ORDER BY idate DESC) did
          FROM cte1
          WHERE did = 1
)
SELECT interview_id, interviewee_uid, interviewer_uid, idate 
FROM cte2 
WHERE did <= 3;

fiddle (with demonstration that previous query is wrong).

Akina
  • 20,750
  • 2
  • 20
  • 22