15

Lets say I have the following table

user_id comment
2 thats cool
2 awesome
3 i hate this
3 okay
6 this is weird
6 hello?
6 what is it
9 how are you
16 too slow
16 yes
17 alrighty

How can you select one row per user_id? So my results would be:

user_id comment
2 thats cool
3 i hate this
6 this is weird
9 how are you
16 too slow
17 alrighty

Is this possible with a single efficient query? Or are sub-selected necessary? Is it possible to somehow use DISTINCT on a single column?

Rohit Gupta
  • 2,116
  • 8
  • 19
  • 25
Jake Wilson
  • 2,487
  • 8
  • 22
  • 23

2 Answers2

11

That's what GROUP BY is used for. Get one row (per group). In this case, it will show all distinct user_id values and for the rest of the columns, you can (have to) use aggregate functions like MIN(), MAX(), AVG(), SUM() as you will have more than one values per group and only one can be shown.

SELECT
    user_id
  , MIN(comment) AS comment  -- it will show the first in alphabetical order  
                             -- you could also use MAX()
FROM
    tableX
GROUP BY
    user_id ;

MySQL allows also the following unorthodox solution, that will return one (more or less random) comment per user:

SELECT
    user_id
  , comment
FROM
    tableX
GROUP BY
    user_id ;

This last query will not work but raise an error if the (stricter) ONLY_FULL_GROUP_BY mode is enabled. In the recently released 5.7 version, this mode is the default and a new function, ANY_VALUE(), is provided. For more details, see the MySQL Handling of GROUP BY page. The query can be written now:

SELECT
    user_id
  , ANY_VALUE(comment) AS comment
FROM
    tableX
GROUP BY
    user_id ;

Note that with either the "unorthodox" version or using the recent ANY_VALUE() function, if we add more columns in the SELECT list, their values is not guaranteed to be from the same row, just from a row in the same group. The way they are selected is not exactly random, depends on the execution plan and the indexes used.

ypercubeᵀᴹ
  • 99,450
  • 13
  • 217
  • 306
-1

I have a cheap-and-nasty solution for this. Put query that creates the original data in a CTE and add a RANK() column which you partition by user_id, and order by comment

Then select from the CTE where rank = 1

tinlyx
  • 3,810
  • 14
  • 50
  • 79