What defines a "column" in a GROUP BY statement

Question

A question came up about an error in a Stack Exchange Data Explorer query which I tried to diagnose. The error reads:

Cannot resolve collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "SQL_Latin1_General_CP1_CS_AS" in CASE operator occurring in GROUP BY statement column 4.

As I already knew from past and present experience how to resolve that error, I only had to find the exact statement that threw that error. That is where I started looking for a group by statement with 4 columns. There are none.

Running this reproduction SEDE query (or DbFiddle kindly provided by Martin Smith):
^{(note that SEDE only saves working queries so please fork and remove the comment from the collate)}

select count(*)
from #users
group by coalesce(
         username
       , displayname
         -- comment out next line
         collate SQL_Latin1_General_CP1_CI_AS
       )

will give this error, telling me to look at column 2:

Cannot resolve collation conflict between "SQL_Latin1_General_CP1_CS_AS" and "SQL_Latin1_General_CP1_CI_AS" in CASE operator occurring in GROUP BY statement column 2.

Even if I know that the specific coalesce expands to

group by
  case 
  when username is not null
  then username 
  else displayname 
  end

I still don't find it logical to call that a group by with 2 columns. Even the execution plan doesn't seem to have more than one expression:

|--Stream Aggregate(GROUP BY:([Expr1003]) DEFINE:([Expr1007]=Count(*)))                                                                                                     
  |--Sort(ORDER BY:([Expr1003] ASC))                                                                                                                                     
    |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [#users].[Username] IS NOT NULL THEN [#users].[Username] ELSE CONVERT(nvarchar(40),[#users].[Displayname],0) END))

How should I adjust my mental model of what GROUP BY considers to be a "column"?

score 0 · Answer 1 · answered Apr 29 '24 at 11:27

in the original post

you have

GROUP BY
    PostHistory.PostId,
    PostHistory.UserId,
    coalesce(Users.DisplayName, PostHistory.UserDisplayName)

UserDisplayName is the 4th

and for COALESCE to work the need to have te same collation

And in your query

group by coalesce(
     username
   , displayname
     -- comment out next line
     collate SQL_Latin1_General_CP1_CI_AS

displayname is the 2nd column

After the GROUP BY you start to count username number 1 displayname number two and so on in the order the appear after the GROUP BY ignoring sql commands like COALESCE

What defines a "column" in a GROUP BY statement

1 Answers1