A question came up about an error in a Stack Exchange Data Explorer query which I tried to diagnose. The error reads:
Cannot resolve collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "SQL_Latin1_General_CP1_CS_AS" in CASE operator occurring in GROUP BY statement column 4.
As I already knew from past and present experience how to resolve that error, I only had to find the exact statement that threw that error. That is where I started looking for a group by statement with 4 columns. There are none.
Running this reproduction SEDE query (or DbFiddle kindly provided by Martin Smith):
(note that SEDE only saves working queries so please fork and remove the comment from the collate)
select count(*)
from #users
group by coalesce(
username
, displayname
-- comment out next line
collate SQL_Latin1_General_CP1_CI_AS
)
will give this error, telling me to look at column 2:
Cannot resolve collation conflict between "SQL_Latin1_General_CP1_CS_AS" and "SQL_Latin1_General_CP1_CI_AS" in CASE operator occurring in GROUP BY statement column 2.
Even if I know that the specific coalesce expands to
group by
case
when username is not null
then username
else displayname
end
I still don't find it logical to call that a group by with 2 columns. Even the execution plan doesn't seem to have more than one expression:
|--Stream Aggregate(GROUP BY:([Expr1003]) DEFINE:([Expr1007]=Count(*)))
|--Sort(ORDER BY:([Expr1003] ASC))
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [#users].[Username] IS NOT NULL THEN [#users].[Username] ELSE CONVERT(nvarchar(40),[#users].[Displayname],0) END))
How should I adjust my mental model of what GROUP BY considers to be a "column"?