Is there any documentation or research about changes in SQL Server 2016 to how cardinality is estimated for predicates containing SUBSTRING() or other string functions?
The reason I'm asking is that I was looking at a query whose performance degraded in compatibility mode 130 and the reason was related a change in the estimate of the number of rows that match a WHERE clause that contained a call to SUBSTRING(). I corrected the problem with a query rewrite, but am wondering if anyone is aware of any documentation about changes in this area in SQL Server 2016.
Demo code is below. The estimates are very close in this test case, but the accuracy varies depending on the data.
In the test case, in compat level 120, SQL Server appears to be using the histogram for the estimate, whereas in compat level 130 SQL Server appears to be assuming a fixed 10% of the table matches.
CREATE DATABASE MyStringTestDB;
GO
USE MyStringTestDB;
GO
DROP TABLE IF EXISTS dbo.StringTest;
CREATE TABLE dbo.StringTest ( [TheString] varchar(15) );
GO
INSERT INTO dbo.StringTest
VALUES
( 'Y5_CLV' );
INSERT INTO dbo.StringTest
VALUES
( 'Y5_EG3' );
INSERT INTO dbo.StringTest
VALUES
( 'ZY_NE' );
INSERT INTO dbo.StringTest
VALUES
( 'ZY_PQT' );
INSERT INTO dbo.StringTest
VALUES
( 'ZY_T2V' );
INSERT INTO dbo.StringTest
VALUES
( 'ZY_TT4' );
INSERT INTO dbo.StringTest
VALUES
( 'ZY_ZKK' );
INSERT INTO dbo.StringTest
VALUES
( 'ZZ_LW6' );
INSERT INTO dbo.StringTest
VALUES
( 'ZZ_QO3' );
INSERT INTO dbo.StringTest
VALUES
( 'ZZ_TZ7' );
INSERT INTO dbo.StringTest
VALUES
( 'ZZ_UZZ' );
CREATE CLUSTERED INDEX IX_Clustered ON dbo.StringTest (TheString);
/*
Uses fixed % for estimate; 1.1 rows estimated in this case.
Plan for computation:
CSelCalcFixedFilter (0.1) <----
Selectivity: 0.1
*/
ALTER DATABASE MyStringTestDB SET compatibility_level = 130;
GO
SELECT *
FROM dbo.StringTest
WHERE SUBSTRING(TheString, 1, CHARINDEX('_',TheString) - 1) = 'ZZ'
OPTION (QUERYTRACEON 2363, QUERYTRACEON 3604);
/*
Uses histogram to get estimate of 1
CSelCalcPointPredsFreqBased <----
Distinct value calculation:
CDVCPlanLeaf
0 Multi-Column Stats, 1 Single-Column Stats, 0 Guesses
Individual selectivity calculations:
(none)
Loaded histogram for column QCOL: [DBA].[dbo].[StringTest].TheString from stats with id 1
*/
ALTER DATABASE MyStringTestDB SET compatibility_level = 120;
GO
SELECT *
FROM dbo.StringTest
WHERE SUBSTRING(TheString, 1, CHARINDEX('_',TheString) - 1) = 'ZZ'
OPTION (QUERYTRACEON 2363, QUERYTRACEON 3604);
/*
-- Simpler rewrite; works fine in both compat levels and gets better estimate.
SELECT *
FROM dbo.StringTest
WHERE TheString LIKE 'ZZ[_]%'
OPTION (QUERYTRACEON 2363, QUERYTRACEON 3604);
*/
