Working with Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64)
We have a single table, which is planned to contain about 1.5 - 2, maybe max 3 million records at a time. Records will continuously be moved into historical tables. Here is the table definition:
CREATE TABLE [dbo].[MyTable](
[Id] [int] IDENTITY(1,1) NOT NULL,
[MainKey] [nvarchar](20) NOT NULL,
[SecondaryKey] [nvarchar](100) NOT NULL,
[Value] [nvarchar](max) NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UC_MyTable] UNIQUE NONCLUSTERED
(
[MainKey] ASC,
[SecondaryKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
The table will basically store key-value pairs belonging to multiple entities. MainKey is the key of the entity, SecondaryKey is the key in the key-value pair, and Value is obviously the value.
Now, the problem: we need to select all key-value pairs for all entities, where there is at least one key-value pair in which both the key and the value matches certain filters. Unfortunately, for the key filter, we need to use LIKE because there is a number in the middle of the string, and it can be any number. This is the query we have now:
select *
from MyTable
where
MainKey in (
select MainKey
from MyTable
where
SecondaryKey LIKE 'begin.%.end'
and Value in ('special_value1','special_value7','special_value10')
)
order by MainKey, SecondaryKey;
This query runs in 17-21 seconds for a total of 3 003 216 records at the moment, and returns 914 040 matching records (using random-generated dummy data). This query will run many times, so I'm trying to make it run faster. I don't have a very specific goal for execution time, it should be "as fast as possible", but let's say around 5 seconds would be an achievement already.
My initial idea was the LIKE and the IN filtering in the inner query, however when I'm running only that part, it returns 15 160 records un 1-2 seconds. So it seems it is getting slower with the self-join. This is the actual execution plan:
Unfortunately this doesn't say too much to me. I can also export the plan and upload it as XML if it helps.
Finally I'm adding the script I used to generate the random data:
declare @count integer = 1;
declare @count2 integer;
declare @count3 integer;
declare @random integer;
declare @value_match_random integer = 30;
-- total of 50 000 separate entity keys
while @count <= 50000
begin
-- between 30 and 60 key-value pairs for each entity
select @count2 = abs(checksum(newid()) % 60) + 30;
set @count3 = 1;
-- insert random records which will never match the criteria
while @count3 <= @count2
begin
insert into MyTable (MainKey, SecondaryKey, Value) values (@count, concat('SecondaryKey_', @count3), concat('Value_', @count3));
set @count3 = @count3 + 1;
end;
select @random = abs(checksum(newid()) % 100) + 1;
-- give @value_match_random % chance to insert a matching key-value pair also
if @random <= @value_match_random
begin
insert into MyTable (MainKey, SecondaryKey, Value) values (@count, 'begin.1.end', 'special_value1');
end;
set @count = @count + 1;
end;
Added execution plan here: https://www.brentozar.com/pastetheplan/?id=r12VQEqbt
Another update is that today (after 1 week when I was on leave), I ran the same query again on the same table with same contents, and SSMS suggested that I create the following index:
CREATE NONCLUSTERED INDEX idx_MyTable_SecondaryKey
ON [dbo].[MyTable] ([SecondaryKey])
INCLUDE ([MainKey],[Value])
GO
I created it, and collected the above execution plan after this index is created. 1 week earlier, when I posted this question, I don't remember seeing this index suggestion.
