I am observing significant performance impact of select lists in sorted UNION queries.
The general form of UNION queries I am working with is:
SELECT * FROM (
SELECT <select_list> FROM <table>
UNION ALL
...
) q
ORDER BY <column>
The outer select is used as it significantly improves performance in the presence of ORDER BY, but that is outside of the scope of this question.
UNION ALL is always used, rather than UNION.
I will refer to performance as either "fast" (instantaneous) or "slow" (5 seconds or longer).
Performance was measured by running queries in the DBeaver console, which by default retrieves only the first page, whose size is 200.
Changing the select list affects performance in the following ways:
- When
<select_list> = <column>(i.e., only the sorting column is selected), queries are fast. The execution plan shows index scans for the sorting column and merge joins for concatenation. - When the select list includes other columns in addition to the sorting column, queries may be fast or slow.
It has been observed that when the following conditions are all true, queries are fast:
- Select list includes the clustered index column.
- Select list starts with the clustered index column, or with the sorting column immediately followed by the clustered index column.
Observations were made using Microsoft SQL Server 2019 (RTM-CU26) (KB5035123) - 15.0.4365.2 (X64).
I could not find anything about the effects of select lists on the performance of UNION in the SQL Server documentation.
A simplified description of the environment in which experiments were conducted, and the queries themselves are given below.
CREATE TABLE AUDIT1 (
ID bigint NOT NULL,
AUDITDATE datetime2 NULL,
[USER] bigint NULL,
-- Implies clustered index.
CONSTRAINT PK_AUDIT1 PRIMARY KEY (ID)
);
CREATE INDEX I_AUDIT1_AUDITDATE ON AUDIT1 (AUDITDATE);
CREATE TABLE AUDIT2 (
ID bigint NOT NULL,
AUDITDATE datetime2 NULL,
[USER] bigint NULL,
-- Implies clustered index.
CONSTRAINT PK_AUDIT2 PRIMARY KEY (ID)
);
CREATE INDEX I_AUDIT2_AUDITDATE ON AUDIT2 (AUDITDATE);
- Table
AUDIT1contains 10 million records. - Table
AUDIT2contains 1 million records. - Values of
AUDITDATEinAUDIT2are greater than those inAUDIT1. - Values of
AUDITDATEare aligned withIDin an increasing sequence, i.e.,IDis always increasing, and so isAUDITDATE.
Query 1: select list contains only the sorting column (fast).
SELECT * FROM (
SELECT AUDITDATE FROM AUDIT2
UNION ALL
SELECT AUDITDATE FROM AUDIT1
) q
ORDER BY AUDITDATE
Execution plan:
|--Merge Join(Concatenation)
|--Index Scan(AUDIT2.I_AUDIT2_AUDITDATE), ORDERED BACKWARD
|--Index Scan(AUDIT1.I_AUDIT1_AUDITDATE), ORDERED BACKWARD
Query 2: select list of size > 1, contains the sorting column, does not contain the clustered index column (slow).
SELECT * FROM (
SELECT [USER], AUDITDATE FROM AUDIT2
UNION ALL
SELECT [USER], AUDITDATE FROM AUDIT1
) q
ORDER BY AUDITDATE
Note that placing the sorting column first was observed to have no effect.
Query 3: select list of size > 1, contains the sorting column, contains the clustered index column, first column is neither the sorting column, nor the clustered index column (slow).
SELECT * FROM (
SELECT [USER], ID, AUDITDATE FROM AUDIT2
UNION ALL
SELECT [USER], ID, AUDITDATE FROM AUDIT1
) q
ORDER BY AUDITDATE
Query 4: select list of size > 1, contains the sorting column, contains the clustered index column, first column is either the sorting column, or the clustered index column (fast).
SELECT * FROM (
SELECT ID, [USER], AUDITDATE FROM AUDIT2
UNION ALL
SELECT ID, [USER], AUDITDATE FROM AUDIT1
) q
ORDER BY AUDITDATE
This query shows that if the first column is ID, the query is fast.
Execution plans for queries 2, 3, 4 are the same:
|--Parallelism(Gather Streams, ORDER BY:([Union1007] ASC))
|--Sort(ORDER BY:([Union1007] ASC))
|--Concatenation
|--Parallelism(Distribute Streams, RoundRobin Partitioning)
|--Clustered Index Scan(OBJECT:(AUDIT2.PK_AUDIT2))
|--Clustered Index Scan(OBJECT:(AUDIT1.PK_AUDIT1))



