Performance comparison between using Join and Window function to get lead and lag values

Question

I have a table with 20M rows, and each row has 3 columns: time, id, and value. For each id and time, there is a value for the status. I want to know the lead and lag values of a certain time for a specific id.

I have used two methods to achieve this. One method is using join and another method is using the window functions lead/lag with clustered index on time and id.

I compared the performance of these two methods by execution time. The join method takes 16.3 seconds and the window function method takes 20 seconds, not including the time to create the index. This surprised me because the window function seems to be advanced while the join methods is brute force.

Here is the code for the two methods:

Create Index

create clustered index id_time
 on tab1 (id,time)

Join method

select a1.id,a1.time
   a1.value as value, 
   b1.value as value_lag,
   c1.value as value_lead
into tab2
from tab1 a1
left join tab1 b1
on a1.id = b1.id
and a1.time-1= b1.time
left join tab1 c1
on a1.id = c1.id
and a1.time+1 = c1.time

IO statistics generated using SET STATISTICS TIME, IO ON:

Here is the execution plan for the join method

Window Function method

select id, time, value, 
   lag(value,1) over(partition by id order by id,time) as value_lag,
   lead(value,1) over(partition by id order by id,time) as value_lead
into tab2
from tab1

(Ordering only by time saves 0.5 seconds.)

Here is the execution plan for Window function method

IO statistics

[ Statistics for Window function method 4]

I checked the data in sample_orig_month_1999 and it seems that the raw data is well ordered by id and time. Is this the reason of performance difference?

It seems that the join method has more logical reads than the window function method, while the execution time for the former is actually less. Is it because the former has a better parallelism?

I like the window function method because of the concise code, is there any way to speed it up for this specific problem?

I'm using SQL Server 2016 on Windows 10 64 bit.

Paul White · Accepted Answer · 2016-12-17T05:52:20.853

The relatively low row-mode performance of LEAD and LAG window functions compared with self joins is nothing new. For example, Michael Zilberstein wrote about it on SQLblog.com back in 2012. There is quite a bit of overhead in the (repeated) Segment, Sequence Project, Window Spool, and Stream Aggregate plan operators:

In SQL Server 2016, you have a new option, which is to enable batch mode processing for the window aggregates. This requires some sort of columnstore index on the table, even if it is empty. The presence of a columnstore index is currently required for the optimizer to consider batch mode plans. In particular, it enables the much more efficient Window Aggregate batch-mode operator.

To test this in your case, create an empty nonclustered columnstore index:

 -- Empty CS index
CREATE NONCLUSTERED COLUMNSTORE INDEX dummy 
ON dbo.tab1 (id, [time], [value]) 
WHERE id < 0 AND id > 0;

The query:

SELECT
    T1.id,
    T1.[time],
    T1.[value],
    value_lag = 
        LAG(T1.[value]) OVER (
            PARTITION BY T1.id
            ORDER BY T1.[time]),
    value_lead =
        LEAD(T1.[value]) OVER (
            PARTITION BY T1.id
            ORDER BY T1.[time])
FROM dbo.tab1 AS T1;

Should now give an execution plan like:

...which may well execute much faster.

You may need to use an OPTION (MAXDOP 1) or other hint to get the same plan shape when storing the results in a new table. The parallel version of the plan requires a batch mode sort (or possibly two), which may well be a little slower. It rather depends on your hardware.

For more on the Batch Mode Window Aggregate operator, see the following articles by Itzik Ben-Gan:

Performance comparison between using Join and Window function to get lead and lag values

Join method

Window Function method

1 Answers1