1

I have a t-sql query that executes hourly, pulling in a variable amount of data depending on the current time when the query executes, i.e.: if it is executing between midnight and 2 AM local time (7 to 9 AM UTC) I want to pull in the last 120 days' worth of data; outside that window I only want to bring in 3 days' worth of data.

While creating the query I tried a few things to avoid having to repeat the query in an IF ELSE statement with hard coded values. As a baseline for testing I created the following query:

SELECT  COUNT(*)
FROM    dbo.Tickets t
        JOIN dbo.TicketsDetails td ON  t.ticketGUID = td.ticketGUID
WHERE   td.dtCreated >  DATEADD(dd, -1, CAST(GETUTCDATE() AS date))

With the hardcoded interval it returns a value of about 750,000 in .829 seconds. When I modify it to use local variables (the second or third WHERE clause below), however, execution time explodes to over 10 minutes:

DECLARE @Interval INT,
        @StartDate DATE;
SELECT  @Interval = CASE WHEN DATEPART(hh, GETUTCDATE()) IN (7, 8) THEN -120 ELSE -1 END
        , @StartDate = DATEADD(dd, @Interval, CAST(GETUTCDATE() AS date));
SELECT  COUNT(*)
FROM    dbo.Tickets t
        JOIN dbo.TicketsDetails td
            ON  t.ticketGUID = td.ticketGUID
--WHERE td.dtCreated >  DATEADD(dd, -1, CAST(GETUTCDATE() AS date))
WHERE   td.dtCreated >  DATEADD(dd, @Interval, CAST(GETUTCDATE() AS date))
--WHERE td.dtCreated >  @StartDate

My question is why does this happen, and if this is working as designed what workarounds are there so I don't have to double the code?

1 Answers1

1

I understand the question, that you do not want to repeat the select element, but the problem is the use of variables as parameters. Repeating the select bit is just copy and paste, and, in this case, by removing the variable declarations and assignments, the code is quicker to write anyway.

IF (DATEPART(hh, GETUTCDATE()) IN (7, 8)) THEN 
  SELECT COUNT(*)
  FROM dbo.Tickets t
  JOIN dbo.TicketsDetails td ON  t.ticketGUID = td.ticketGUID
  WHERE td.dtCreated >  DATEADD(dd, -120, CAST(GETUTCDATE() AS date));
ELSE
  SELECT COUNT(*)
  FROM dbo.Tickets t
  JOIN dbo.TicketsDetails td ON  t.ticketGUID = td.ticketGUID
  WHERE td.dtCreated >  DATEADD(dd, -3, CAST(GETUTCDATE() AS date));

Using variables forces the optimizer to produce a plan that could work for all future executions with different variable values. It does this by using an average value from statistics. This does not always work well in practice.

Alternatives involve asking the optimizer to produce a plan for a specified representative variable value, or recompiling on each execution and using the specific value of the variable each time:

DECLARE 
    @Interval integer = 
        CASE WHEN DATEPART(hh, GETUTCDATE()) IN (7, 8) THEN -120 ELSE -1 END;
DECLARE
    @StartDate date = 
        DATEADD(dd, @Interval, CAST(GETUTCDATE() AS date));

-- If the plan for @Interval = -1 works well enough for all @Interval values SELECT
COUNT(*) FROM dbo.Tickets AS t JOIN dbo.TicketsDetails AS td ON td.ticketGUID = t.ticketGUID WHERE td.dtCreated > DATEADD(dd, @Interval, CAST(GETUTCDATE() AS date)) OPTION ( OPTIMIZE FOR (@Interval = -1) );

or

-- If you want a fresh plan for each @Interval or @StartDate value
SELECT  
    COUNT(*)
FROM dbo.Tickets AS t
JOIN dbo.TicketsDetails AS td
    ON td.ticketGUID = t.ticketGUID
WHERE
    td.dtCreated > DATEADD(dd, @Interval, CAST(GETUTCDATE() AS date))
/*
WHERE 
    td.dtCreated >  @StartDate
*/
OPTION
(
    -- Fresh plan, embed variable values before optimization
    RECOMPILE
);
Paul White
  • 94,921
  • 30
  • 437
  • 687
tea boy
  • 59
  • 3