
I have a dataset of the following structure:

Target, PollTime, Value
------------------------
1. A, 2013-12-13 13:31:44.000, 15.00
2. A, 2013-12-13 13:32:44.000, 16.00
3. A, 2013-12-13 13:33:44.000, 16.00
4. A, 2013-12-13 13:34:44.000, NULL
5. A, 2013-12-13 13:35:44.000, NULL
6. A, 2013-12-13 13:36:44.000, 15.00
7. B, 2013-12-21 05:29:34.000, 3.00
8. B, 2013-12-21 05:30:34.000, NULL
9. B, 2013-12-21 05:31:34.000, NULL
10. B, 2013-12-21 05:32:34.000, NULL
11. B, 2013-12-21 05:33:34.000, 4.00
12. B, 2013-12-21 05:34:34.000, NULL
13. B, 2013-12-21 05:35:34.000, NULL
14. B, 2013-12-21 05:36:34.000, 5.00

I would like to build a result set that groups the NULL values, showing the start time, end time and duration of each group. (NULL means the target was unavailable.)

Target, StartDate, EndDate, Duration(min)
---------------------------------------------
1. A, 2013-12-13 13:34:44.000,2013-12-13 13:36:44.000, 2
2. B, 2013-12-21 05:30:34.000,2013-12-21 05:33:34.000, 3
3. B, 2013-12-21 05:34:34.000,2013-12-21 05:36:34.000, 2

The StartDate would be the first NULL value, the EndDate would be the next non-NULL value, and my duration calculation would be based on those two values.
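To make the expected output concrete, here is a minimal Python sketch (not SQL, and not part of the original question) that applies the rule above to the sample rows. The names `sample_rows` and `outages` are made up for illustration, and `None` stands in for NULL. It assumes the rows arrive sorted by Target and then PollTime.

```python
from datetime import datetime, timedelta

def sample_rows():
    """Rebuild the 14 sample rows as (target, poll_time, value) tuples."""
    a0 = datetime(2013, 12, 13, 13, 31, 44)
    b0 = datetime(2013, 12, 21, 5, 29, 34)
    a_vals = [15.0, 16.0, 16.0, None, None, 15.0]
    b_vals = [3.0, None, None, None, 4.0, None, None, 5.0]
    rows = [("A", a0 + timedelta(minutes=i), v) for i, v in enumerate(a_vals)]
    rows += [("B", b0 + timedelta(minutes=i), v) for i, v in enumerate(b_vals)]
    return rows

def outages(rows):
    """Return (target, start, end, minutes) for each run of NULL values."""
    result = []
    start = None  # (target, poll_time) of the first NULL in the current run
    for target, poll_time, value in rows:
        if start is not None and start[0] != target:
            start = None  # the previous target ended without recovering
        if value is None:
            if start is None:
                start = (target, poll_time)
        elif start is not None:
            # The first non-NULL row after the run closes it.
            minutes = int((poll_time - start[1]).total_seconds() // 60)
            result.append((target, start[1], poll_time, minutes))
            start = None
    return result

for row in outages(sample_rows()):
    print(row)
```

A run of NULLs that never recovers within its target is simply not reported by this sketch; the question does not say how that case should be handled.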

Any clues on how to build this query would be great.

I am thinking of joining the table to itself, but I am not sure where to start.

Paul White

4 Answers

As pointed out, this is better solved as a gaps-and-islands problem.

WITH T
 AS (SELECT *,
            DENSE_RANK() OVER (PARTITION BY Target ORDER BY PollTime) - 
            DENSE_RANK() OVER (PARTITION BY Target, Value ORDER BY PollTime) AS Grp
     FROM   [schema].[Table] 
     )
SELECT Target,
       MIN(PollTime) AS Start,
       MAX(PollTime) AS Finish,
       DATEDIFF(mi, MIN(PollTime), MAX(PollTime)) AS Duration
FROM   T
GROUP  BY Target,
          Grp
HAVING MAX(Value) is null
ORDER  BY Start 

Original source for code: Getting each status change in a table
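The rank-difference trick in the `Grp` expression can be hard to see from the SQL alone, so here is a small Python sketch of the arithmetic for target B. It is only an illustration: `rank_difference_groups` is a hypothetical helper, `None` stands in for NULL, and it assumes poll times are unique per target, so the first DENSE_RANK is just the row position.

```python
from collections import defaultdict

# Target B's values in PollTime order.
values = [3.0, None, None, None, 4.0, None, None, 5.0]

def rank_difference_groups(values):
    """Mimic the Grp expression for a single target."""
    seen = defaultdict(int)  # rows seen so far per distinct value
    groups = []
    for overall_rank, value in enumerate(values, start=1):
        seen[value] += 1  # rank inside the (Target, Value) partition
        groups.append(overall_rank - seen[value])
    return groups

grp = rank_difference_groups(values)
print(grp)  # [0, 1, 1, 1, 4, 2, 2, 7]

# Collect the row positions of each all-NULL island (HAVING MAX(Value) IS NULL).
islands = defaultdict(list)
for position, (value, g) in enumerate(zip(values, grp), start=1):
    if value is None:
        islands[g].append(position)
print(dict(islands))  # {1: [2, 3, 4], 2: [6, 7]}
```

Consecutive NULL rows share a `Grp` value because both ranks advance together. Note that each island holds only the NULL rows themselves, so as far as I can tell `MAX(PollTime)` is the last NULL poll rather than the next non-NULL poll that the question asks for as EndDate.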

Valkyrie

For SQL Server 2008
As the OP commented that he uses SQL Server 2008, the SQL Server 2012 queries further down cannot be used.
At first I tried to convert them; then a new idea came to me.

Converted

WITH T AS (
  SELECT Target, PollTime, Value
       , ID = ROW_NUMBER() OVER (PARTITION BY Target ORDER BY PollTime)
       , IsNull = CASE WHEN Value IS NULL THEN 1 ELSE 0 END
 FROM   [Table1] 
), A AS (
  SELECT t1.Target
       , StartDate = t1.PollTime
       , EndDate = COALESCE(t3.PollTime, t1.PollTime)
       , t1.Value
       , Block = t1.ID - SUM(t2.IsNull)
  FROM   T t1
         INNER JOIN T t2 ON t1.Target = t2.Target And t2.ID <= t1.ID
         LEFT  JOIN T t3 ON t1.Target = t3.Target And t3.ID = t1.ID + 1
  GROUP BY t1.Target, t1.PollTime, t1.Value, t1.ID, t3.id, t3.PollTime
)
SELECT Target
     , StartDate = MIN (StartDate)
     , EndDate = MAX(EndDate)
     , [Duration(min)] = DATEDIFF(mi, min(StartDate), max(EndDate))
FROM   A
WHERE  Value is Null
GROUP BY Target, Block
ORDER BY Target, Block

SQLFiddle demo

This uses the same idea as the queries for SQL Server 2012: get a rank for every block. Here the LEAD is replaced by the LEFT JOIN T t3 ON t1.Target = t3.Target And t3.ID = t1.ID + 1 in the A CTE.
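A worked example may help with the `Block = t1.ID - SUM(t2.IsNull)` expression. The Python sketch below replays it for target B using row IDs instead of poll times; it is an illustration under that simplification, not a translation of the whole query, and `None` stands in for NULL.

```python
from itertools import accumulate

# Target B's values in PollTime order; row IDs are 1-based positions.
values = [3.0, None, None, None, 4.0, None, None, 5.0]

is_null = [1 if v is None else 0 for v in values]
running_nulls = list(accumulate(is_null))  # SUM(t2.IsNull) over t2.ID <= t1.ID
block = [row_id - n for row_id, n in enumerate(running_nulls, start=1)]
print(block)  # [1, 1, 1, 1, 2, 2, 2, 3]

# For NULL rows only: start = MIN(ID), end = MAX(next row's ID).
spans = {}
for row_id, (v, b) in enumerate(zip(values, block), start=1):
    if v is None:
        next_id = min(row_id + 1, len(values))  # COALESCE(t3.PollTime, t1.PollTime)
        first, last = spans.get(b, (row_id, next_id))
        spans[b] = (min(first, row_id), max(last, next_id))
print(spans)  # {1: (2, 5), 2: (6, 8)}
```

Within a run of NULLs the ID and the running NULL count both grow by one per row, so their difference stays constant and can serve as the grouping key.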

Different idea

WITH T AS (
  SELECT Target, PollTime, Value
       , ID = ROW_NUMBER() OVER (PARTITION BY Target ORDER BY PollTime)
 FROM   [Table1] 
), B AS (
  SELECT t1.Target
       , ID1 = min(t1.ID)
       , ID2 = t2.ID
  FROM   T t1
         CROSS APPLY (SELECT TOP 1 ID 
                      FROM   T i 
                      WHERE  i.Value is not null 
                        AND  i.Target = t1.Target
                        AND  i.ID > t1.ID
                      ORDER BY i.Target, i.ID) t2
  WHERE  t1.Value is Null
  GROUP BY t1.Target, t2.ID
)
SELECT b.Target
     , StartDate = MIN(t.PollTime)
     , EndDate = MAX(t.PollTime)
     , [Duration(min)] = datediff(mi, min(t.PollTime), max(t.PollTime))
FROM   B
       INNER JOIN T ON B.Target = T.Target And T.ID IN (b.ID1, b.ID2)
GROUP BY b.Target, b.ID2
ORDER BY b.target, b.ID2       

SQLFiddle demo

Why care about block ranking? Just get the first row after the current one where the value is not null (in the CROSS APPLY). For every row in a block of nulls that row is the same, so a GROUP BY can be used to get the first ID of the null block (done in the B CTE).
With the two IDs, the rest of the data is easy to derive.
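The grouping step can be sketched in a few lines of Python. This is only an illustration of the B CTE for target B, using row IDs; `next_non_null` is a hypothetical helper that plays the role of the CROSS APPLY, and `None` stands in for NULL.

```python
# Target B's values in PollTime order; row IDs are 1-based positions.
values = [3.0, None, None, None, 4.0, None, None, 5.0]

def next_non_null(values, row_id):
    """ID of the first later row that has a value, or None if there is none."""
    for later_id in range(row_id + 1, len(values) + 1):
        if values[later_id - 1] is not None:
            return later_id
    return None

first_null_by_end = {}
for row_id, v in enumerate(values, start=1):
    if v is None:
        end_id = next_non_null(values, row_id)
        if end_id is not None:  # CROSS APPLY drops rows with no match
            # IDs are visited in ascending order, so the first one stored is MIN(t1.ID).
            first_null_by_end.setdefault(end_id, row_id)

pairs = [(id1, id2) for id2, id1 in sorted(first_null_by_end.items())]
print(pairs)  # [(2, 5), (6, 8)]
```

Because CROSS APPLY behaves like an inner join, a trailing run of NULLs with no later non-NULL row produces no output here.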

For SQL Server 2012
I found two ways to get the result, but I haven't checked their performance.

First one

WITH DATA AS (
  SELECT target
       , pollTime
       , Value
       , Block = SUM(CASE WHEN Value IS NULL THEN 0 ELSE 1 END)
                 OVER (Partition BY Target ORDER BY PollTime)
               * CASE WHEN Value IS NULL THEN 1 ELSE 0 END
       , NextPollTime = LEAD(PollTime, 1, PollTime)
                        OVER (Partition BY Target ORDER BY PollTime)
  FROM   table1
)
SELECT target
     , StartDate = min(pollTime)
     , EndDate = max(NextPollTime)
     , [Duration(min)] = datediff(mi, min(pollTime), max(NextPollTime))
FROM   DATA
WHERE  Block > 0
GROUP BY target, block
ORDER BY target, block

The * CASE WHEN Value IS NULL THEN 1 ELSE 0 END in the Block field ensures that only the blocks of NULL get a non-zero number.
The LEAD gets the PollTime of the first row after the block of NULL.
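Here is a small Python sketch of how the Block number and the LEAD combine, using row IDs in place of poll times. The helpers `null_blocks` and `block_spans` are hypothetical names for illustration, and `None` stands in for NULL.

```python
from itertools import accumulate

def null_blocks(values):
    """Block per row: running count of non-NULLs, forced to 0 on non-NULL rows."""
    has_value = [0 if v is None else 1 for v in values]
    running = accumulate(has_value)  # SUM(...) OVER (ORDER BY PollTime)
    return [count * (1 - flag) for count, flag in zip(running, has_value)]

def block_spans(values):
    """(first NULL ID, ID of the row after the last NULL) per block."""
    spans = {}
    for row_id, block in enumerate(null_blocks(values), start=1):
        if block > 0:  # WHERE Block > 0
            lead_id = min(row_id + 1, len(values))  # LEAD(PollTime, 1, PollTime)
            first, last = spans.get(block, (row_id, lead_id))
            spans[block] = (min(first, row_id), max(last, lead_id))
    return spans

b_values = [3.0, None, None, None, 4.0, None, None, 5.0]
print(null_blocks(b_values))  # [0, 1, 1, 1, 0, 2, 2, 0]
print(block_spans(b_values))  # {1: (2, 5), 2: (6, 8)}

# Edge case: a target whose first row is NULL.
print(null_blocks([None, 3.0, None, 4.0]))  # [0, 0, 1, 0]
```

The last line suggests one thing to watch for: a run of NULLs at the very start of a target gets Block 0, so the `WHERE Block > 0` filter would appear to drop it. The sample data has no such run.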

SQLFiddle demo

Second one

WITH DATA AS (
  SELECT target
       , pollTime
       , Value
       , Block = SUM(CASE WHEN Value IS NULL THEN 0 ELSE 1 END)
                 OVER (Partition BY Target ORDER BY PollTime)
               - CASE WHEN Value IS NULL THEN 0 ELSE 1 END
  FROM   table1
)
SELECT target
     , StartDate = min(pollTime)
     , EndDate = max(pollTime)
     , [Duration(min)] = datediff(mi, min(pollTime), max(pollTime))

FROM   DATA
GROUP BY target, block
HAVING count(value) < count(1)
ORDER BY target, block

The - CASE WHEN Value IS NULL THEN 0 ELSE 1 END in the Block field shifts the rank by one row, so the first row of the next block becomes the last row of the NULL block.
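The shift is easier to see with numbers. The following Python sketch replays the Block expression and the HAVING filter for target B, again with row IDs instead of poll times and `None` for NULL; it is an illustration only.

```python
from itertools import accumulate

# Target B's values in PollTime order; row IDs are 1-based positions.
values = [3.0, None, None, None, 4.0, None, None, 5.0]

has_value = [0 if v is None else 1 for v in values]
# Running count of non-NULLs, minus 1 on the non-NULL rows themselves.
block = [count - flag for count, flag in zip(accumulate(has_value), has_value)]
print(block)  # [0, 1, 1, 1, 1, 2, 2, 2]

rows_by_block = {}
for row_id, b in enumerate(block, start=1):
    rows_by_block.setdefault(b, []).append(row_id)

# HAVING count(value) < count(1): keep blocks containing at least one NULL.
spans = [
    (ids[0], ids[-1])
    for b, ids in sorted(rows_by_block.items())
    if any(values[i - 1] is None for i in ids)
]
print(spans)  # [(2, 5), (6, 8)]
```

Each kept block ends with the non-NULL row that closes the run, which is why MIN and MAX of pollTime give the start and end directly without a LEAD.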

SQLFiddle demo

Serpiton

The following is an alternative SQL Server 2008 R2 solution that is efficient when:

  1. There are relatively few NULLs; and
  2. The right supporting indexes are available

The gaps-and-islands method may be faster if:

  1. The table is not too large (minimizing sorting costs); and
  2. NULLs form a high proportion of the data.

Sample data and required indexes

CREATE TABLE dbo.Table1
(
    [Target] char(1) NOT NULL, 
    [PollTime] datetime NOT NULL, 
    [Value] varchar(5) NULL,

    PRIMARY KEY ([Target], [PollTime])
);

-- Optional helpful filtered index to find NULL rows in required order
CREATE UNIQUE INDEX fi1
ON dbo.Table1 ([Target], [PollTime]) 
INCLUDE (Value) 
WHERE Value IS NULL;

-- Sample data
INSERT dbo.Table1
    ([Target], [PollTime], [Value])
VALUES
    ('A', '2013-12-13 13:31:44', '15.00'),
    ('A', '2013-12-13 13:32:44', '16.00'),
    ('A', '2013-12-13 13:33:44', '16.00'),
    ('A', '2013-12-13 13:34:44', NULL),
    ('A', '2013-12-13 13:35:44', NULL),
    ('A', '2013-12-13 13:36:44', '15.00'),
    ('B', '2013-12-21 05:29:34', '3.00'),
    ('B', '2013-12-21 05:30:34', NULL),
    ('B', '2013-12-21 05:31:34', NULL),
    ('B', '2013-12-21 05:32:34', NULL),
    ('B', '2013-12-21 05:33:34', '4.00'),
    ('B', '2013-12-21 05:34:34', NULL),
    ('B', '2013-12-21 05:35:34', NULL),
    ('B', '2013-12-21 05:36:34', '5.00');

Solution step 1:

-- Find start and end dates
SELECT
    T1.[Target],
    Calc.StartOrEnd,
    Calc.PollTime,
    GroupID = (ROW_NUMBER() OVER (ORDER BY T1.[Target], T1.PollTime) + 1) / 2
INTO #PartialResult
FROM dbo.Table1 AS T1
CROSS APPLY
(
    -- Classify the outer NULL row as start, end or neither
    SELECT TOP (1)
        StartOrEnd.StartOrEnd, 
        StartOrEnd.PollTime 
    FROM 
    (
        -- Start date test
        SELECT
            StartDate.StartOrEnd,
            StartDate.PollTime,
            StartDate.Value
        FROM 
        (
            SELECT TOP (1)
                -- Looking for a Start date (S)
                StartOrEnd = CONVERT(char(1), 'S'), 
                T1.PollTime,
                T2.Value
            FROM dbo.Table1 AS T2 
            WHERE 
                -- Find the previous row
                T2.[Target] = T1.[Target]
                AND T2.PollTime < T1.PollTime
            ORDER BY
                T2.PollTime DESC
        ) AS StartDate
        WHERE
            -- Is a Start date if the value isn't NULL
            StartDate.Value IS NOT NULL

        UNION ALL

        -- End date test
        SELECT
            EndDate.StartOrEnd,
            EndDate.PollTime,
            EndDate.Value 
        FROM 
        (
            SELECT TOP (1)
                -- Looking for an End date (E)
                StartOrEnd = CONVERT(char(1), 'E'), 
                T2.PollTime,
                T2.Value
            FROM dbo.Table1 AS T2 
            WHERE 
                -- Find the following row
                T2.[Target] = T1.[Target]
                AND T2.PollTime > T1.PollTime
            ORDER BY
                T2.PollTime ASC
        ) AS EndDate
        WHERE
            -- Is an end date if the value isn't NULL
            EndDate.Value IS NOT NULL
    ) AS StartOrEnd
) AS Calc
WHERE 
    -- Outer row value is NULL
    T1.value IS NULL;

Solution step 2:

CREATE UNIQUE CLUSTERED INDEX cuq
ON #PartialResult (GroupID, [Target], StartOrEnd);

-- Special case: final end row
-- if the last row in the table is NULL
INSERT #PartialResult
(
    [Target],
    StartOrEnd,
    PollTime,
    GroupID
)
SELECT
    FinalRow.[Target],
    StartOrEnd = CONVERT(char(1), 'E'),
    FinalRow.PollTime,
    GroupID = LastGroup.GroupID
FROM 
(
    SELECT TOP (1)
        T1.[Target],
        T1.PollTime,
        T1.Value
    FROM dbo.Table1 AS T1
    ORDER BY
        T1.[Target] DESC,
        T1.PollTime DESC
) AS FinalRow
CROSS APPLY
(
    SELECT TOP (1) GroupID
    FROM #PartialResult
    ORDER BY GroupID DESC
)  AS LastGroup
WHERE
    FinalRow.Value IS NULL;

Solution step 3:

-- Final result
SELECT
    Pivoted.[Target],
    StartDate = Pivoted.S,
    EndDate = Pivoted.E,
    Duration = DATEDIFF(MINUTE, Pivoted.S, Pivoted.E)
FROM #PartialResult
PIVOT 
(
    MAX(PollTime) 
    FOR StartOrEnd IN (S, E)
) AS Pivoted
ORDER BY
    Pivoted.GroupID;

-- Tidy up
DROP TABLE 
    #PartialResult, 
    dbo.Table1;

Results:

╔════════╦═════════════════════════╦═════════════════════════╦══════════╗
║ Target ║        StartDate        ║         EndDate         ║ Duration ║
╠════════╬═════════════════════════╬═════════════════════════╬══════════╣
║ A      ║ 2013-12-13 13:34:44.000 ║ 2013-12-13 13:36:44.000 ║        2 ║
║ B      ║ 2013-12-21 05:30:34.000 ║ 2013-12-21 05:33:34.000 ║        3 ║
║ B      ║ 2013-12-21 05:34:34.000 ║ 2013-12-21 05:36:34.000 ║        2 ║
╚════════╩═════════════════════════╩═════════════════════════╩══════════╝
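
The classify-then-pair logic of steps 1 and 3 can be sketched in Python for a single target. This is a simplified illustration, not a translation of the queries: it uses row IDs instead of poll times, `classify` is a hypothetical helper, `None` stands in for NULL, and it assumes every run of NULLs has at least two rows (as in the sample data), since a lone NULL row would pass both the start and the end test.

```python
# Target B's values in PollTime order; row IDs are 1-based positions.
values = [3.0, None, None, None, 4.0, None, None, 5.0]

def classify(values):
    """Emit ('S', id) for the first NULL of a run and ('E', id) for the row after its last NULL."""
    markers = []
    for i, v in enumerate(values):
        if v is not None:
            continue  # only NULL rows are examined (outer WHERE clause)
        prev_has_value = i > 0 and values[i - 1] is not None
        next_has_value = i + 1 < len(values) and values[i + 1] is not None
        if prev_has_value:
            markers.append(("S", i + 1))  # start date: this NULL row
        elif next_has_value:
            markers.append(("E", i + 2))  # end date: the following non-NULL row
        # NULL rows in the middle of a run match neither test and are dropped
    return markers

markers = classify(values)
print(markers)  # [('S', 2), ('E', 5), ('S', 6), ('E', 8)]

# GroupID = (ROW_NUMBER() + 1) / 2 pairs each start with the end that follows it.
pivoted = {}
for n, (kind, row_id) in enumerate(markers, start=1):
    pivoted.setdefault((n + 1) // 2, {})[kind] = row_id
print(pivoted)  # {1: {'S': 2, 'E': 5}, 2: {'S': 6, 'E': 8}}
```

Only the boundary rows of each run survive the classification, which fits the claim that the approach does well when NULLs are relatively few.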
Paul White

I see answers that use CTEs, but since I do not really understand those, here is a version that uses derived tables instead:

select s.[Target],
       s.PollTime [StartDate],
       min(e.PollTime) [EndDate],
       datediff(minute, s.PollTime, min(e.PollTime)) [Duration(min)]
from 
    (select *, lag(value,1) over (partition by [Target] order by PollTime) [lag_Value]
    from #temp) s
join
    (select *, lag(value,1) over (partition by [Target] order by PollTime) [lag_Value]
    from #temp) e on s.[Target] = e.[Target] and s.PollTime < e.PollTime
where s.Value is null
and s.lag_Value is not null
and e.Value is not null
and e.lag_Value is null
group by s.[Target], s.PollTime

Just replace #temp with whatever your table name is.
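
For anyone trying to follow the join logic, here is a rough Python sketch of it for target B, with row IDs in place of poll times and `None` for NULL. It is only an illustration of the filters, not a replacement for the query. Keep in mind that the query relies on LAG, which needs SQL Server 2012 or later.

```python
# Target B's values in PollTime order; row IDs are 1-based positions.
values = [3.0, None, None, None, 4.0, None, None, 5.0]
lagged = [None] + values[:-1]  # lag(value, 1): NULL for the first row

# s rows: a NULL whose previous value is not NULL (start of a run).
starts = [i for i, (v, lag) in enumerate(zip(values, lagged), start=1)
          if v is None and lag is not None]
# e rows: a non-NULL whose previous value is NULL (first reading after a run).
ends = [i for i, (v, lag) in enumerate(zip(values, lagged), start=1)
        if v is not None and lag is None]
print(starts, ends)  # [2, 6] [1, 5, 8]

pairs = []
for s in starts:
    later_ends = [e for e in ends if e > s]  # s.PollTime < e.PollTime
    if later_ends:  # the inner join drops starts with no later end
        pairs.append((s, min(later_ends)))  # min(e.PollTime) per start
print(pairs)  # [(2, 5), (6, 8)]
```

Row 1 shows up among the end candidates because its lag is NULL, but it never pairs with anything since no start precedes it.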

DavidN