3

Imagine a database where you have an indexed view aggregating data from 2 tables, a dataset table and a datapoints table (master/detail).

When a dataset is first created it has a status of LOAD. The expectation is millions of rows will be added to the DataPoints table so we want to avoid updating the aggregate stats at this stage. When loading is complete the status of the dataset will be changed to READY. At this stage, we want the aggregates in place. There is still potential for a dataset in the READY state to get more Datapoints, but it will be much less common.

The DDL is below:

create table dataset
(
   ID int not null ,
   Name nvarchar(50) not null,
   Status nvarchar(10) not null,
   constraint PK_Dataset primary Key (ID)
);

Create table DataPoints ( id int not null , DataSetID int not null, Amount money not null, constraint PK_DataPoints primary Key (ID) );

CREATE NONCLUSTERED INDEX IX_DataPoints_FK ON DataPoints ( DataSetID ASC );

ALTER TABLE DataPoints WITH CHECK ADD CONSTRAINT [FK_Datapoints_Dataset] FOREIGN KEY([DataSetID]) REFERENCES Dataset (ID);

Create View Totals WITH schemabinding as

Select DataSet.ID, DataSet.Name, Count_Big(*) as DataSetCount, Sum(DataPoints.Amount) as DataSetTotal From dbo.DataSet Inner Join dbo.DataPoints On DataSet.ID = DataPoints.DataSetID where DataSet.Status = 'READY' Group by DataSet.ID, DataSet.Name;

CREATE UNIQUE CLUSTERED INDEX IDX_V1 ON Totals (ID);

Let's populate the Dataset table like so:

INSERT INTO DataSet
    ([ID], [NAME], [STATUS])
VALUES
    (1, 'EAST', 'READY'),
    (2, 'WEST', 'READY'),
    (3, 'SOUTH', 'READY'),
    (4, 'NORTH', 'LOAD');

Also see here http://sqlfiddle.com/#!18/ed676c/3

Let's say we insert rows into DataPoints with a DatasetID of 4.

Will the database engine be smart enough to know it doesn't need to update the indexed view?

Let's say we insert rows into DataPoints with a DatasetID of 3.

Will the database engine be smart enough to just take the existing aggregates and update the totals and count only referencing the new row? or will it need to read the DataPoints table and re-aggregate all rows with DatasetID = 3?

Lets say we run:

UPDATE DataSet Set Status='READY' where DatasetID = 4

Will the database engine do a complete table scan of DataPoints and refresh the aggregates for every dataset? Or is it smart enough to only scan or seek for the rows related to DatasetID 4?

Paul White
  • 94,921
  • 30
  • 437
  • 687
Aheho
  • 163
  • 5

0 Answers0