-1

I have a table where rows consist of a few GUID ids and then a huge byte array of data. There is a partitioned clustered index on this table, which orders data by the GUIDs.

The table is large, containing ~3tb of data, most of which is the byte array of data, which is not a part of the index. When first creating the index, it took well over a day, but the index rebuild (not reorganize) took under an hour. Why did the rebuild go so much quicker? My understanding is it would have to drop and re-create the clustered index, which would involve it re-writing this large table all over again.

It's might be worth noting that when lookin at size reports for the table, the table itself takes ~3tb but the index is 1gb or less. Also worth noting, the table wasn't partitioned until that initial index create I did previously, while the rebuild was done on an already partitioned table. Not sure if that makes an impact.

Again, just trying to gain understanding. Why would my rebuild be so significantly faster than the initial create?

EDIT: It is the clustered index I initially created and was re-building. There are no other indexes on this table

frobot
  • 19
  • 2

1 Answers1

1

Whether you're creating a clustered or a nonclustered index, the operator which does the lion's share of the work appears to be the sort. But when you rebuild the index -- clustered or nonclustered, online or offline -- SQL Server appears to scans the existing index first, which is already sorted.

E.g.

Let's create a nonclustered index on the StackOverflow Users table (after setting IO statistics on):

USE [StackOverflow] ;
GO
SET STATISTICS IO ON ;
GO
CREATE NONCLUSTERED INDEX [IX_Users_Reputation_DESC_Location_DisplayName]
    ON dbo.Users
    ( [Reputation] DESC, [Location] ASC, [DisplayName] ASC ) ;
GO

This operation reads 141572 pages:

enter image description here

And the execution plan looks like this:

enter image description here

PasteThePlan: https://www.brentozar.com/pastetheplan/?id=Syzxzzufo

Note the Clustered Index Scan (i.e. scanning the whole table) and the Sort operators.

Now let's rebuild the index offline:

USE [StackOverflow] ;
GO
SET STATISTICS IO ON ;
GO
ALTER INDEX [IX_Users_Reputation_DESC_Location_DisplayName]
    ON dbo.Users
    REBUILD 
    WITH ( SORT_IN_TEMPDB = OFF, ONLINE = OFF ) ;
GO

This operation only reads 53360 pages -- nearly a third of the pages read when creating the the index:

enter image description here

With this execution plan looking like:

enter image description here

PasteThePlan: https://www.brentozar.com/pastetheplan/?id=ryg2GfdMi

Note that this time, SQL Server is only scanning the existing nonclustered index, and not the clustered one again. Plus now now we have no Sort operator chewing up CPU time.

The same is true for rebuilding of a clustered index. But while this operation scans the existing clustered index first, so the page reads are on parr with creating the clustered index, it still doesn't need to re-sort the data.

So it looks like an index rebuild is actually doing less work than an index create.

MattM
  • 146
  • 5