
We have a table in an Azure SQL database that used to have an nvarchar(max) column storing PDF files. (The wonders of external developers.) The table grew to 156 GB and has 476,000 rows. After changing the logic, we no longer needed the PDF column. The only reasonable way to get rid of the data in it was to drop the column and recreate it empty (in case some weird process was still referencing it).
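For context, the drop-and-recreate was along these lines (table and column names here are placeholders):

-- Drop the LOB column holding the PDFs, then add it back empty
ALTER TABLE dbo.YourTable DROP COLUMN PdfData;
ALTER TABLE dbo.YourTable ADD PdfData nvarchar(max) NULL;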

However, the table size is still reported as 156 GB. The backup table I just created (SELECT INTO) is 128 MB, so that seems to be the real size of the data.

I let an index rebuild (ONLINE) run overnight on the clustered PK index. It failed with a TCP error somewhere between 8 and 12 hours in. The index is still 95% fragmented, and the size is still reported as 156 GB.
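The rebuild attempt was roughly this (index and table names are placeholders):

-- Online rebuild of the clustered primary key index
ALTER INDEX PK_YourTable ON dbo.YourTable REBUILD WITH (ONLINE = ON);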

Is there an explanation for why this is so eye-wateringly slow? Is there a better way? This is a production database, the table is used by a website and has to stay accessible, so I can't do it OFFLINE unless it takes less than 10 minutes - which nobody can guarantee.

Can I just build all the indexes on the backup table, drop the original table and rename the backup? That sounds risky (a small risk of losing a record created at just the wrong time).


I'm trying to make Azure realize the space is no longer used. Allocated space I'm OK with; used space, not so much:

[screenshot: database storage metrics showing allocated vs. used data space]

The table in question:

[screenshot: per-table space usage, with the table still reporting ~156 GB used]

Again, it's not the reserved space that is the issue, it is the used space.

vacip

3 Answers


However, the table size is still reported as 156 GB.

Yes, this is expected.

Why?

The way space allocation works in SQL Server is that, as the data grows, space is allocated from disk to the database's data file, in chunks whose size is defined in the database properties under Files -> Autogrowth:

[screenshot: Autogrowth settings in Database Properties -> Files]

This reserves disk space proactively and ensures it's available as more data is added to the database in the future. File growth operations are fairly heavy, which is why the database file doesn't just grow at the same rate and in the same increments as the data itself.

The same is true for file shrinking: it is an expensive operation. That's why the database is designed not to auto-shrink when data is removed. The assumption is that space already claimed from disk will eventually be re-used, so instead of releasing it back to disk, it is just marked internally as free so future data can overwrite it. This may sound wasteful, but disk is cheap and performance is not, and performance is what's implicitly gained by not continually growing and shrinking the file every time the data changes.

A useful tool is the system stored procedure sp_spaceused. If you run it in the context of a particular database, without any parameters, it tells you the total size of that database (database_size in the first result set) and how that disk space is currently distributed within the database file (reserved vs. data vs. index_size in the second result set). Previously you probably had around 156 GB showing in the data column; that space would now show up under reserved, meaning it's ready to be consumed by future data growth.
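For example (the table name below is a placeholder):

-- Database-level summary: database_size, then reserved / data / index_size / unused
EXEC sp_spaceused;

-- Per-table breakdown for the affected table
EXEC sp_spaceused N'dbo.YourTable';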

Now, to solve your problem

SQL Server has a shrink command (which can also be executed via the SSMS GUI) that will release the reserved space from the database file back to disk (as much as it can; there are some ifs, ands, or buts to that). It's usually recommended against because, again, shrinking is a heavy operation against the disk, it can impact concurrent database performance until it's done, and the space being released will have to go through a heavy growth operation again once more data is added to the database. But for special one-off large data changes like your case, it's probably sensible to do, if you really need to.
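A rough sketch of what that could look like, assuming you shrink the data file by file id and pick your own target size:

-- Check the logical files and their current sizes (size is in 8 KB pages)
SELECT file_id, name, size * 8 / 1024 AS size_mb
FROM sys.database_files;

-- Shrink the data file (file_id 1 here) down towards a target size in MB
DBCC SHRINKFILE (1, 1024);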

Sidenote

Index rebuilds are usually a wasteful operation - they cause file growth, which basically undoes some of the space released from shrinking. It's a vicious cycle. If you do them in hopes of performance gains, it's unlikely you're actually gaining anything from the rebuilds themselves (index fragmentation really doesn't matter much anymore on modern hardware). The rebuilds trigger subsequent things that may be improving your performance, such as clearing the plan cache and updating statistics, but you can run those individually, more granularly and efficiently, without the need for the wasteful rebuild.
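For instance, those pieces can be run directly (a sketch; the table name is a placeholder, and clearing the whole plan cache should be a deliberate choice):

-- Refresh statistics on just the affected table
UPDATE STATISTICS dbo.YourTable;

-- Azure SQL Database: clear the plan cache for this database only
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;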

J.D.

Index rebuilds don't touch LOB columns, so they're not going to reclaim any space.

  • Instead you need to use REORGANIZE
ALTER INDEX YourIndex ON YourTable REORGANIZE WITH (LOB_COMPACTION = ON);

However, even this cannot always do the full job.

Your best bet really is to insert all the data into a new table and switch it in.

  • Create a table with the exact same primary and unique keys, indexes and constraints.
  • Insert the data from the old table.
  • Use a trigger to keep the new table up to date if you want to keep the old table live at the same time (a sketch follows after the SWITCH code below).
  • Then do an ALTER TABLE ... SWITCH... as outlined in various places.
SET XACT_ABORT, NOCOUNT ON;

BEGIN TRANSACTION;

-- The switch target must be empty, so clear out the old (bloated) table first
TRUNCATE TABLE dbo.OldTable;

-- Metadata-only operation: moves the rows of dbo.NewTable into dbo.OldTable
ALTER TABLE dbo.NewTable SWITCH TO dbo.OldTable;

COMMIT TRANSACTION;
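If you keep the old table live during the copy, the sync trigger mentioned above could look roughly like this (a sketch only; table, column, and key names are hypothetical):

CREATE TRIGGER trg_OldTable_SyncToNew ON dbo.OldTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove the previous versions of any rows that were updated or deleted
    DELETE n
    FROM dbo.NewTable AS n
    WHERE EXISTS (SELECT 1 FROM deleted AS d WHERE d.Id = n.Id);

    -- Insert the current versions of any rows that were inserted or updated
    INSERT INTO dbo.NewTable (Id, SomeColumn)
    SELECT i.Id, i.SomeColumn
    FROM inserted AS i;
END;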

Charlieface

I ended up letting DBCC CLEANTABLE run. It took about 30 hours, but when it finished, the table size was 200 MB (down from the initial 156 GB). The reported size kept shrinking throughout, sometimes growing back a little, then shrinking again, and the database size closely followed. After this, the ALTER INDEX REBUILD took about 2 seconds.
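For reference, the command was along these lines (0 means the current database; the table name and batch size are placeholders):

-- Reclaim space left behind by dropped variable-length / LOB columns,
-- processing 100,000 rows per transaction
DBCC CLEANTABLE (0, 'dbo.YourTable', 100000);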

vacip