6

I have a large SQL Azure database (P6 nearing 1TB in size). I want to do a cleanup/removal of any unused indexes. For the past 30 days, we've captured the 2 following sets of information.

See: https://gist.github.com/eoincampbell/3fe775d43e86ad342f9c6eba10f350f9

  • Index Stats gathered from sys.dm_db_index_physical_stats joined to sys.tables, sys.schemas and sys.indexes
  • Index Usage gathered from sys.dm_db_index_usage_stats

I have a small concern around the accuracy of sys.dm_db_index_usage_stats. It's not clear from the documentation when/if the following occurs in a SQL Azure environment (as compared to a single instance MSSQLServer.)

https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-usage-stats-transact-sql?view=azuresqldb-current The counters are initialized to empty whenever the SQL Server (MSSQLSERVER) service is started. In addition, whenever a database is detached or is shut down (for example, because AUTO_CLOSE is set to ON), all rows associated with the database are removed.

Here's the query I'm using to subsequently identify the unused indexes. It

  1. Gets the latest index information for all indexes in the database (812 records)
  2. Gets the latest usage information for all indexes (558 records)
  3. LEFT OUTER JOINs them together
  4. Excludes any Clustered/PK Indexes
  5. Returns anything with no usage stats OR anything with zero user-read stats.

The total returned row count is ~219 rows

Does this approach seem valid?

Query


WITH MostRecentStats (
    SchemaName, TableName, IndexName, IndexType, AllocUnitType, Pages, MostRecentAt
)
AS (
    SELECT      SchemaName, TableName, IndexName
                , IndexTypeDescription, AllocUnitTypeDescription
                , Max(PageCount) , Max(RecordDate)
    FROM        DBStats.IndexStats
    GROUP BY    SchemaName, TableName, IndexName
                , IndexTypeDescription, AllocUnitTypeDescription
    -- ****** Returns 812 Indexes across all tables ******
)
, AllCombinedUsage (
    SchemaName, TableName, IndexName
    ,user_seeks, user_scans, user_lookups, user_updates 
    , system_seeks, system_scans, system_lookups, system_updates
)
AS (
    SELECT      SchemaName, TableName, IndexName
                , sum(user_seeks), sum(user_scans), sum(user_lookups), sum(user_updates)    
                , sum(system_seeks), sum(system_scans), sum(system_lookups), sum(system_updates)
    FROM        DBStats.IndexUsage
    GROUP BY    SchemaName, TableName, IndexName
    -- Only Returns 558 Index with Usage Statistics... 
)
SELECT      a.SchemaName, a.TableName, a.IndexName, a.Pages
            , b.*
FROM        MostRecentStats a
LEFT JOIN   AllCombinedUsage b
            ON a.SchemaName = b.SchemaName 
            AND a.TableName = b.TableName
            AND a.IndexName = b.IndexName
WHERE       a.IndexName NOT LIKE 'PK_%' --Filter out all Primary Keys
AND         a.IndexType <> 'CLUSTERED INDEX' --And Clusted Indexes
AND (
            b.IndexName IS NULL --Include everything that has no index usage data
            OR
            (b.user_seeks + b.user_scans + b.user_lookups) = 0
            --Include everything with 0 User Reads on the data
)
ORDER BY    a.Pages DESC
Eoin Campbell
  • 201
  • 1
  • 4

2 Answers2

3

The problem with Azure SQL Database is that you don't have control over when the SQL Server is re-started or your database shifts over to a different instance - i.e., the index usage collection might be from a much shorter timespan than would be usable for you.

One option is to work with the auto-index management functionality built in to Azure SQL Database.

Another option would be for you to materialize the info from sys.dm_dm_index_usage_stats into your own table so that the info survives these things. I haven't looked for such a script, but there might exists something "out there" already.

One factor that is interesting is when the SQL Server was re-started:

SELECT sqlserver_start_time  
FROM sys.dm_os_sys_info;  

I can't say for sure if this includes all cases where usage stats would be wiped, though.

Below are a couple of articles on the subject that adds some information: https://www.c-sharpcorner.com/article/dropping-unused-indexes-on-an-azure-sql-database/ https://sqlperformance.com/2018/04/azure/automatic-index-management-in-azure-sql-database

Tibor Karaszi
  • 18,269
  • 2
  • 17
  • 30
2

You may also need to exclude unique indexes