TL;DR: Is it possible to find out which partition(s) have been resampled and which have not when incremental statistics are used? Platform is SQL Server 2014 Enterprise edition.
The long version with some background information is like so.
Assuming a fairly typical DW environment, there is a partitioned table. The partitioning is based on a date column. This is done as staging data is loaded on separate table and after pre-processing, partition switching is used to move data into the production fact table. Oh, and a clustered columnstore index is in use. There are about a thousand partitions used. The DB is running on a virtual machine.
There are some 7.5 gigarows (100 GB) in the fact table. Daily growth is about five megarows. This is way too small a growth rate to trigger automatic statistics update, save trace flag 2371 (which haven't been tried).
The knee-jerk developer reaction to outdated statistics was updating them. For 7.5 gigarows, full update for all the stats takes some five hours. For a single statistics update, the processing performance is around 20 minutes or 90 megarows per second.
As the system is on a VM platform, business rules limit its costs. Neither memory nor IOPS are easily increased. Five hours update job is way too slow to be included on nightly ETL process, so either the stats stay outdated, are updated on unexpected time or will be updated in a maintenance window.
As the SQL Server is version 2014 Enterprise edition, it supports incremental statistics which sound just like the solution. After converting the stats to incremental ones, processing a single stat for single partition takes only 20 seconds. Grand total for newly-switched partition is about five minutes. This sounds great and certainly fits in the ETL process.
The thing that I wonder is how to manage incremental stats in partition switched environment. Assuming the stats are converted and updated as incremental on date D, how does one find out unprocessed partitions on, say, date D+2? Updating stats in ETL process is trivial, as the switching process obviously is aware of the partition id. But if there are partitions that are not resampled, how does one find those?
- Last update for stats can be found from
sys.dm_db_stats_properties - Partition numbers are available from
sys.partitions - Partitioning function values are in
sys.partition_range_values
One could pick last update date L for a stat and compare it to today's date T. Then calculate which partition id L points and wheter it is the same as T's. Then proceed with update for all the partition ids [L, T). This sounds tricky and error prone, so is there better a way? A DMV that shows which partitions are used for resampling would be nice, but there isn't one, is there?