
This is a follow-up to "Queries randomly getting slow, good preventive practices?", attempting to ask a more specific question about one of the ideas stated there.

A common cause of queries suddenly becoming slow after a long history of being fast is outdated cardinality estimates. SQL Server uses cardinality estimates to choose the query plan it expects will answer a given query fastest. If a query that usually takes a few minutes suddenly balloons to tens of hours because of outdated cardinality estimates, it might disrupt business operations before support staff can respond.

Are there any measures that could be applied preemptively to find and fix outdated cardinality estimates before a query is executed? Are there any specific tools that could be run periodically to review estimates and fix them, or perhaps metrics that could be monitored to indicate how good the estimates in a database are?

liori

2 Answers


The first line of defense is a nightly statistics gathering process. Ola Hallengren's rather ubiquitous set of maintenance scripts, for example, has a number of options for index and statistics maintenance. That should keep your statistics from getting too far out of date.
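For instance, a nightly statistics-only job step might look roughly like this (a sketch that assumes Ola Hallengren's IndexOptimize procedure is installed; the parameter values are illustrative, not a recommendation):

execute dbo.IndexOptimize
  @Databases = 'USER_DATABASES',
  @FragmentationLow = NULL,       -- skip index maintenance in this step
  @FragmentationMedium = NULL,
  @FragmentationHigh = NULL,
  @UpdateStatistics = 'ALL',      -- update both index and column statistics
  @OnlyModifiedStatistics = 'Y';  -- only touch statistics with row modifications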

The next line of defense is the set of database settings that enable the automatic updating of statistics, allow statistics to be updated asynchronously, and enable the automatic creation of statistics:

alter database [your database]
  set auto_create_statistics on;
alter database [your database]
  set auto_update_statistics on;
alter database [your database]
  set auto_update_statistics_async on;

If auto_update_statistics is enabled, SQL Server will re-gather statistics that it deems stale (the threshold differs by version) when it encounters them. If you allow that to be done asynchronously, the query that first encounters the stale statistics will use the old stats while a background thread updates the stats for a future execution. In any sort of OLTP system, I'd much prefer that to the synchronous approach where the query that encounters stale stats waits to execute until fresh stats are available. In a data warehouse system, I'd be a lot happier for the query to wait for fresh stats.
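If you want to keep an eye on how stale statistics are getting before SQL Server does anything about them, one option (a sketch, assuming SQL Server 2008 R2 SP2 / 2012 SP1 or later, where sys.dm_db_stats_properties is available) is to compare each statistic's modification counter to its row count:

-- statistics with the most modifications since their last update
select
    object_name(s.object_id) as table_name,
    s.name as stats_name,
    sp.last_updated,
    sp.[rows],
    sp.modification_counter
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where objectproperty(s.object_id, 'IsUserTable') = 1
  and sp.modification_counter > 0
order by sp.modification_counter desc;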

There are downsides to allowing auto stats updates, though. When you refresh stats, you force SQL Server to purge all the plans that use those stats from the cache and to recompile them, which can be expensive. You're also introducing additional background stats-gathering processes that will consume CPU during the day.

A third line of defense involves a DBA or developer being thoughtful about updating statistics as part of a load, or based on what they know about the system in question. If you're writing a warehouse load, for example, it probably makes sense to update statistics as a final step when you know you've made a large number of changes to a table (maybe once a week rather than every day, depending on the windows you have). If you're a DBA running an issue-tracking system, and you know a bunch of queries depend heavily on a particular statistic on a filtered index over the open issues, and you know there are generally very few open issues overnight when statistics run but the number grows steadily to a peak around 2pm and then slowly declines, you might schedule a job to update that particular filtered-index statistic hourly.
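As a rough sketch of both scenarios (the object names dbo.FactSales, dbo.Issues, and IX_Issues_Open are hypothetical placeholders):

-- final step of a warehouse load: refresh every statistic on the table
update statistics dbo.FactSales;

-- hourly job: refresh just the statistic behind the filtered index on open issues
update statistics dbo.Issues (IX_Issues_Open) with fullscan;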

Justin Cave

The SQL Server documentation has a decent section dedicated to Cardinality Estimation. Of note, it covers the constructs for which SQL Server cannot make good cardinality estimates, along with alternative formulations to use instead, which makes it a good source of preventative measures (emphasis mine):

In the following cases, SQL Server cannot accurately calculate cardinalities. This causes inaccurate cost calculations that may cause suboptimal query plans. Avoiding these constructs in queries may improve query performance. Sometimes, alternative query formulations or other measures are possible and these are pointed out:

  • Queries with predicates that use comparison operators between different columns of the same table.
  • Queries with predicates that use operators, and any one of the following are true:
    • There are no statistics on the columns involved on either side of the operators.
    • The distribution of values in the statistics is not uniform, but the query seeks a highly selective value set. This situation can be especially true if the operator is anything other than the equality (=) operator.
    • The predicate uses the not equal to (!=) comparison operator or the NOT logical operator.
  • Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument is not a constant value.
  • Queries that involve joining columns through arithmetic or string concatenation operators.
  • Queries that compare variables whose values are not known when the query is compiled and optimized.
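To illustrate the last bullet with a sketch (the table and column names are hypothetical): when a predicate compares a column to a local variable, the value isn't known at compile time, so the optimizer falls back on an average-density guess; adding OPTION (RECOMPILE) lets it compile with the actual runtime value and use the histogram instead.

declare @Status nvarchar(20) = N'Open';

select IssueId, Title
from dbo.Issues
where Status = @Status;   -- estimate based on average density

select IssueId, Title
from dbo.Issues
where Status = @Status
option (recompile);       -- value known at compile time, so the histogram is used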

It also goes into some detail about CE improvements over time, essentially boiling down to the fact that newer versions are (supposedly) always better at cardinality estimation. So making sure you're on a relatively current SQL Server version, and possibly turning on query optimizer hotfixes, is also important.
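On SQL Server 2016 and later, that last part can be done per database; this is roughly the database-scoped equivalent of trace flag 4199:

alter database scoped configuration
  set query_optimizer_hotfixes = on;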

LowlyDBA - John M