20

I've heard mention of statistics that SQL Server keeps by default. What are they tracking, and how can I use this information to improve my database?

goric
  • 1,646
  • 3
  • 18
  • 25

2 Answers

23

Statistics are a form of dynamic metadata that helps the query optimizer make better decisions. For example, if there are only a dozen rows in a table, there's no point going to an index to do a lookup; you will always be better off doing a full table scan. But if that same table grows to a million rows, you will probably be better off using the index.

However, if you query that table on a column that has very few distinct values (e.g. a "sex" column containing only "M" or "F"), a full table scan might still be better, because you would need to read most of the blocks anyway to build the result set. Now suppose your table is 99% "M" and only 1% "F": a scan is right in one case and an index lookup in the other. Same table, same query, potentially four different query plans depending on the contents of the table. These kinds of things are the "statistics", and they are individual to each database - even two databases with identical table and index structures will have different statistics.
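You can see these statistics for yourself. A minimal sketch (the table and index names here are hypothetical; substitute your own):

```sql
-- List the statistics objects SQL Server maintains on a table,
-- including whether they were auto-created and when last updated:
SELECT s.name,
       s.auto_created,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.People');

-- Inspect one statistic's header, density vector, and value histogram:
DBCC SHOW_STATISTICS ('dbo.People', 'IX_People_Sex');
```

The histogram in the `DBCC SHOW_STATISTICS` output is exactly what lets the optimizer distinguish the 99% "M" case from the 1% "F" case.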

In summary, a modern database engine performs two kinds of query optimization: rewriting the SQL (rule-based optimization, much as a compiler rewrites your C to make it more efficient) and choosing the right path to the data (cost-based optimization, much as a JIT compiler identifies hotspots at runtime). You only need to worry about this if you spot the query optimizer doing something obviously wrong (e.g. choosing a full table scan when you know an index would be better).
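To check what the optimizer actually chose, you can ask for the plan without running the query. A sketch, assuming a hypothetical `dbo.People` table:

```sql
-- Return the estimated execution plan as XML instead of executing,
-- so you can see whether the optimizer picked a scan or an index seek:
SET SHOWPLAN_XML ON;
GO
SELECT * FROM dbo.People WHERE Sex = 'F';
GO
SET SHOWPLAN_XML OFF;
GO
```

If the plan shows a scan where you expected a seek (or vice versa), stale statistics are one of the first things to rule out.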

Gaius
  • 11,238
  • 3
  • 32
  • 64
6

They are used by the query optimiser (whitepaper on MSDN) to track the distribution of values in indexes and/or columns.

Your only concern should be that they are updated regularly: otherwise, just leave the DB engine to do its stuff.
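A minimal sketch of what "leave the engine to it" and a manual refresh look like (the database and table names are hypothetical):

```sql
-- Let the engine create and maintain statistics automatically
-- (both options are ON by default in modern SQL Server versions):
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;

-- Refresh statistics on one table manually, e.g. after a large data load:
UPDATE STATISTICS dbo.People WITH FULLSCAN;

-- Or refresh out-of-date statistics across the whole database:
EXEC sp_updatestats;
```

A manual `UPDATE STATISTICS` is mainly useful after bulk changes, where waiting for the automatic threshold could leave the optimizer working from stale data.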

gbn
  • 70,237
  • 8
  • 167
  • 244