11

I have written an SSIS package to load test data into an empty database. Some of the tables are very large (~700 million rows). Once the SSIS package has completed are there any commands I should run (as an apprentice DBA!) to maximize the performance of the database?

For instance, I executed EXEC sp_updatestats but it reported that no indexes required updating.

Is there a list of things to do once large amounts of data has been loaded or does SQL Server 2008 just take care of all that for you?

Mark Robinson
  • 213
  • 1
  • 5

2 Answers2

8

If you're loading to an empty database, you could/should take steps to avoid requiring any additional maintenance steps post load. Fragmentation is the enemy, that's what you're trying to avoid.

  • Drop all NC indexes before loading.
  • Post load, add the NC indexes for each table in sequence i.e. don't add an index to TableA, then TableB, then back to TableA. This is has no effect on fragmentation but it can improve the time taken to add the indexes on very large datasets (reduces buffer pool churn).
  • If you can insert data to a table in clustered index order, leave the clustered index in place. If you can't, dump the data into a heap and rebuild in to a clustered index post load.

Rather than copy and paste, I'll point you toward the very comprehensive list of references for ETL optimisation @Marian put together in an answer to a question I asked on BCP. Many of these will be equally applicable to your scenario.

Mark Storey-Smith
  • 31,860
  • 9
  • 90
  • 125
5

It depends on how much of a window you have but generally updating statistics and rebuilding/reorganizing indexes is a good step to take. Other than that there shouldn't be anything else you need to do really.

The statistics tell the query optimizer how many rows are likely affected by an operation and that in turn tells SQL which approach to take to run your queries. Inserting data can skew the distribution and if you haven't inserted enough rows to trigger an update of statistics in auto update mode the manual one should help. If you have the window, I say keep the stats update portion.

Indexes can become fragmented as data is added. Rebuilding or Reorganzing helps reduce this which can lead to improved performance when actually accessing your data.

Mike Walsh
  • 18,278
  • 6
  • 50
  • 74