11

Some of the literature on SQL Server data compression that I have read states that the write cost increases to about four times what would normally be required. It also seems to imply that this is the primary downside to data compression, and that for a read-only archive database the performance will (with few exceptions) be improved by using data compression on 100%-filled pages.

  1. Are the statements above true?
  2. What are the primary "variations" between data compression and otherwise (for reading)?
     • "CPU +x%"?
     • "IO -y%"?
     • page split occurrence?
     • tempdb usage?
     • RAM usage?
  3. And for writing?

For the purpose of this question, you can limit the context to PAGE-level compression of a big (> 1TB) database, but additional comments are always welcome.


References:

SQL Server Storage Engine Blog (The DW scenario shows compression to be very advantageous)
Data Compression: Strategy, Capacity Planning and Best Practices

A more detailed approach to deciding what to compress involves analyzing the workload characteristics for each table and index. It is based on the following two metrics:

U: The percentage of update operations on a specific table, index, or partition, relative to total operations on that object. The lower the value of U (that is, the table, index, or partition is infrequently updated), the better candidate it is for page compression.
S: The percentage of scan operations on a table, index, or partition, relative to total operations on that object. The higher the value of S (that is, the table, index, or partition is mostly scanned), the better candidate it is for page compression.

Both of the above are demonstrably biased towards recommending page compression for DW-style databases (read-intensive/exclusive, big-data operations).
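The whitepaper derives U and S from sys.dm_db_index_operational_stats. Below is a minimal sketch of that calculation; the exact counter mix in the denominator and the NULLIF guard are my own assumptions, so treat the output as indicative rather than exact:

    -- Sketch: approximate U (update share) and S (scan share) per index/partition
    -- from sys.dm_db_index_operational_stats for the current database.
    SELECT
        o.name AS table_name,
        i.name AS index_name,
        ios.partition_number,
        ios.leaf_update_count * 100.0
            / NULLIF(ios.range_scan_count + ios.singleton_lookup_count
                   + ios.leaf_insert_count + ios.leaf_delete_count
                   + ios.leaf_update_count + ios.leaf_page_merge_count, 0) AS U_pct,
        ios.range_scan_count * 100.0
            / NULLIF(ios.range_scan_count + ios.singleton_lookup_count
                   + ios.leaf_insert_count + ios.leaf_delete_count
                   + ios.leaf_update_count + ios.leaf_page_merge_count, 0) AS S_pct
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
    JOIN sys.objects AS o ON o.object_id = ios.object_id
    JOIN sys.indexes AS i ON i.object_id = ios.object_id AND i.index_id = ios.index_id
    WHERE o.is_ms_shipped = 0
    ORDER BY S_pct DESC;

Low U_pct and high S_pct would, per the whitepaper's logic, make an object a good page-compression candidate.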

Glorfindel
孔夫子

3 Answers

8

Just my 2 cents from my own experiments on 1-2 year old hardware:

Read-only operations (DW-style scans, sorts, etc.) on page-compressed tables (~80 rows/page) I've found to break even at a compression size reduction of roughly 3x.

In other words, if the tables fit into memory anyway, page compression only benefits performance if the data size has shrunk by more than 3x. You scan fewer pages in memory, but it takes longer to scan each page.

I guess your mileage may vary if your plans are nested-loop and seek-heavy. Among other things, this will also be hardware-dependent (remote NUMA node access penalties, memory speed, etc.).

The above is just a rough rule of thumb that I follow, based on my own test runs using my own queries on my own hardware (Dell PowerEdge 910 and newer). It is not gospel, eh!

Edit: Yesterday the excellent SQLBits XI presentation by Thomas Kejser was made available as a video. Quite relevant to this discussion, it shows the 'ugly' face of the CPU cost for page compression: updates slowed down by 4x, and locks held for quite a bit longer.

However, Thomas is using FusionIO storage, and he picked a table that is only 'just' eligible for page compression. If the storage were on a typical SAN and the data compressed 3x-4x, the picture might have been less dramatic.

John Alan
1

I can add a few words from my Data Warehouse environment.

Implementing compression (PAGE in my case) on a test table with 30 million rows (18 GB) reduced the size of the table from 18 GB to 3 GB (a clear win for storage efficiency), but increased the load time (write) from 22 to 36 minutes.
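For context, a minimal sketch of how PAGE compression is typically enabled on an existing table; the object names below are placeholders, not the actual test table:

    -- Rebuild the table (all partitions) with PAGE compression.
    ALTER TABLE dbo.StageFactSales
        REBUILD PARTITION = ALL
        WITH (DATA_COMPRESSION = PAGE);

    -- Nonclustered indexes are not compressed by the table rebuild
    -- and must be rebuilt separately.
    ALTER INDEX IX_StageFactSales_DateKey
        ON dbo.StageFactSales
        REBUILD WITH (DATA_COMPRESSION = PAGE);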

So for reads, or for reading the data once to place it in memory, it could be a good solution, but for daily data loads it could cause a performance downgrade.

0

Glenn Berry shared a really useful script for understanding the potential savings of using data compression: https://www.youtube.com/watch?v=P6P9Jh4ihK0
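A related built-in option for a quick per-table estimate is sp_estimate_data_compression_savings; a minimal sketch, with placeholder schema and table names:

    -- Estimate PAGE-compression savings for a hypothetical table dbo.FactSales.
    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = N'dbo',
        @object_name      = N'FactSales',
        @index_id         = NULL,   -- NULL = all indexes on the table
        @partition_number = NULL,   -- NULL = all partitions
        @data_compression = N'PAGE';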

FrugalShaun