
We have a SQL Server database that collects analog and digital sensor data from several hundred devices. The archive table is quite simple: a (datetime) timestamp, an (int) stationId, an (int) datapointId and a (double) value, plus three bool flags.

The DB collects about 12 million entries per month. The table is append-only: no rollbacks, updates or deletions are necessary.

The main purpose of this table is to show archive values in our web visualization, ordered by timestamp and filtered by one or another id, so I created indexes on the relevant fields. It works, but performance is quite poor: about 5 seconds when filtering by a start/end time, stationId and datapointId.
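A simplified sketch of the table, index and typical query (names are illustrative, not our exact production schema):

```sql
CREATE TABLE dbo.Archive (
    Ts          datetime NOT NULL,
    StationId   int      NOT NULL,
    DatapointId int      NOT NULL,
    Value       float    NOT NULL,  -- "double" maps to float in SQL Server
    FlagA       bit      NOT NULL,
    FlagB       bit      NOT NULL,
    FlagC       bit      NOT NULL
);

-- Composite index covering the usual filter columns:
CREATE INDEX IX_Archive_Station_Datapoint_Ts
    ON dbo.Archive (StationId, DatapointId, Ts)
    INCLUDE (Value);

-- Typical visualization query: one series over a time window.
DECLARE @stationId int = 1, @datapointId int = 1,
        @start datetime = '2016-01-01', @end datetime = '2016-02-01';

SELECT Ts, Value
FROM dbo.Archive
WHERE StationId   = @stationId
  AND DatapointId = @datapointId
  AND Ts >= @start AND Ts < @end
ORDER BY Ts;
```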

What would be a better way to store and query this kind of data? Is SQL Server Standard 2016 perhaps not the right DBMS for such an archive? The original decision to use SQL Server dates back several years when the amount of data was way smaller than today. Is there some magic in SQL Server that I can apply to trade reliability for performance?

Rob

1 Answer


It would be good to know where that 5 seconds goes. Is it disk, network, memory or CPU?

Guessing it's disk, there are ways to speed this up. One is to get faster disk: flash or NVMe would be nice! Another, likely more practical, option is to compress the data so that less IO is required for the same information content. SQL Server supports several compression options (row and page). Also look into clustered columnstore indexes, which are specifically designed to store and process large amounts of data efficiently.
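For instance, something along these lines (table name taken from the question; note that on Standard Edition both data compression and columnstore require 2016 SP1 or later):

```sql
-- Page-compress the existing rowstore table:
ALTER TABLE dbo.Archive REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Or convert it to a clustered columnstore index, which compresses
-- far better and scans large ranges very efficiently. Any existing
-- clustered index must first be dropped (or converted in place with
-- the DROP_EXISTING option).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Archive ON dbo.Archive;
```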

Looking outside SQL Server, it sounds like you have time-series data. There are many DBMSs specifically tailored to this use case.

Before you go that far, consider refactoring your application architecture. Since the data is write-once, you can pre-process, or at least cache, any query against it. What visualisations do you use? Often data is aggregated so that, say, the min, max and average per minute/hour/day are shown. If that is your case, you can pre-calculate these aggregates, store them, and remove the detailed data (see the sketch below). If you use n-tiles, candlesticks or line charts, calculate the corresponding SVG, PNG or JPEG once for each interval and stitch them together to make larger charts. A lot of this store-and-stitch can be done in the file system or a CDN, avoiding load on the DB server altogether.
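As a rough sketch of the pre-aggregation idea, assuming an hourly min/max/average rollup run from a scheduled job (table and column names hypothetical):

```sql
-- Hypothetical hourly rollup table; detail rows can be purged
-- once an hour has been aggregated.
CREATE TABLE dbo.ArchiveHourly (
    HourStart   datetime NOT NULL,
    StationId   int      NOT NULL,
    DatapointId int      NOT NULL,
    MinValue    float    NOT NULL,
    MaxValue    float    NOT NULL,
    AvgValue    float    NOT NULL,
    SampleCount int      NOT NULL,
    CONSTRAINT PK_ArchiveHourly
        PRIMARY KEY (StationId, DatapointId, HourStart)
);

-- Aggregate the interval that just closed.
DECLARE @from datetime = '2016-06-01 00:00',
        @to   datetime = '2016-06-01 01:00';

INSERT INTO dbo.ArchiveHourly
SELECT DATEADD(hour, DATEDIFF(hour, 0, Ts), 0) AS HourStart,
       StationId, DatapointId,
       MIN(Value), MAX(Value), AVG(Value), COUNT(*)
FROM dbo.Archive
WHERE Ts >= @from AND Ts < @to
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, Ts), 0),
         StationId, DatapointId;
```

The rollup is orders of magnitude smaller than the detail data, so chart queries against it should be correspondingly cheaper.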

Michael Green