I have a query on the opening page of our application that takes 8-10 seconds to load. I've tried everything I can to optimize it and I simply cannot make it go any faster (indexing appropriately, rewriting redundant code, ridding it of the view-ception hell it lived in). It's a query on our largest table with a bunch of joins to other tables that have a many-to-one relationship to the main one (many rows in the smaller tables relate to one id in the largest), and it requires an unavoidable GROUP BY largeTable.id that makes it so slow.
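
For concreteness, the query looks roughly like this (table and column names are invented for illustration; this is not my actual schema):

    SELECT  lt.id,
            lt.name,
            COUNT(DISTINCT o.id) AS order_count,  -- many orders per largeTable row
            COUNT(DISTINCT n.id) AS note_count    -- many notes per largeTable row
    FROM    largeTable lt
    LEFT JOIN orders o ON o.large_id = lt.id
    LEFT JOIN notes  n ON n.large_id = lt.id
    GROUP BY lt.id, lt.name;  -- the unavoidable GROUP BY largeTable.id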

Now I have to make this work faster one way or another. I think this may be an appropriate situation for a data warehouse (I hope that's the right term) type table. I would insert the results of the slow query into this warehouse table and then query the warehouse table directly. Then, any time something in my application changes a table that feeds the slow query, I would drop the rows in the warehouse table and recreate them. That refresh would take slightly longer than the original query, but it would only happen a few times a day, whereas the rest of the day I'd be querying my data warehouse table directly, much faster.
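
Concretely, I'm picturing something like this (T-SQL syntax, same hypothetical names as above):

    -- One-time setup: a plain table holding the pre-computed results.
    CREATE TABLE ReportCache (
        id           INT          NOT NULL PRIMARY KEY,
        name         VARCHAR(100) NOT NULL,
        order_count  INT          NOT NULL,
        note_count   INT          NOT NULL,
        DateRecorded DATETIME     NOT NULL DEFAULT GETDATE()  -- when this snapshot was taken
    );

    -- The refresh: wipe and repopulate from the slow query.
    DELETE FROM ReportCache;

    INSERT INTO ReportCache (id, name, order_count, note_count)
    SELECT  lt.id,
            lt.name,
            COUNT(DISTINCT o.id),
            COUNT(DISTINCT n.id)
    FROM    largeTable lt
    LEFT JOIN orders o ON o.large_id = lt.id
    LEFT JOIN notes  n ON n.large_id = lt.id
    GROUP BY lt.id, lt.name;

    -- The opening page then just reads the cache:
    SELECT id, name, order_count, note_count FROM ReportCache;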

Is this a reasonable solution? Are there any common pitfalls or things I need to watch out for with this methodology, or a better way to do it? After a few days of trying to just optimize the query, I need some kind of workaround. I am still relatively new to this and am open to suggestions.

bjk116

1 Answer


Calculating and storing a value is a cache; that's a very common strategy. A data warehouse is a very different beast (usually more than one table!).

The main thing to be concerned about is timeliness. Unless customers expect the results to be close to real-time, you're fine: ten seconds of recalculation once an hour is nothing. Even if the query bloats to ten times that length, it should still be OK.

If you absolutely must have current data, rather than "up to one hour old," you'll need to know when your source tables have changed. If you already have a DateLastModified column in each of the tables, then you can check whether there's a more recent value than the DateRecorded you'll be capturing when you save this query. A stored procedure can query these values and refresh the cache if necessary. The problem is that the run time will be slow whenever any source data has changed, which I suspect will be most of the time. Also, you'll probably need indexes on your DateLastModified fields, so you can quickly establish that the source data hasn't changed.
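
A minimal T-SQL sketch of that stored procedure, reusing the hypothetical table names from the question:

    CREATE PROCEDURE dbo.usp_RefreshReportCacheIfStale
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @LastRefresh DATETIME = (SELECT MAX(DateRecorded) FROM ReportCache);
        DECLARE @LastChange  DATETIME =
            (SELECT MAX(d)
             FROM (SELECT MAX(DateLastModified) AS d FROM largeTable
                   UNION ALL
                   SELECT MAX(DateLastModified) FROM orders
                   UNION ALL
                   SELECT MAX(DateLastModified) FROM notes) AS src);

        -- Only pay the full cost of the slow query when a source row is newer than the cache.
        IF @LastRefresh IS NULL OR @LastChange > @LastRefresh
        BEGIN
            BEGIN TRANSACTION;  -- so readers never see a half-empty cache
            DELETE FROM ReportCache;
            INSERT INTO ReportCache (id, name, order_count, note_count)
            SELECT lt.id, lt.name, COUNT(DISTINCT o.id), COUNT(DISTINCT n.id)
            FROM   largeTable lt
            LEFT JOIN orders o ON o.large_id = lt.id
            LEFT JOIN notes  n ON n.large_id = lt.id
            GROUP BY lt.id, lt.name;
            COMMIT;
        END
    END;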

Alternatively, you could scatter triggers hither and yon, so that any time a key table is updated, you mark the cache as dirty so it can be rebuilt. This would be high-maintenance and fragile, but if you don't already have DateLastModified fields it might be easier.
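
A sketch of one such trigger, again T-SQL syntax with hypothetical names, using a one-row flag table:

    -- One-row table recording whether the cache needs a rebuild.
    CREATE TABLE ReportCacheStatus (IsDirty BIT NOT NULL);
    INSERT INTO ReportCacheStatus (IsDirty) VALUES (0);
    GO

    -- Repeat for every source table that feeds the slow query.
    CREATE TRIGGER trg_largeTable_DirtyCache
    ON largeTable
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE ReportCacheStatus SET IsDirty = 1;  -- mark the cache stale
    END;

Whatever rebuilds the cache then checks IsDirty, rebuilds only when it's set, and resets the flag to 0.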

If your customers are OK with this report being out of date by up to a few minutes, just run it on a schedule.
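
For instance, a SQL Server Agent job step (or a cron entry calling sqlcmd) every few minutes could run the refresh procedure sketched above; the staleness check keeps it cheap when nothing has changed:

    -- Job step body:
    EXEC dbo.usp_RefreshReportCacheIfStale;

    -- Or from cron, roughly:
    -- */5 * * * * sqlcmd -S myserver -d mydb -Q "EXEC dbo.usp_RefreshReportCacheIfStale;"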

Jon of All Trades