4

What tools or best practices exist to demonstrate a new server instance is "as capable" as an older one?

We're moving from an EC2 hosted instance of MS SQL Server to an RDS instance, and have been tasked with "proving" that the new platform won't cause any regressions.

The naive idea is to somehow replicate a day's worth of load, and record the elapsed time for all the queries. Then, if the worst 0.1% of the queries run no slower, we're good.

Is that logic good, or is there a flaw, or is there a better/standard way?

What would I need to do on the old server to "capture" a day's worth of load? (Queries, parameter values, session id so that I can simulate the correct number of concurrent connections, login name, elapsed query time, etc).

What else am I missing?

(Another niave idea was to put a "load balancer" on to "replicate" the queries to both instances, and log the query times. Is that even possible?)

Iridescent
  • 41
  • 4

1 Answers1

4

This is 100% a valid approach. However, sadly, there's no really simple way to make it happen.

To start with, you're going to need to capture the query metrics for your existing servers. The two best mechanisms are either Query Store (enabled by database, so it's not a full server measure, ever) or Extended Events. Extended Events gets detailed metrics and you can measure everything happening on a server (just be prepared, that's a lot of data).

The next step is to figure out a way to replay the events. Microsoft had a tool, Distributed Replay. Sadly, it never got much support and is now on the deprecated list. You can still make it work, but it is a true pain in the bottom.

To get around this, I'd suggest using a combination of DBATools (excellent support for Extended Events) and Gianluca Sartori's WorkloadTools. It's going to be a bunch of work to set this up, but between these two tool sets, you can get it done.

Grant Fritchey
  • 4,615
  • 20
  • 19