7

From time to time, consumers of my database processes will ask for an estimate of when a given task will be done. While I feel like I know how to read an EXPLAIN in most database engines, I have trouble trying to translate this to "ask me again in 15 minutes". Does anyone know a good "rule of thumb" to use for any particular database?

I realize this isn't going to be a hard and fast rule, but even being able to give a ballpark figure could be useful in some instances.

TML
  • 1,374
  • 13
  • 21

2 Answers2

6

I suspect that this is not possible the way you imagine it. One important reason is that the actual run time is very hardware dependent, and many of the optimization decisions that the database engine does are effectively about balancing the use of the different hardware components (e.g., disk, memory, CPU).

I suggest that you run a bunch of queries relevant to your application, record the cost estimations and the run times, and try to make sense of that data. You might get a nice linear relationship, or you might learn that the cost numbers are worthless for this purpose.

Peter Eisentraut
  • 10,723
  • 1
  • 35
  • 35
6

Cost based optimizers all work via a variety of proprietary algorithms (or you can read them for open source databases), but they typically work by assigning a reference operation a value of 1. For example, in SQL Server an operation with a cost estimate of 1 takes 1/320th of a second on a reference computer under some developer's desk in Redmond. The costing is just a relative guess of how expensive a query will be. Many RDBMSes use this cost in establishing priority or, in the case of deadlocks, to kill off cheaper queries (they take less time to run again). But it's all just a guess based on the information that the query optimizer has at its disposal at the time the query is being run.

Peter is correct, the best you can hope for is running some benchmark queries in ideal scenarios and using those to base best guesses on. You have to deal with a lot of different points of contention in an RDBMS, so it's difficult to specifically determine how any given query will perform in the real world.