2

I have a requirement to report monthly metrics on SQL Server uptime, by application/database. This would be at the cluster level. I.E., if a secondary replica went offline, but a primary was still available to process transactions, uptime would still be considered 100%.

I've looked at various tools that might serve this purpose, and a few come close. But, none of them seem to capture anything more than if the SQL Server service is online and accepting connections. They also fail to aggregate these metrics at the cluster/AG level. Meaning these uptime reports would take a hit if a secondary replica were to go offline.

For example, let's say a database goes offline or a log file fills up, and transactions are unable to process against a single database. Those tools would say that SQL Server is up, but I would still have people saying it was a database issue. Thus, these metrics would need to reflect that SQL was not fully up at that time.

The best idea of I'm come up with at this point is to create a SQL Agent job that inserts a record into a Canary table in each database, once a minute. Then at the end of the month, to query that table and divide the previous month's row count by the expected row count. I figured there was no better way to prove a database was actually available than to try to insert a row.

I already have the above solution developed, tested and working. But I'm curious if anyone knows of a better way to do this. Including any goods tools or DMVs I may have overlooked, that I could use to extrapolate end-user availability metrics of all databases on a SQL Instance?

Brendan McCaffrey
  • 3,444
  • 2
  • 8
  • 29

1 Answers1

2

This is the solution I'm using, which is working.

The below stored procedure can be used to populate a Canary table, and can be executed each minute from a SQL Agent job.

CREATE PROCEDURE [dbo].[SQLUptime]
AS
BEGIN 
    SET NOCOUNT ON
DECLARE
    @DatabaseName nvarchar(128),
    @Query nvarchar(max);

DECLARE cDatabases CURSOR FOR
    SELECT [name]
    FROM sys.databases
    WHERE database_id > 4

OPEN cDatabases;
FETCH NEXT FROM cDatabases INTO @DatabaseName

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        IF COALESCE(sys.fn_hadr_is_primary_replica(@DatabaseName),1) = 1    
        BEGIN
            SET @Query = 'USE [' + @DatabaseName + ']' + CHAR(13)+CHAR(10) + CHAR(13)+CHAR(10) + 
                'IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = N''dba'')' + CHAR(13)+CHAR(10) + 
                'BEGIN' + CHAR(13)+CHAR(10) +
                '    EXECUTE sp_executesql N''CREATE SCHEMA [dba]''' + CHAR(13)+CHAR(10) +
                'END' +  CHAR(13)+CHAR(10) + CHAR(13)+CHAR(10) +

                'IF NOT EXISTS (SELECT 1 FROM sys.objects WHERE object_id = OBJECT_ID(N''[dba].[Canary]'') AND type in (N''U''))' + CHAR(13)+CHAR(10) +
                'BEGIN' + CHAR(13)+CHAR(10) +
                '    CREATE TABLE dba.Canary (' + CHAR(13)+CHAR(10) +
                '        [Time] Datetime NOT NULL' + CHAR(13)+CHAR(10) +
                '    )' + CHAR(13)+CHAR(10) + CHAR(13)+CHAR(10) +

                '    CREATE CLUSTERED INDEX CIX_Canary_Time ON dba.Canary ([Time]);' + CHAR(13)+CHAR(10) +
                'END' + CHAR(13)+CHAR(10) + CHAR(13)+CHAR(10) +

                'INSERT INTO dba.Canary ([Time])' + CHAR(13)+CHAR(10) +
                'VALUES (getdate());' + CHAR(13)+CHAR(10) + CHAR(13)+CHAR(10) +

                'DELETE FROM dba.Canary' + CHAR(13)+CHAR(10) +
                'WHERE [Time] < DATEADD(MONTH, -12, GETDATE());'

        /*PRINT @Query*/
        EXECUTE sp_executesql @Query
    END
    END TRY
    BEGIN CATCH
        SELECT
            ERROR_NUMBER(),
            ERROR_STATE(),
            ERROR_SEVERITY(),
            ERROR_PROCEDURE(),
            ERROR_LINE(),
            ERROR_MESSAGE()
    END CATCH

    FETCH NEXT FROM cDatabases INTO @DatabaseName
END

CLOSE cDatabases;
DEALLOCATE cDatabases;

END

The following query can then be used each month to report uptime based the number of rows successfully inserted into the Canary table during the previous month.

DECLARE
    @StartDate    datetime,
    @CutoffDate   datetime,
    @TotalMinutes int,
    @output       int,
    @SQL          nvarchar(500),
    @Parameters   nvarchar(500),
    @Database     nvarchar(128);

SET @StartDate = DATEADD(MONTH,DATEDIFF(MONTH,0,GETDATE())-1,0); SET @CutoffDate = DATEADD(DAY,0,DATEADD(MONTH,1,@StartDate));

IF @CutoffDate > GETDATE() SET @CutoffDate = GETDATE();

SET @TotalMinutes = DATEDIFF(MINUTE, @StartDate,@CutoffDate); SET @Parameters = N'@retvalOUT int OUTPUT';

IF OBJECT_ID('tempdb..#UptimeMinutes') IS NOT NULL DROP TABLE #UptimeMinutes

CREATE TABLE #UptimeMinutes ( DatabaseName nvarchar(128), UptimeMinutes int )

DECLARE cursorDatabases CURSOR FOR SELECT [name] FROM sys.databases WHERE database_id > 4;

OPEN cursorDatabases; FETCH NEXT FROM cursorDatabases INTO @Database;

WHILE @@FETCH_STATUS = 0 BEGIN SET @SQL = N'USE [' + @Database + '] SELECT @retvalOUT = COUNT(*) FROM dba.Canary';

EXEC sp_executesql @SQL, @Parameters, @retvalOUT=@output OUTPUT;

INSERT INTO #UptimeMinutes (DatabaseName,UptimeMinutes)
SELECT @Database, @output;

FETCH NEXT FROM cursorDatabases INTO @Database;

END

SELECT SUM(UptimeMinutes) InstanceUptimeMinutes, SUM(@TotalMinutes) AS TargetInstanceUptimeMinutes, CONVERT(DECIMAL(10,3),(SUM(UptimeMinutes)/CONVERT(DECIMAL,SUM(@TotalMinutes))) * 100) AS InstanceUptimePercentage FROM #UptimeMinutes;

SELECT DatabaseName, DatabaseUptimeMinutes, @TotalMinutes AS TargetDatabaseUptimeMinutes, CONVERT(DECIMAL(10,3),(UptimeMinutes/CONVERT(DECIMAL,@TotalMinutes)) * 100) AS DatabaseUptimePercentage FROM #UptimeMinutes;

CLOSE cursorDatabases; DEALLOCATE cursorDatabases;

Brendan McCaffrey
  • 3,444
  • 2
  • 8
  • 29