1

I've a SQl server 2019 on a Windows Server 2019 standard core, I manage two typer of backup, one with mantainance plans and one with Veeam Backup. The mantainance plans it's ok, the backup with Veeam fails every day with error on SQL leaving SQLServerWriter in Failed state with last error Non-retryable error. I've open a ticket on Veeam support and we have eplored vary log and the problem wasn't related to Veeam because even creating backup with vss command there are error. According to the logs, the backup job fails with the error:

[11.09.2023 00:07:12] <01> Error Failed to create snapshot: Backup job failed.  
[11.09.2023 00:07:12] <01> Error Cannot create a shadow copy of the volumes containing writer's data.  
[11.09.2023 00:07:12] <01> Error A VSS critical writer has failed. Writer name: [SqlServerWriter].

In the VSS writers, the error could be observed:

Writer name: 'SqlServerWriter'  
Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}  
Writer Instance Id: {1fb6de2a-c593-45c2-b592-e90ff6aad393}  
State: [8] Failed  
Last error: Non-retryable error 

The following errors can be observed in the Windows Application events:

Log Name: Application  
Source: SQLWRITER  
Date: 11.09.2023 0:07:05  
Event ID: 24583  
Task Category: None  
Level: Error  
Keywords: Classic  
User: N/A  
Computer: SQL03Core  
Description:  
Sqllib error: OLEDB Error encountered calling ICommandText::Execute. hr = 0x80040e14.   SQLSTATE: 42000, Native Error: 3013  
Error state: 1, Severity: 16  
Source: Microsoft SQL Server Native Client 11.0  
Error message: BACKUP DATABASE is terminating abnormally.  
SQLSTATE: 42000, Native Error: 3224  
Error state: 1, Severity: 16  
Source: Microsoft SQL Server Native Client 11.0  
Error message: Cannot create worker thread.

Log Name: Application
Source: SQLWRITER
Date: 11.09.2023 0:07:05
Event ID: 24583
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: SQL03Core
Description:
Sqllib error: OLEDB Error encountered calling ICommandText::Execute. hr = 0x80040e14. SQLSTATE: 42000, Native Error: 3013
Error state: 1, Severity: 16
Source: Microsoft SQL Server Native Client 11.0
Error message: BACKUP DATABASE is terminating abnormally.
SQLSTATE: 42000, Native Error: 3202
Error state: 1, Severity: 16
Source: Microsoft SQL Server Native Client 11.0
Error message: Write on "{7F86B757-DC6E-4B76-B35D-F382797EB665}414" failed: 995(The I/O operation has been aborted because of either a thread exit or an application request.)

The server has 536 database and 8 core, so the Veeam support suppose that the problem was related to the an insufficent number of worker thread, so according to https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option?view=sql-server-ver16, we have change the number of worker thread to 1000 but errors continue to raise, so Veeam support suggest to have help in order to optimize machine and thread according to the load.

Can anyone suggest a way to proced? Sorry for my bad english and thanks in advance... Stefano

1 Answers1

0

Performing a snapshot of a system will trigger various sub-systems on the target server as explained below.

  1. In your case Veeam will trigger the Volume Shadow Copy service (aka VSS Writer) at the operating system level.
  2. Because the SQL Server VSS Writer service is registered with the Operating System, the OS will tell the SQL Server VSS Writer service that a snapshot of the system is about to be performed.
  3. The SQL Server VSS Writer service will freeze all I/O for the database and will report back to the Volume Shadow Copy service that all I/O has been frozen at the SQL Server level.
  4. The Volume Shadow Copy will freeze all I/O at the OS level and report back to Veeam, that all I/O has been frozen.
  5. Veeam will then perform a block-level (disk) snapshot of the system.
  6. The snapshot will be stored in SQL Server's backup history as a is_snapshot backup.

When a snapshot of a SQL Server instance is performed, it should show up in the SQL Server ERRORLOG file similar to the following:

2023-09-17 18:04:45.99 spid54      I/O is frozen on database model. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2023-09-17 18:04:45.99 spid57      I/O is frozen on database msdb. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
[...]
2023-09-17 18:04:45.99 spid63      I/O is frozen on database master. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
[...]
2023-09-17 18:04:50.75 spid63      I/O was resumed on database master. No user action is required.
2023-09-17 18:04:50.76 spid54      I/O was resumed on database model. No user action is required.
[...]
2023-09-17 18:04:50.77 spid57      I/O was resumed on database msdb. No user action is required.

This is a sign that the SQL Server VSS Writer has been successfully notified of the snapshot and after the snapshot was performed it unfreezes all I/O again so that transactions can be written to the database files.

What does your ERRORLOG look like?

Answering Your Question

Can anyone suggest a way to proceed?

Increase the amount of worker threads

Microsoft has a recommendation to increase the worker threads:

In SQL Server, the snapshot backup of each database uses five threads in the Sqlservr.exe process. Additionally, other activities may also use threads in the Sqlservr.exe process. Depending on the configuration of SQL Server, the available threads may be used up if you create a snapshot backup of many databases at the same time.

...and...

Workaround

  • To work around this problem, create a snapshot backup of fewer databases at the same time.
  • If you are running a 64-bit version of SQL Server, you can consider increasing the Max Worker Threads configuration option. It is not recommended to alter this setting on a 32-bit version of SQL Server.

Reference: Error when you create a snapshot backup of many databases at the same time in SQL Server (Microsoft Learn | SQL)

So you would have to quintuple the amount of worker threads: 5 * 536 = 2680 (at least!!)

A reboot is required.

Ask Microsoft

If you have a Microsoft Support Agreement, ask Microsoft for support.

Veeam Snapshot Backups

Don't use them.... Well you already have FULL, (DIFF) and TLOG backups, so why go the length of having additional Veeam Snapshot Backups that you can only use for that specific point-in-time?

Going Forward

You might want to consider reducing the amount of database per SQL Server instance. Or actually creating new SQL Servers and spreading the load.

John K. N.
  • 18,854
  • 14
  • 56
  • 117