0

I will be straightforward and put my question before I add any other information, as I want to keep it objective and to the point.

the question is:

How to find out why the replication services are stopping?

I thought that this info would be stored in the distribution database like this:

select * from distribution.dbo.msrepl_errors (nolock) where id = 0

But this was not the case.

I am coming across some errors on my transactional replication. that I could clearly see:

enter image description here

Source: Microsoft.SqlServer.Management.Sdk.Sfc Target Site: Void FilterException(System.Exception) Message: Exception of type 'System.OutOfMemoryException' was thrown. Stack: at Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException.FilterException(Exception e) at Microsoft.SqlServer.Management.Sdk.Sfc.Enumerator.Process(Object connectionInfo, Request request) at Microsoft.SqlServer.Management.Smo.ExecutionManager.GetEnumeratorData(Request req) at Microsoft.SqlServer.Management.Smo.SqlSmoObject.GetDbCollation(String dbname) at Microsoft.SqlServer.Management.Smo.SqlSmoObject.GetDbComparer(Boolean inServer) at Microsoft.SqlServer.Management.Smo.SqlSmoObject.InitializeStringComparer() at Microsoft.SqlServer.Management.Smo.AbstractCollectionBase.get_StringComparer() at Microsoft.SqlServer.Management.Smo.SimpleObjectCollectionBase.InitInnerCollection() at Microsoft.SqlServer.Management.Smo.SmoCollectionBase.GetObjectByKey(ObjectKeyBase key) at Microsoft.SqlServer.Management.Smo.DatabaseCollection.get_Item(String name) at Microsoft.SqlServer.Replication.Snapshot.SmoScriptingManager.PrefetchObjects(ObjectPrefetchControl[] objectPrefetchControls) at Microsoft.SqlServer.Replication.Snapshot.SmoScriptingManager.DoPrefetchWithRetry() at Microsoft.SqlServer.Replication.Snapshot.SmoScriptingManager.DoScripting() at Microsoft.SqlServer.Replication.Snapshot.SqlServerSnapshotProvider.DoScripting() at Microsoft.SqlServer.Replication.Snapshot.SqlServerSnapshotProvider.GenerateSnapshot() at Microsoft.SqlServer.Replication.SnapshotGenerationAgent.InternalRun() at Microsoft.SqlServer.Replication.AgentCore.Run()

while running a snapshot:

memory mapped file write failed

and previous to that:

System.OutOfMemoryException

enter image description here

and also a few of these:

Error messages: The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

enter image description here

Error messages: ⦁ Message: memory mapped file write failed Stack:
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.ThrowNativeBcpOutException(CConnection* pNativeConnectionWrapper) at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.BcpOut(String strBcpObjectName, String strBcpObjectOwner, String strBaseBcpObjectName, Boolean fUnicodeConversion, String strDataFile, String strLoadOrderingHint, String strWhereClause, Boolean useTableLockHint, Int32 bcpFileFormatVersion) at Microsoft.SqlServer.Replication.Snapshot.SqlServer.BcpOutThreadProvider.DoWork(WorkItem workItem) at Microsoft.SqlServer.Replication.WorkerThread.NonExceptionBasedAgentThreadProc() at Microsoft.SqlServer.Replication.AgentCore.BaseAgentThread.AgentThreadProcWrapper() (Source: MSSQLServer, Error number: 0) Get help: http://help/0

and the previous error message:

Error messages: ⦁ Source: Microsoft.SqlServer.Management.Sdk.Sfc Target Site: Void FilterException(System.Exception) Message: Exception of type 'System.OutOfMemoryException' was thrown. Stack: at Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException.FilterException(Exception e) at Microsoft.SqlServer.Management.Sdk.Sfc.Enumerator.Process(Object connectionInfo, Request request) at Microsoft.SqlServer.Management.Smo.ExecutionManager.GetEnumeratorData(Request req) at Microsoft.SqlServer.Management.Smo.SqlSmoObject.GetDbCollation(String dbname) at Microsoft.SqlServer.Management.Smo.SqlSmoObject.GetDbComparer(Boolean inServer) at Microsoft.SqlServer.Management.Smo.SqlSmoObject.InitializeStringComparer() at Microsoft.SqlServer.Management.Smo.AbstractCollectionBase.get_StringComparer() at Microsoft.SqlServer.Management.Smo.SimpleObjectCollectionBase.InitInnerCollection() at Microsoft.SqlServer.Management.Smo.SmoCollectionBase.GetObjectByKey(ObjectKeyBase key) at Microsoft.SqlServer.Management.Smo.DatabaseCollection.get_Item(String name) at Microsoft.SqlServer.Replication.Snapshot.SmoScriptingManager.PrefetchObjects(ObjectPrefetchControl[] objectPrefetchControls) at Microsoft.SqlServer.Replication.Snapshot.SmoScriptingManager.DoPrefetchWithRetry() at Microsoft.SqlServer.Replication.Snapshot.SmoScriptingManager.DoScripting() at Microsoft.SqlServer.Replication.Snapshot.SqlServerSnapshotProvider.DoScripting() at Microsoft.SqlServer.Replication.Snapshot.SqlServerSnapshotProvider.GenerateSnapshot() at Microsoft.SqlServer.Replication.SnapshotGenerationAgent.InternalRun() at Microsoft.SqlServer.Replication.AgentCore.Run() (Source: Microsoft.SqlServer.Management.Sdk.Sfc, Error number: 0) Get help: http://help/0

and while troubleshooting it it turned out that the computer is really out of memory

enter image description here

Marcello Miorelli
  • 17,274
  • 53
  • 180
  • 320

1 Answers1

0

Instead of find out what exactly caused the services to stop I have created a job that will restart them services if they are not running.

Simple script to check SQL Server Replication Jobs

CREATE PROCEDURE GetReplicationAgentStatus
AS

BEGIN set nocount on set transaction isolation level read uncommitted

/* Make sure your agents are in the correct category i.e Merge agents under REPL-Merge, Distribution agents under REPL-Distribution and LogReader agent under REPL-LogReader */

select s.job_id,s.name,s.enabled,c.name as categoryname into #JobList from msdb.dbo.sysjobs s inner join msdb.dbo.syscategories c on s.category_id = c.category_id where c.name in ('REPL-Merge','REPL-Distribution','REPL-LogReader')

create TABLE #xp_results
(job_id UNIQUEIDENTIFIER NOT NULL, last_run_date INT NOT NULL, last_run_time INT NOT NULL, next_run_date INT NOT NULL, next_run_time INT NOT NULL, next_run_schedule_id INT NOT NULL, requested_to_run INT NOT NULL, request_source INT NOT NULL, request_source_id sysname COLLATE database_default NULL, running INT NOT NULL, current_step INT NOT NULL, current_retry_attempt INT NOT NULL, job_state INT NOT NULL)

insert into #xp_results exec master.dbo.xp_sqlagent_enum_jobs 1, ''

select j.name,j.categoryname,j.enabled, AgentStatus = CASE WHEN r.running =1 THEN 'Running' else 'Stopped' end from #JobList j inner join #xp_results r on j.job_id=r.job_id

-- Uncomment the below portion and use correct parameters to send email alert /* if exists (select j.name,j.categoryname,j.enabled,r.running from #JobList j inner join #xp_results r
on j.job_id=r.job_id where running =0 ) begin declare @subject nvarchar(100) select @subject = N'Replication Agents Status on '+@@servername

EXEC msdb.dbo.sp_send_dbmail @profile_name = 'ProfileName', @recipients = N'email id', @subject = @subject, @body = 'One or more agents found stopped' end */ drop table #JobList,#xp_results END

this procedure above is wrapped in a job and scheduled to run every hour.

the little difference is I put everything into a temp table, and read it row by row and start the jobs that are not running:

             Print @job_name +' is currently not running.'
             exec msdb.dbo.sp_start_job @Job_id = @job_id
Marcello Miorelli
  • 17,274
  • 53
  • 180
  • 320