0

I know you can configure NServiceBus to automatically retry failed messages (FLR: First Level Retries) and to wait before retrying again (SLR: Second Level Retries). However, with the default configuration (5 FLR + 5 SLR), it takes about one minute before a message shows up in the error queue.

I understand the value of automatic retries, but isn't it better to fail early by configuring zero FLR plus zero SLR and writing code that expects errors to occur?

I mean, automatic retries go against the fail-fast paradigm, don't they?

Machado
  • 4,130

2 Answers

2

Retries in NServiceBus are intended to be a method for handling transient issues such as network failures, reboots, etc. in which case the issue is expected to be resolved fairly quickly, and you still expect the message to be handled as soon as possible. If instead you have an exception because of a command that is invalid for business reasons, you probably want to catch the exception and send a response message/publish an event, indicating that the command could not be carried out because it would violate a business rule.

The general idea is that infrastructure exceptions are handled by the NServiceBus infrastructure, and business exceptions are handled by your business logic - in message handlers, sagas, domain model, etc.
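To make that split concrete, here is a minimal sketch in plain Python rather than the NServiceBus API. Every name in it (`BusinessRuleViolation`, `withdraw`, `handle_withdraw`, the `outbox` list, the message dictionaries) is hypothetical and exists only for illustration. The handler catches the business exception and reports it as a message, while anything else, such as a connection error, is deliberately left uncaught so that the retry infrastructure can deal with it.

```python
class BusinessRuleViolation(Exception):
    """Hypothetical exception: the command is invalid for business reasons."""


def withdraw(balance, amount):
    # Domain logic: a rule violation is a business error, not a transient one.
    if amount > balance:
        raise BusinessRuleViolation("insufficient funds")
    return balance - amount


def handle_withdraw(command, balances, outbox):
    """Hypothetical message handler; `outbox` stands in for reply/publish."""
    account = command["account"]
    try:
        balances[account] = withdraw(balances[account], command["amount"])
    except BusinessRuleViolation as exc:
        # Retrying cannot fix this, so report the outcome as a message.
        outbox.append({"type": "WithdrawalRejected",
                       "account": account,
                       "reason": str(exc)})
        return
    # Any other exception is not caught here and reaches the retry logic.
    outbox.append({"type": "WithdrawalCompleted", "account": account})
```

A rejected command leaves the balance untouched and produces exactly one `WithdrawalRejected` message, so it never consumes a retry.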

pnschofield
  • 344
  • 2
  • 5
0

The failures that "fail-fast" applies to are the kind that you can detect and report immediately. In NServiceBus, these manifest as exceptions. When an exception is thrown by a service, NServiceBus can be configured to immediately retry a certain number of times (five, by default). This works when the problem is transient rather than persistent.

If five retries do not correct the problem, you can reasonably assume that something your service depends on has a problem (e.g. a JSON service it calls is down, or a database deadlock occurred). In that case it is common to back off, wait a while, and retry. NServiceBus will do this up to five times by default, with an increasing delay each time.

In short, "fail fast" only applies when you can actually fail fast. It doesn't apply in situations where a timeout occurs, or where problems can occur that are outside of your control.
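The two retry levels described above can be sketched generically as follows. This is plain Python, not NServiceBus code, and it rests on several assumptions: the function name `process_with_retries`, its parameters, the 10-second `base_delay`, the linear growth of the delay, and the idea that every delayed round repeats the immediate retries are all illustrative choices, not the library's documented behavior.

```python
import time


def process_with_retries(handler, message, error_queue,
                         immediate_retries=5, delayed_retries=5,
                         base_delay=10.0, sleep=time.sleep):
    """Return True if handler(message) succeeds, False if the message
    ends up in error_queue after all retries are exhausted."""
    for round_no in range(delayed_retries + 1):
        if round_no > 0:
            # Second level: back off, waiting longer on each round.
            sleep(base_delay * round_no)
        # First level: one attempt plus back-to-back immediate retries.
        for _ in range(immediate_retries + 1):
            try:
                handler(message)
                return True
            except Exception:
                continue  # assume a transient failure and try again
    # Nothing worked: park the message for manual inspection.
    error_queue.append(message)
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A handler that recovers after a few failures returns `True` and never touches `error_queue`; only a persistently failing one pays the full backoff cost.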

Further Reading
NServiceBus Recovery Options

Robert Harvey
  • 200,592