
I'm playing with a mental model for a distributed actor system that communicates using messages. In it, an actor can create another actor. I've encountered one specific situation that I'm not sure how to resolve. Consider this sequence of events:

  1. Actor 1 requests creation of Actor 2. As part of the "construction", it passes its own ID.
  2. Actor 1 requests its own deletion.
  3. System sends KILL message to all Actors who subscribed to {Actor1|KILL}
  4. Actor 2 is created, saves ID of "1" sent to it as construction parameter.
  5. Actor 2 attempts to send a message to Actor 1.
  6. ???
  7. Actor 2 detects that Actor 1 is gone, and reacts.

Since Actors 1 and 2 can be on physically different machines, querying "Does Actor X exist?" before every send would introduce latency that shouldn't be there in the typical case. Similarly, blocking until you know "Send Success" or "Send Fail" seems like a bad idea. It also seems that littering code with if(send() == fail) { ... } is ugly and error-prone. Are there known solutions for robustly and cleanly handling these sorts of situations?

PatrickB

1 Answer


This problem has been faced many times when building a reliable communication protocol on top of an unreliable one. TCP is the classic example (the underlying IP protocol is unreliable), and the same applies to any protocol that needs reliability while using UDP as its transport mechanism.

The basic idea is that each message/packet gets an identification number and the recipient sends an acknowledgment of the messages/packets that it received. If the sender doesn't get an acknowledgment within a certain time frame, then the message is considered lost and the sender retries sending it. After a certain number of retries, delivery of the message is usually given up and a failure is reported to the application.
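To make the acknowledge-and-retry idea concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: `transmit` is a hypothetical callable standing in for "send one packet and wait up to the timeout for its acknowledgment", and `FlakyLink` is a toy transport invented here so the example runs without a network.

```python
from typing import Callable


def send_reliably(
    transmit: Callable[[int, bytes], bool],
    msg_id: int,
    payload: bytes,
    max_retries: int = 3,
) -> bool:
    """Send payload, retrying until it is acknowledged or retries run out.

    `transmit` (hypothetical) sends one packet and returns True if an
    acknowledgment for msg_id arrived before its timeout expired.
    """
    # One initial attempt plus up to max_retries retries.
    for _attempt in range(1 + max_retries):
        if transmit(msg_id, payload):
            return True
    # Give up and report the failure to the application.
    return False


class FlakyLink:
    """Toy transport: loses the first `drops` packets, then acknowledges."""

    def __init__(self, drops: int) -> None:
        self.drops = drops
        self.attempts = 0

    def __call__(self, msg_id: int, payload: bytes) -> bool:
        self.attempts += 1
        return self.attempts > self.drops
```

With `FlakyLink(2)` the third attempt is acknowledged and `send_reliably` returns True. With a link that never acknowledges, it returns False after the initial attempt plus `max_retries` retries, which is the point where the failure is reported to the application.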

On the application level, there are two main ways to handle it:

  1. A blocking send function. This should only be used if the timeouts are relatively short. You don't want to block your actor for a minute because the recipient is difficult to reach. A blocking send function can either return a status indicating success/failure, or it can throw an exception (if failure is considered an exceptional situation and/or it is likely that the direct callers of send can't really do anything with the error except passing it on up the stack).

  2. A non-blocking send function. In this case, the failure notification will often be in the form of a callback or event that arrives some time after sending the message.
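The two styles above could look roughly like this in Python. This is a sketch, not a real actor library: `ToySystem`, `send_blocking`, `send_async`, `on_failure` and `DeliveryError` are all names made up for illustration, and "delivery" is simulated in-process by checking whether the recipient still has a mailbox.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class DeliveryError(Exception):
    """Raised by the blocking variant when the recipient cannot be reached."""


class ToySystem:
    """Hypothetical in-process stand-in for the messaging layer."""

    def __init__(self) -> None:
        self.mailboxes: Dict[str, List[str]] = {}
        self._pending: Deque[Callable[[], None]] = deque()

    def spawn(self, actor_id: str) -> None:
        self.mailboxes[actor_id] = []

    def kill(self, actor_id: str) -> None:
        self.mailboxes.pop(actor_id, None)

    def send_blocking(self, to: str, msg: str) -> None:
        # Style 1: the caller learns the outcome before continuing.
        if to not in self.mailboxes:
            raise DeliveryError(to)
        self.mailboxes[to].append(msg)

    def send_async(
        self,
        to: str,
        msg: str,
        on_failure: Optional[Callable[[str, str], None]] = None,
    ) -> None:
        # Style 2: return immediately; the outcome is decided later.
        def deliver() -> None:
            box = self.mailboxes.get(to)
            if box is not None:
                box.append(msg)
            elif on_failure is not None:
                on_failure(to, msg)  # failure arrives as a later event

        self._pending.append(deliver)

    def run_pending(self) -> None:
        # Simulates the network eventually attempting each queued delivery.
        while self._pending:
            self._pending.popleft()()
```

In the scenario from the question, Actor 2 would call `send_async("actor1", ..., on_failure=handler)` and keep working; if Actor 1 has been killed by the time delivery is attempted, `handler` runs and Actor 2 can react there, without an existence check before each send. Leaving `on_failure` unset is the "I don't care about this message" case.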

In either case, for messages where the actor isn't interested in the success/failure of delivery, the actor can just choose to ignore failure indications, or there can be a mechanism to say to the lower layers "don't bother retrying this message".