Supervision Overview

Supervision is a critical concept in building resilient actor systems, ensuring that the system can recover from failures and continue operating without interruption. In Kameo, supervision and actor lifecycle management are facilitated through a combination of customizable behavior hooks and actor linking. This page discusses how to effectively use these features to create robust actor hierarchies and manage actor failures gracefully.

Customizing Actor Behavior on Panic

When an actor panics, its default behavior can be customized using the on_panic hook within the Actor trait. This hook allows developers to define custom logic that should execute when an actor encounters a panic, providing a first line of defense in managing unexpected failures.

Linking Actors

Beyond individual actor behavior, Kameo supports linking actors together to create a supervision tree. This structure enables actors to monitor each other's health and respond to failures, forming the backbone of a self-healing system.

Actor Links

Actors can be linked using ActorRef::link, which establishs a sibling relationship between the two actors.

Remote actors can be linked using the ActorRef::link_remote and RemoteActorRef::link_remote methods.

Handling Link Failures

When a linked actor dies, the surviving actors can react to this event using the on_link_died hook in the Actor trait. This hook provides the ID of the deceased actor and the reason for its termination, enabling the surviving actors to implement custom logic, such as restarting the failed actor or taking other remedial actions.

The default behavior for on_link_died is to stop the current actor if the linked actor died for any reason other than a normal shutdown. This conservative default ensures that failures are not silently ignored, promoting system stability by preventing dependent actors from continuing in an inconsistent state.

In the case of remote actor links, if a peer/node gets disconnected, then all links to actors on that peer will be considered dead, with ActorStopReason::PeerDisconnected being signaled to the linked actors.

Unlinking Actors

In some scenarios, it may be necessary to remove links between actors, either to restructure the supervision tree or in response to changing application dynamics. Kameo provides the following methods for this purpose:

  • ActorRef::unlink: Unlinks two previously linked sibling actors.
  • ActorRef::unlink_remote: Unlinks two previously linked sibling remote actors.
  • RemoteActorRef::unlink_remote: Unlinks two previously linked sibling remote actors.

These methods allows for dynamic adjustments to the actor supervision hierarchy, ensuring that the system can adapt to new requirements or recover from errors by reorganizing actor relationships.

Summary

Supervision in Kameo is a powerful mechanism for building resilient, self-healing actor systems. By leveraging customizable panic behavior and actor linking, developers can design systems that are capable of recovering from failures, maintaining consistent operation through disruptions. Whether through parent-child hierarchies or peer relationships, actor links in Kameo provide the foundation for a robust supervision strategy, ensuring that actor systems can gracefully handle and recover from the inevitable challenges they face.