see on GitHub

Highly Available Tasks


  • A RavenDB Task can be one of the following:

  • There is no single coordinator handing out tasks to a specific node.
    Instaed, each node decides on its own if it is the Reponsible Node of the task.

  • Each node will re-evaluate its responsibilities with every change made to the Database Record,
    such as defining a new index, configuring or modifying an Ongoing Task, any Database Topology change, etc.

  • In this page:


Constraints

  1. Task is defined per Database Group.

  2. Task is executed by a single Database Node only.
    With Backup Task being an exception in case of a cluster partition, see Backup Task - When Cluster or Node are Down.

  3. A Database Node can be assigned with many tasks.

  4. The node must be in a Member state in the Database Group in order to perform a task.

  5. Cluster must be in a functional state.

Responsible Node

  • Responsible Node is the node that is responsible to perform a specific Ongoing Task.

  • Each node checks whether it is the Responsible Node for the task by executing a local function that is based on the
    unique hash value of the task and the current Database Topology.

  • Since the Database Topology is eventually consistent across the cluster,
    there will be an eventually consistent single Responsible Node, which will answer the above constraints.

Mentor Node

The node is called a Mentor Node when its task is updating a Rehab or a Promotable.

Tasks Relocation

  • Upon a Database Topology change, all existing tasks will be re-evaluated and re-distributed among the functional nodes.

  • The responsible node for an Outgoing Task is also re-evalutated upon a change in the unique hash value of the Ongoing Task.

For example:

Let's assume that we have a 5 nodes cluster [A, B, C, D, E] with a database on [A, B, E] and a task on node B.

Node B has network issues and is separated from the cluster. So nodes [A, C, D, E] are on one side and node [B] is on the other side.

The Cluster Observer will note that it can't reach node B and issue a Raft Command in order to move node B to a Rehab state.

Once this change has propagated, it will trigger a re-assessment of all tasks in all reachable nodes.
In our example the task will move to either A or E.

In the meanwhile, node B which has no communication with the Cluster Leader,
moves itself to be a Candidate and removes all its tasks.