Distributed Database


  • In RavenDB, a database can be replicated across multiple nodes, depending on its Replication Factor.

  • A node where a database resides is referred to as a Database Node.
    The group of Database Nodes that assemble the distributed database, is called a Database Group.

  • Each Database Node has a full copy of the database, it contains all the database documents and it indexes them locally.
    This greatly simplifies executing query requests, since there is no need to orchestrate an aggregation of data from various nodes.

  • In this page:


The Database Record

Each database instance keeps its configuration (e.g. Index definitions, Database topology) in a Database Record object.
Upon database creation, this object is passed through Rachis to all nodes in the cluster.

After that, each node updates its own database record on its own upon any new Raft command received,
i.e. when an index has changed.

Replication Factor

When creating a database it is possible to specify the exact nodes for the Database Group, or just the number of nodes needed.
This will implicitly set the Replication Factor to the specified amount of nodes.

It is possible to pass the Replication Factor explicitly and let RavenDB choose the nodes on which the database will reside.

Either way, once the database is created by getting a Consensus,
the Cluster Observer begins monitoring the Database Group in order to maintain the Replication Factor.

Database Topology

The Database Topology describes the relations inside the Database Group between the Database Nodes.
Each Database Node can be in one of the following states:

State Description
Member Fully updated and functional database node.
Promotable A node that has been recently added to the group and is being updated.
Rehab A former Member node that is assumed to be not up-to-date due to a partition.

States Flow

In general, all nodes in a newly created database are in a Member state.
When adding a new Database Node to an already existing database group, a Mentor Node is selected in order to update it.
The new node will be in a Promotable state until it receives and indexes all the documents from the mentor node.

Nodes Order

The database topology is kept in a list that is always ordered with Member nodes first, then Rehabs and Promotables are last. The order is important since it defines the client's order of access into the Database Group, (see Load balancing client requests).
The order can be changed with the Client-API or via the Studio.

Replication

All Members have master-master Replication in order to keep the documents in sync across the nodes.

Dynamic Database Distribution

If any of the Database Nodes is down or partitioned, the Cluster Observer will recognize it and act as following:

  1. If Cluster.TimeBeforeMovingToRehabInSec (default: 60 seconds) time has passed and the node is still unreachable,
    the node will be moved to a Rehab state.

  2. If the node is for Cluster.TimeBeforeAddingReplicaInSec (default: 900 seconds) still in Rehab,
    a new database node will be automatically added to the database group to replace the Rehab node.

  3. If the Rehab node is online again, it will be assigned with a Mentor Node to update him with the recent changes.

  4. The first node to be up-to-date stays, while the other is deleted.

Deletion

The Rehab node is actually deleted only when it is re-connected to the cluster,
and only after it has finished sending all its new documents that it may have (while it was disconnected) to the other nodes in the Database Group.

The Dynamic Database Distribution feature can be toggled on and off with the following request:

URL Method URL Params
/admin/databases/dynamic-node-distribution POST name=[database-name], enable=[bool]

Sharding

Currently, RavenDB 4.x doesn't offer sharding as an out of the box solution.
It is on the development roadmap to be implemented in a future version. Track it here.