Distributed Database



The Database Record

Each database instance keeps its configuration (e.g. Index definitions, Database topology) in a Database Record object. Upon database creation, this object is propagated through Rachis to all nodes in the cluster.

After that, each node updates its own Database Record independently upon receiving any new Raft command,
e.g. when an index has changed.

Replication Factor

When creating a database, you can either:

  • Explicitly specify the exact nodes to use for the Database Group
  • Or, specify only the number of nodes needed and let RavenDB automatically choose the nodes
    on which the database will reside.

In either case, the number of nodes will represent the Replication Factor.

Once the database is created by getting a Consensus,
the Cluster Observer begins monitoring the Database Group in order to maintain this Replication Factor.

Database Topology

The Database Topology describes the relationships between the Database Nodes within the Database Group.
Each Database Node can be in one of the following states:

State Description
Member Fully updated and functional database node.
Promotable A node that has been recently added to the group and is being updated.
Rehab A former Member node that is assumed to be not up-to-date due to a partition.

States Flow

  • In general, all nodes in a newly created database are in a Member state.
  • When adding a new Database Node to an already existing database group, a Mentor Node is selected in order to update it.
    The new node will be in a Promotable state until it receives and indexes all the documents from the mentor node.
  • Learn more in:

Nodes Order

  • The database topology is kept in a list that is always ordered with Member nodes first,
    then Rehabs and Promotables are last.
  • The order is important since it defines the client's order of access into the Database Group,
    (see Load balancing client requests).
  • The order can be modified using the Client-API or via the Studio.

Replication

All Members have master-master Replication in order to keep the documents in sync across the nodes.

Dynamic Database Distribution

If any of the Database Nodes is down or partitioned, the Cluster Observer will recognize it and act as follows:

  1. If the time that is defined in TimeBeforeMovingToRehabInSec (default: 60 seconds) has passed
    and the node is still unreachable, the node will be moved to a Rehab state.

  2. If the node remains in Rehab for the time defined in TimeBeforeAddingReplicaInSec (default: 900 seconds),
    a new database node will be automatically added to the database group to replace the Rehab node.

  3. If the Rehab node is online again, it will be assigned a Mentor Node to update it with the recent changes.

  4. The first node to be up-to-date stays, while the other is deleted.

Deletion

The Rehab node is deleted only when it reconnects to the cluster, and only AFTER it has finished sending all new documents it may have (while disconnected) to the other nodes in the Database Group.

The Dynamic Database Distribution feature can be toggled on and off with the following request:

URL Method URL Params
/admin/databases/dynamic-node-distribution POST name=[database-name], enable=[bool]

Sharding

  • Sharding, supported by RavenDB as an out-of-the-box solution starting with version 6.0,
    is the distribution of a database's content across autonomous shards.

  • Learn more about sharding in this dedicated Sharding overview article.