Sharding: External Replication
- External Replication is an ongoing RavenDB task that you can create and run to maintain a live replica of your database on another RavenDB server.
- Defining an external replication task via Studio or API under a sharded database is similar to defining such tasks under a non-sharded database.
-
Sharded and non-sharded databases can replicate data to each other, providing their version is at least 6.0.
-
In this page:
Supported Versions
- A sharded database and a non-sharded database can replicate data to each other, providing their versions are 6.0 or higher.
- Replicating data between a sharded database and a RavenDB version earlier than 5.4 is not supported.
- Non-sharded databases can replicate data to each other regardless of their version. E.g., a non-sharded 6.0 database can replicate data to a 5.2 database and vice versa.
External Replication Types
Internal -vs- External Replication
-
Internal replication is applied automatically when the replication factor is larger than 1, to make the shard database more available by maintaining multiple accessible copies of it.
Learn more about shards internal replication in the overview article and administration Studio and API articles. -
External replication is applied when a dedicated task is defined for it.
Read more about it and follow a step-by-step guide here.
All data replicated by or to a sharded database is mediated via orchestrators. The shards themselves are oblivious to their being shards: from a shard's perspective, it is just a regular RavenDB database that can, among its other ordinary RavenDB features, replicate data.
External replication from and to non-sharded databases requires no special syntax or preparations. It does, however, cost the server some additional work, that, especially when the database is large and every extra operation counts, should be taken into account by the administrator. Here is how external replication works behind the scenes.
Non-Sharded Database to Sharded Database
The image below depicts a non-sharded database replicating data to a 5-shard database.
Non-Sharded Database to Sharded Database
- Non-Sharded Database
- Replication to Sharded Database
The database is unaware that the destination database is sharded, no special syntax or preparation is needed. - Orchestrator
The orchestrator receives and prepares the replicated data, grouping documents and document extensions by document IDs so each entity can be stored in the correct shard. - Transfer to Shard
The orchestrator transfers each destination shard its data.
Optimization routines are applied to make the process as effective as possible. - Shard
Document and document extensions are assigned to buckets by document ID.
Shard replies to replicated data and replication attempts are similar to replies made by non-sharded databases.
Sharded Database to Sharded Database
- The image below depicts a 3-shard database replicating data to a 5-shard database.
- Each shard replicates its data as an autonomous database.
Sharded Database to Sharded Database
- DB 1 Shard
The shard is unaware that the destination database is sharded. - Replication to DB 2
The database is unaware that the destination database is sharded, no special syntax or preparation is needed. - DB 2 Orchestrator
The orchestrator receives and prepares the replicated data, grouping documents and document extensions by document IDs so each entity can be stored in the correct shard. - Transfer to Shard
The orchestrator transfers each destination shard its data.
Optimization routines are applied to make the process as effective as possible. - DB 2 Shard
Documents and document extensions are assigned to buckets by document ID.
Shard replies to replicated data and replication attempts are similar to replies made by non-sharded databases.
Performance Considerations
When external replication tasks are defined in two different sharded databases, so they replicate their data to each other, each data item sent from one of the databases to the other will then be sent back from the target database to the original sender. The original sender will then recognize that the data item is already stored locally, ignore it, and end the cycle.
In this specific case, of two external databases replicating data to each other, please consider this overhead in your performance considerations and tests.