Cluster: Cluster-Wide Transactions



Why Cluster-Wide Transactions

By default, RavenDB uses a multi-master model: a transaction is applied on a single node first and then asynchronously replicated to the other members of the cluster. This ensures that even in the presence of network partitions or hard failures, RavenDB can accept writes and safely keep them.

The downside of the multi-master model is that certain error modes can cause two clients to modify the same set of documents on two different database nodes. That can cause conflicts and makes it hard to provide certain guarantees to the application, such as ensuring the uniqueness of a user's email in a distributed cluster. Just checking for the existence of the email is not sufficient: two clients may be talking to separate database nodes, and both of them may check that the user does not exist. Each will then create what ends up as a duplicate user.

To handle this (and similar) scenarios, RavenDB offers the cluster-wide transaction feature. It allows you to explicitly state that you want a particular interaction with the database to favor consistency over availability to ensure that changes are going to be applied in an identical manner across the cluster even in the presence of failures and network partitions.

In order to ensure that, RavenDB requires that a cluster-wide transaction be accepted by at least a majority of the voting nodes in the cluster. If it cannot gather such a majority, the cluster-wide transaction will fail.

For the rest of this document we distinguish between single-node transactions, which are applied on a single node and then disseminated using async replication, and cluster-wide transactions, which are accepted by a majority of the nodes in the cluster and then applied on each of them.
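
For example, the following minimal sketch (using the .NET client) enforces email uniqueness through a cluster-wide transaction. The User class, server URL, database name, and key layout are illustrative assumptions, not part of the API:

    using Raven.Client.Documents;
    using Raven.Client.Documents.Session;

    using (var store = new DocumentStore
    {
        Urls = new[] { "http://localhost:8080" }, // assumed server URL
        Database = "Users"                        // assumed database name
    }.Initialize())
    using (var session = store.OpenSession(new SessionOptions
    {
        TransactionMode = TransactionMode.ClusterWide
    }))
    {
        session.Store(new User { Email = "jane@example.com" }, "users/jane");

        // The compare-exchange value acts as a cluster-wide reservation of the
        // email: creating it fails if the key already exists on any node.
        session.Advanced.ClusterTransaction.CreateCompareExchangeValue(
            "emails/jane@example.com", "users/jane");

        // SaveChanges() succeeds only if a majority of the voting nodes accept
        // the transaction; otherwise it throws and nothing is applied.
        session.SaveChanges();
    }

    public class User
    {
        public string Email { get; set; }
    }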

How Cluster Transactions Work

  1. A request sent from the client via SaveChanges() using TransactionMode.ClusterWide generates a Raft command, and the server waits for consensus on it.
  2. When consensus is achieved, each node validates the compare-exchange values first.
    If this validation fails, the entire session transaction is rolled back. By the nature of the Raft consensus algorithm, the cluster-wide transaction will either eventually be accepted on all nodes or fail on all of them.
  3. Once the validation has passed, the request is stored on the local cluster state machine of every node and is processed asynchronously by the relevant database.
  4. The relevant database notices that it has pending cluster transactions and starts to execute them.
    Since order matters, a failure at this stage will halt the cluster transaction execution until it is fixed.
    The possible failure modes for this scenario are listed below.
  5. Every document that has been added by the cluster transaction gets a change vector of the form RAFT:int64-sequential-number and will have priority if a conflict arises between it and a document from a regular transaction.
  6. After the database has executed the requested transaction, a response is returned to the client.
    • Upon success, the client receives the transaction's Raft index, which is added to any future requests. Performing an operation against any other node will wait for that Raft index to be applied first, ensuring order of operations (the sketch after this list shows the client-side view).
  7. In the background, the Cluster Observer tracks the completed cluster transactions and removes each one from the local cluster state machine only when it has been successfully committed on all of the database nodes.
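
The client-side view of this flow, reusing the store and User from the sketch above; the Raft-index bookkeeping of step 6 is handled by the client automatically, so no extra code is needed for read-your-writes across nodes:

    // Steps 1-2: SaveChanges() sends a single Raft command and returns only
    // after a majority of the voting nodes have accepted it.
    using (var session = store.OpenSession(new SessionOptions
    {
        TransactionMode = TransactionMode.ClusterWide
    }))
    {
        session.Store(new User { Email = "jane@example.com" }, "users/jane");
        session.SaveChanges();
    }

    // Step 6: a later request may be routed to a different node. The client
    // attaches the Raft index it received, so that node serves the request
    // only after it has applied the transaction locally.
    using (var session = store.OpenSession())
    {
        var user = session.Load<User>("users/jane"); // observes the write above
    }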

Cluster Transactions Properties

The cluster transaction feature enables RavenDB to perform consistent cluster-wide ACID transactions.
A cluster transaction can be composed of two optional parts:

  1. Compare-exchange values, which are validated and executed by the cluster.

    Compare-exchange key/value pairs can be created and managed explicitly in your code (a sketch follows this list).
    Starting from RavenDB 5.2, they can also be created and managed automatically by RavenDB.
    Compare-exchange entries that are automatically administered by RavenDB are called Atomic Guards;
    read more about them here.

  2. Store/Delete operations on documents, which are executed by the database nodes after the transaction has been accepted.
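
As a sketch of part 1, here is explicit compare-exchange management from a cluster-wide session; the key name is an illustrative assumption, and store is reused from the first sketch:

    using (var session = store.OpenSession(new SessionOptions
    {
        TransactionMode = TransactionMode.ClusterWide
    }))
    {
        // Create a key/value pair; SaveChanges() fails if the key already exists.
        session.Advanced.ClusterTransaction.CreateCompareExchangeValue(
            "licenses/seat-42", "users/jane");
        session.SaveChanges();
    }

    using (var session = store.OpenSession(new SessionOptions
    {
        TransactionMode = TransactionMode.ClusterWide
    }))
    {
        // Read the value together with the index used for the concurrency check.
        var seat = session.Advanced.ClusterTransaction
            .GetCompareExchangeValue<string>("licenses/seat-42");

        // The delete is validated against that index when the cluster
        // executes the transaction.
        session.Advanced.ClusterTransaction.DeleteCompareExchangeValue(seat);
        session.SaveChanges();
    }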

Atomicity
Once the cluster transaction request has achieved a Raft quorum and the compare-exchange values have passed the concurrency check, the transaction is guaranteed to be executed.
A failure during the quorum or the concurrency check rolls back the entire session transaction,
while a failure during the commit of the documents halts any further cluster transaction execution on the database until that failure is remedied (the failure modes for document commits are described below).

Consistency
Consistency is guaranteed on the requested node. The node completes the request only when the transaction has finished and the documents are persisted on that node. The response to the client contains the cluster transaction's Raft index, which is added to any future requests in order to ensure that the serving node has committed that transaction before answering the client.

Durability
Once the transaction has been accepted, it is guaranteed to run on all the database's nodes, even in the case of system (or even cluster-wide) restarts or failures.

Concurrent Cluster-Wide and Single-Node Transactions

Case 1: Multiple concurrent cluster transactions

Optimistic concurrency for cluster-wide transactions is handled using the compare-exchange feature. The transaction's compare-exchange operations are validated, and if they cannot be executed because the compare-exchange values have changed since the transaction was initiated, the entire session transaction is aborted and an error is returned to the client.

Optimistic concurrency at the document level is not supported for cluster-wide transactions. Compare-exchange operations should be used to ensure consistency in that regard. Concurrent cluster-wide transactions are guaranteed to appear as if they are run one at a time.

Cluster-wide transactions may only contain PUT / DELETE commands. This is required to ensure that we can apply the transaction to each of the database nodes without regard to the current node state (note: an update of a document is effectively executed as a PUT command).

Information

If the concurrency check of the compare-exchange values has passed, the transaction will proceed and will be committed on all the database nodes.
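
A sketch of handling that abort, reusing the email reservation from the first sketch; in the .NET client the failed validation surfaces as a ConcurrencyException (newer versions may throw a more specific subclass):

    using Raven.Client.Exceptions;

    try
    {
        using (var session = store.OpenSession(new SessionOptions
        {
            TransactionMode = TransactionMode.ClusterWide
        }))
        {
            // Reading tracks the value together with its index; modifying it
            // becomes a compare-exchange update at SaveChanges() time.
            var reservation = session.Advanced.ClusterTransaction
                .GetCompareExchangeValue<string>("emails/jane@example.com");
            reservation.Value = "users/jane-2";
            session.SaveChanges();
        }
    }
    catch (ConcurrencyException)
    {
        // The compare-exchange value changed since it was read; the entire
        // session transaction was rolled back. Retry or surface the error.
    }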


Case 2: Concurrent cluster and non-cluster transaction

When mixing cluster-wide transactions and single-node transactions, you need to be aware of the rules RavenDB uses to resolve conflicts between them.

Documents changed by a cluster-wide transaction will always have precedence in such a conflict and will overwrite changes made in a single-node transaction. It is common to use cluster-wide transactions for certain high-value operations, such as the creation of a new user or the sale of a product with limited inventory, and use single-node transactions for the common case.

A single-node transaction that operates on data that has been modified by a cluster-wide transaction will operate as usual; since the cluster-wide transaction has already been applied (either directly on the node or via replication), it will not be executed again.

Replication will try to synchronize the data, so in order to avoid conflicts, every document that was modified under the cluster transaction receives the special RAFT:int64-sequential-number change vector and the special flag FromClusterTx, which ensures precedence over a regular change vector.
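
A sketch of the high-value/common-case split described above, reusing store and User from the first sketch; the stock key is an illustrative assumption and is presumed to have been created earlier:

    // High-value operation: decrement limited inventory under a cluster-wide
    // transaction, so the decision is consistent across the whole cluster.
    using (var session = store.OpenSession(new SessionOptions
    {
        TransactionMode = TransactionMode.ClusterWide
    }))
    {
        var stock = session.Advanced.ClusterTransaction
            .GetCompareExchangeValue<int>("stock/limited-edition");
        if (stock.Value == 0)
            throw new InvalidOperationException("Sold out");
        stock.Value -= 1;
        session.SaveChanges();
    }

    // Common case: a regular single-node session; the write is accepted by one
    // node and then replicated asynchronously to the rest of the cluster.
    using (var session = store.OpenSession())
    {
        session.Store(new User { Email = "visitor@example.com" });
        session.SaveChanges();
    }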


Case 3: Cluster transaction with an External incoming replication

While internal replication within the cluster is discussed in the previous case, the case where two clusters are connected via external replication is a bit different.

The logic for documents that were changed by a cluster transaction versus documents that were changed by a regular transaction stays the same. However, when a conflict arises on a document that was changed by both a local cluster transaction and a remote cluster transaction, the local one takes precedence. Furthermore, the FromClusterTx flag is removed, which means that on the next conflict the document is no longer treated as modified by a cluster-wide transaction.

Failure Modes in Cluster-Wide Transactions

No majority

A cluster-wide transaction can operate only within a functional cluster. Thus, if the cluster transaction does not gain consensus from a majority of the nodes, or there is currently no leader, the transaction will be rolled back.


Concurrency issues for compare-exchange operations

Acquiring consensus does not by itself mean the transaction is accepted. Once consensus is acquired, each node performs a concurrency check on the compare-exchange values. If the concurrency check fails, the transaction is rolled back.


Failure to apply transaction on database nodes

Once the transaction has passed the compare-exchange concurrency check, it is guaranteed to be committed. Any failure at this stage must be remedied before execution can continue.

  • Out of disk space: Freeing disk space will fix the problem and allow cluster transactions to be committed.
  • Creation/deletion of a document with a different collection: Delete the document from the other collection.

Warning

The execution of cluster transactions on the database will be stopped until these types of failures are fixed.

Debug Cluster-Wide Transactions

The current state of the cluster transactions that are waiting to be completed by all of the database nodes can be found at:

  • URL: /databases/*/admin/debug/cluster/txinfo
  • Type: GET
  • Permission: DatabaseAdmin

Parameters

  • from (optional): long, default: 0. Get cluster transactions starting from this Raft index.
  • take (optional): int, default: int.MaxValue. The number of cluster transactions to show.
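
For example, on a secured cluster the endpoint can be queried with curl; the server URL, database name, and client certificate (which must grant DatabaseAdmin) are placeholders:

    curl --cert admin-client.pem \
      "https://<server-url>/databases/<database-name>/admin/debug/cluster/txinfo?from=0&take=128"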