RavenDB
Performance Overview

This page presents a clear picture of how well RavenDB performs crucial tasks such as CRUD operations, queries, and indexing. Spoiler alert: We’re elated with the results!

To help you estimate how RavenDB will perform in your own environment, this page presents performance statistics under various load demands, disk types, instance types, and platforms.

CRUD

In these scenarios, we measure the performance of loads, creates, and updates using operations available via the RavenDB Session class and via the RavenDB REST API.

Store

This scenario stores 100K Company documents using our bulk docs endpoint with a batch size of 50 documents. The batches are submitted serially from a single thread, so the server has headroom for additional load. This endpoint is a foundation of the RavenDB Client API and is used by the Session.
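A minimal sketch of the batch-splitting logic in plain Python (illustration only: it splits 100K made-up Company documents into 50-document batches but does not actually call the bulk docs endpoint):

```python
from itertools import islice

def batches(items, size=50):
    """Split an iterable into lists of at most `size` items,
    mirroring the 50-document batch size used in this scenario."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# 100K Company documents -> 2000 batches of 50; each batch would be
# sent as one request to the bulk docs endpoint, serially, from a
# single thread.
companies = [{"Name": f"Company {i}"} for i in range(100_000)]
batch_count = sum(1 for _ in batches(companies))
print(batch_count)  # 2000
```

Submitting batches serially keeps client-side concurrency at one request in flight, which is why the server still has spare capacity for additional load.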

You can read more about it here.

We measure the machine performance and the duration of the scenario.

The graph shows the number of documents imported per second. It highlights that this scenario puts significant stress on the disk. As a result, NVMe disks have a considerable advantage here. Apart from this, there are no major differences between the machine types.

Cluster size: 3 nodes
Data set size: 100K documents

Bulk Insert

This scenario inserts 1M Company documents using the Bulk Insert API, which is very useful for migrating or importing large amounts of data. The batches are submitted serially from a single thread, so the server has headroom for additional load.

You can read more about it here.

We measure the machine performance and the duration of the scenario.

The graph shows the number of documents imported per second. It highlights that this scenario puts significant stress on the disk. As a result, NVMe disks have a considerable advantage here. Apart from this, there are no major differences between the machine types.

Cluster size: 1 node
Data set size: 1M documents
Data disk size: 1.12GB
Document size: 450 bytes

Random Loads

This scenario gives you an overview of random read operations on 6M User documents from the Stack Overflow database. The commands are executed against the server using the REST API, without client overhead. The same endpoint is used by our Client API to perform Load operations.

You can read more about it here.

The 99th percentile of request latency is under 34 ms.
The average request latency is 7 ms.

The graph shows the number of requests per second, indicating that this scenario is CPU-bound. Machines with better CPUs, like the 2vCPU 16GB RAM machine with an NVMe disk, or machines with more cores, allow us to retrieve more documents.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 6M User documents
Sampling time: 1 min

Import Dump File

This scenario measures the time taken to import a dump file containing 6M User documents from the Stack Overflow database. In real-world applications, this scenario occurs when restoring a database from a backup.

You can read more about it here.

We measure the machine performance and the duration of the scenario.

The graph shows the number of documents imported per second. This scenario is mostly CPU-bound, and machines with more cores can import more documents. Upgrading to a better SSD improves performance by about 7%, while using an NVMe disk significantly boosts performance, by roughly 21% for a 2vCPU 16GB RAM machine.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 6M User documents

Queries

In these scenarios, we measure the performance of querying the database using various types of queries.

All queries are sent in parallel from 8 threads (using the wrk tool). All queries run against precalculated static indexes, without using RavenDB’s caching abilities, which would make the queries even faster.
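wrk itself is a C-based HTTP benchmarking tool; its thread model can be sketched in Python with a thread pool (here `send_query` is a stand-in for the actual HTTP query, which is not issued):

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def run_load(send_query, threads=8, requests_per_thread=100):
    """Fire queries from `threads` parallel workers and count
    completed requests, mimicking wrk's 8-thread setup."""
    done = 0
    lock = threading.Lock()

    def worker():
        nonlocal done
        for _ in range(requests_per_thread):
            send_query()
            with lock:
                done += 1

    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(threads):
            pool.submit(worker)
    # The context manager waits for all workers to finish.
    return done

# A no-op stands in for the HTTP call against the query endpoint.
total = run_load(lambda: None)
print(total)  # 800
```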

Query for a single record

In this scenario, we measure the performance of a query with a ‘where’ clause against a predefined index with 454K entries.
The command uses the Query API directly, without client overhead. The same endpoint is used by the querying mechanism in our Client API.

You can read more about it here.

The 99th percentile of query latency is under 40 ms.
The average query latency is 10.54 ms.

The graph shows the number of requests per second. This scenario is CPU-bound since the dataset fits into memory. More cores allow us to retrieve more documents, making the disk type less significant. However, for a 2vCPU 16GB RAM machine with an NVMe disk, we get slightly better performance due to the superior CPU.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)
Sampling time: 1 min


Query (RQL):

from index 'Users/DisplayNames'
where DisplayName = "user3654819"
select Views

Index definition:

from user in docs.Users
select new {
   user.DisplayName
}

Query using date range and paging

In this scenario, we measure the performance of a query with a ‘where’ clause on a DateTime field against a predefined index with 454K entries. This is a range query with many unique terms.
The command uses the Query API directly, without client overhead. The same endpoint is used by the querying mechanism in our Client API.

You can read more about it here.

The 99th percentile of query latency is under 93 ms.
The average query latency is 87 ms.

The graph shows the number of requests per second. This scenario is CPU-bound since the dataset fits into memory. More cores allow us to retrieve more documents, making the disk type less significant. However, for a 2vCPU 16GB RAM machine with an NVMe disk, we get slightly better performance due to the superior CPU.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)
Sampling time: 1 min


Query (RQL):

from index 'Questions/ByCreationDateAndTags'
where Tags = "html"
and CreationDate between '2013-04-30T00:00:00.0000000Z'
and '2013-05-30T00:00:00.0000000Z'
limit 0, 10

Index definition:

from q in docs.Questions
select new {
   CreationDate = q.CreationDate,
   Tags = q.Tags
}

Query by Multiple Facets

In this scenario, we measure the performance of a ‘faceted’ query against a predefined index with 562K entries.
The command uses the Query API directly, without client overhead. The same endpoint is used by the querying mechanism in our Client API.

You can read more about it here.

The 99th percentile of query latency is under 65 ms.
The average query latency is 7 ms.

The graph shows the number of requests per second. This scenario is CPU-bound since the dataset fits into memory. More cores allow us to retrieve more documents, making the disk type less significant. However, for a 2vCPU 16GB RAM machine with an NVMe disk, we get slightly better performance due to the superior CPU.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)
Sampling time: 1 min


Query (RQL):

from index 'Questions/TagsFaceted'
where Tags = "tdbgrid"
select facet(Year, sum(ViewCount)),
       facet(AnswersCount < 1,
             AnswersCount between 1 and 10,
             AnswersCount > 10)

Index definition:

from u in docs.Questions
select new
{
   Year = u.CreationDate.Year,
   ViewCount = u.ViewCount,
   Tags = u.Tags.Select(a => a),
   AnswersCount = u.Answers.Length
}
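What the faceted query computes can be illustrated in plain Python over a few made-up index entries (the real benchmark runs this inside the server against 562K entries):

```python
from collections import defaultdict

# Hypothetical Question index entries (not real Stack Overflow data).
entries = [
    {"Year": 2012, "ViewCount": 100, "AnswersCount": 0},
    {"Year": 2012, "ViewCount": 250, "AnswersCount": 3},
    {"Year": 2013, "ViewCount": 40,  "AnswersCount": 12},
]

# facet(Year, sum(ViewCount)): total views per year.
views_by_year = defaultdict(int)
for e in entries:
    views_by_year[e["Year"]] += e["ViewCount"]

# Range facet on AnswersCount: < 1, between 1 and 10, > 10.
ranges = {"<1": 0, "1-10": 0, ">10": 0}
for e in entries:
    c = e["AnswersCount"]
    if c < 1:
        ranges["<1"] += 1
    elif c <= 10:
        ranges["1-10"] += 1
    else:
        ranges[">10"] += 1

print(dict(views_by_year), ranges)
```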

Order by textual field

In this scenario, we measure the performance of a query with ‘orderby’ on a string field against a predefined index with 562K entries.
The command uses the Query API directly, without client overhead. The same endpoint is used by the querying mechanism in our Client API.

You can read more about it here.

The 99th percentile of query latency is under 65 ms.
The average query latency is 59 ms.

The graph shows the number of requests per second. This scenario is CPU-bound since the dataset fits into memory. More cores allow us to retrieve more documents, making the disk type less significant. However, for a 2vCPU 16GB RAM machine with an NVMe disk, we get slightly better performance due to the superior CPU.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)
Sampling time: 1 min


Query (RQL):

from index 'Questions/ByTagsOrderByTitle'
where Tags = "eigenvalue"
order by Year
select AnswerCount
limit 0, 47

Index definition:

from question in docs.Questions
select new {
   question.Tags,
   question.CreationDate.Year,
   question.ViewCount
}

Order by numeric field

In this scenario, we measure the performance of a query with ‘orderby’ on a field of type long against a predefined index with 562K entries.
The command uses the Query API directly, without client overhead. The same endpoint is used by the querying mechanism in our Client API.

You can read more about it here.

The 99th percentile of query latency is under 97 ms.
The average query latency is 25 ms.

The graph shows the number of requests per second. This scenario is CPU-bound since the dataset fits into memory. More cores allow us to retrieve more documents, making the disk type less significant. However, for a 2vCPU 16GB RAM machine with an NVMe disk, we get slightly better performance due to the superior CPU.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)
Sampling time: 1 min


Query (RQL):

from index 'Questions/ByTagsOrderByTitle'
where Tags = "xval"
order by ViewCount as long
select AnswerCount
limit 0, 47

Index definition:

from question in docs.Questions
select new {
   question.Tags,
   question.CreationDate.Year,
   question.ViewCount
}

Indexing

In these scenarios, we measure RavenDB performance during indexing.

Indexing Map-Reduce

In this scenario, we measure the indexing speed (how many documents a given index can process per second) of a Map-Reduce index.
The index performs an aggregation on the Tags from the Questions collection of the Stack Overflow dataset.
Querying this index returns the number of occurrences of each Tag (Count), the total number of Answers for each Tag (Answers), and the total number of accepted Answers for each Tag (AcceptedAnswers).

This is a common scenario and a typical usage of a Map-Reduce index to perform an aggregation operation. The advantage of doing this in the index, instead of in a query, is that the results are pre-computed and almost no additional work is needed to satisfy the query, giving you instant results.

You can read more about Map-Reduce indexes here.

The graph shows the number of documents mapped per second, highlighting that this scenario is disk-bound. Using an NVMe disk significantly boosts performance, by roughly 31% for a 2vCPU 16GB RAM machine, which even outperforms a 16vCPU 64GB RAM machine with an SSD disk by approximately 26%.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)


Map:

from q in docs.Questions
from tag in q.Tags
select new {
   Tag = tag,
   Count = 1,
   Answers = q.AnswerCount,
   AcceptedAnswers = q.AcceptedAnswerId != null ? 1 : 0
}

Reduce:

from result in results
group result by result.Tag into g
select new {
   Tag = g.Key,
   Count = Enumerable.Sum(g, x => x.Count),
   Answers = Enumerable.Sum(g, x0 => x0.Answers),
   AcceptedAnswers = Enumerable.Sum(g, x1 => x1.AcceptedAnswers)
}
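The same aggregation expressed as plain Python over a tiny made-up sample shows what the map and reduce stages compute:

```python
from collections import defaultdict

# Hypothetical Question documents (not real Stack Overflow data).
questions = [
    {"Tags": ["html", "css"], "AnswerCount": 2, "AcceptedAnswerId": "a/1"},
    {"Tags": ["html"],        "AnswerCount": 0, "AcceptedAnswerId": None},
    {"Tags": ["css"],         "AnswerCount": 5, "AcceptedAnswerId": "a/2"},
]

# Map: emit one entry per (question, tag) pair.
mapped = [
    {"Tag": tag, "Count": 1, "Answers": q["AnswerCount"],
     "AcceptedAnswers": 0 if q["AcceptedAnswerId"] is None else 1}
    for q in questions for tag in q["Tags"]
]

# Reduce: group by Tag and sum each field.
reduced = defaultdict(lambda: {"Count": 0, "Answers": 0, "AcceptedAnswers": 0})
for m in mapped:
    r = reduced[m["Tag"]]
    r["Count"] += m["Count"]
    r["Answers"] += m["Answers"]
    r["AcceptedAnswers"] += m["AcceptedAnswers"]

print(dict(reduced))
```

Because the reduced results are stored, a query against the index only has to read them back, which is what makes the query side effectively instant.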

Indexing Map-Reduce by Months

In this scenario, we measure the indexing speed (how many documents a given index can process per second) of a Map-Reduce index.
The index performs an aggregation on the Tags and CreationDate fields from the Questions collection of the Stack Overflow dataset.
Querying this index returns the number of occurrences of each Tag (Count) in a particular Month.

This is a common scenario and a typical usage of a Map-Reduce index to perform an aggregation operation. The advantage of doing this in the index, instead of in a query, is that the results are pre-computed and almost no additional work is needed to satisfy the query, giving you instant results.

This scenario differs from the Indexing Map-Reduce scenario above in that the grouping is done on two fields, doubling the grouping work, yet it is only about 14% slower.

You can read more about the Map-Reduce indexes here.

The graph shows the number of documents mapped per second, highlighting that this scenario is disk-bound. Using an NVMe disk significantly boosts performance, by roughly 45% for a 2vCPU 16GB RAM machine, which outperforms a 16vCPU 64GB RAM machine with an SSD disk by approximately 32%.

Cluster size: 1 node
Data set: Stack Overflow database
Data set size: 1.01M documents (562K Questions, 454K Users)


Map:

from q in docs.Questions
from tag in q.Tags
select new {
   Tag = tag,
   Count = 1,
   Month = q.CreationDate.ToString("yyyy-MM")
}

Reduce:

from result in results
group result by new {
   Tag = result.Tag,
   Month = result.Month
} into g
select new {
   Month = g.Key.Month,
   Tag = g.Key.Tag,
   Count = Enumerable.Sum(g, x => x.Count)
}
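A sketch of the two-field grouping in plain Python (made-up sample data); the compound (Tag, Month) key is where the extra grouping work comes from:

```python
from collections import Counter

# Hypothetical Question documents; dates as ISO strings.
questions = [
    {"Tags": ["html"], "CreationDate": "2013-05-12T10:00:00"},
    {"Tags": ["html"], "CreationDate": "2013-05-30T08:00:00"},
    {"Tags": ["css"],  "CreationDate": "2013-06-01T09:00:00"},
]

# Map emits a (tag, "yyyy-MM") pair per question tag;
# reduce counts the occurrences of each compound key.
counts = Counter(
    (tag, q["CreationDate"][:7])
    for q in questions for tag in q["Tags"]
)
print(dict(counts))
```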
Benchmark environment

Provider: AWS
Platform: Linux
Instance sizes: 2vCPU 8GB RAM, 2vCPU 16GB RAM, 4vCPU 16GB RAM, 8vCPU 32GB RAM, 16vCPU 64GB RAM
Disk types: SSD 3K IOPS 125MB/s; SSD 10K IOPS 160MB/s; NVMe Nitro SSD 40K IOPS 1250MB/s