RavenDB 5.0: Optimizing date range queries

I can a lot about the performance of RavenDB, a you might have noticed from this blog. A few years ago we had a major architecture shift that increased our performance by a factor of ten, which was awesome. But with the meal, you get appetite, and we are always looking for better performance.

One of the things we did with RavenDB is build things so we’ll have the seams in place to change the internal behavior without users noticing how things are working behind the scenes. We have used this capability a bunch of time to improve the performance of RavenDB. This post if about one such scenario that we recently implemented and will go into RavenDB 5.0.

Consider the following query:

As you can see, we are doing a range based query on a date field. Now, the source collection in this case has just over 5.9 million entries and there are a lot of unique elements in the specified range. Let’s consider how RavenDB will handle this query in version 4.2:

First, find all the unique CreatedAt values between those ranges (there can be tens to hundreds of thousands).
Then, for each one of those unique values, find all the match documents (usually, only one).

This is expensive and the problem almost always shows up when doing date range queries over non trivial ranges because that combine the two elements of many unique terms and very few results per term.

The general recommendation was to avoid running the query above and instead use:

This allows RavenDB to use a different method for range query, based on numeric values, not distinct string values. The performance different is huge.

But the second query is ugly and far less readable. I don’t like such a solution, even if it can serve as a temporary workaround. Because of that, we implemented a better system in RavenDB 5.0. Behind the scenes, RavenDB now translate the first query into the second one. You don’t have to do anything to make it happen (when migrating from 4.2 instances, you’ll need to reset the index to get this behavior). You just use dates as you would normally expect them to be used and RavenDB will do the right thing and optimize it for you.

To give you a sense of the different in performance, the query above on a data set of 5.9 million records will have the following performance:

RavenDB 4.2 – 7,523 ms
RavenDB 5.0 – 134 ms

As you can imagine, I’m really happy about this kind of performance boost.

RavenDB

RavenDB Cloud

Try

Experience interactive demos and playground server

RavenDB Docs

RavenDB Cloud Docs

Documentation Guide

Download

Features

Performance

Comparison

What’s New

Demo

Bootcamp

Webinars

Workshops

Inside RavenDB Book

GitHub

StackOverflow

Articles

Whitepapers

Events

Promotional Materials

Unlock your business potential

Use Cases

Articles

Whitepapers

Press Releases

Industry Reports

Performance

Comparison

Proof of Concept Program

Academic Program

Events

What’s New

Roadmap

On-premise Pricing

Cloud Pricing

Support

Proof of Concept Program

Academic Program

RavenDB 5.0: Optimizing date range queries

Woah, already finished? 🤯

Related Articles

CollabTalk Podcast | Episode 123 with Oren Eini–Building a business with Open Source foundations

RavenDB’s storage engine: Voron–unlocking the secret

Certificates from the Ground Up

Watch Live Demo

RavenDB

RavenDB Cloud

Try

RavenDB Docs

RavenDB Cloud Docs

Documentation Guide

Download

Features

Performance

Comparison

What’s New

Demo

Bootcamp

Webinars

Workshops

Inside RavenDB Book

GitHub

StackOverflow

Articles

Whitepapers

Events

Promotional Materials

Use Cases

Articles

Whitepapers

Press Releases

Industry Reports

Performance

Comparison

Proof of Concept Program

Academic Program

Events