Managing RavenDB indexes in production, a DevOps guide
RavenDB can analyze your queries and generate the appropriate indexes for you automatically. This isn’t a feature you need to enable or a toggle to switch; it is simply how RavenDB works by default. For more advanced scenarios, you can write your own indexes to process your data in all sorts of interesting ways. Indexes in RavenDB are used for aggregation (map-reduce), full-text search, spatial queries, background computation and much more. This post isn’t going to talk about what you can do with RavenDB’s indexes, however. Instead, I’m going to discuss how you should manage them.
There are several ways to create indexes in RavenDB. The one we usually recommend is to create a class that inherits from AbstractIndexCreationTask. If you are using C# or TypeScript, you can create strongly typed indexes that the compiler checks for you. If you are using other clients (or JS indexes), the index definitions live as constant strings inside a dedicated class. Once the indexes are defined as part of your codebase, you can create all of them with a single call, which in C# is: IndexCreation.CreateIndexes(assembly, store);
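To make the "indexes as code" idea concrete, here is a minimal, self-contained TypeScript sketch of the pattern. It deliberately does not use the RavenDB client: `IndexTask`, `createIndexes`, the `User` type and the in-memory `deployedIndexes` object are hypothetical stand-ins for AbstractIndexCreationTask, IndexCreation.CreateIndexes and a real database. What it shows is the shape: each index is a class in your codebase, its map function is typed over the document, and one call deploys all of them.

```typescript
// Hypothetical document type queried by the application.
interface User {
  name: string;
  age: number;
}

// Hypothetical stand-in for an index-creation base class (NOT the RavenDB client API).
abstract class IndexTask<TDoc, TEntry> {
  abstract readonly indexName: string;
  // Typed map function: a typo such as `u.nmae` is a compile-time error.
  abstract map(doc: TDoc): TEntry;
}

class UsersByName extends IndexTask<User, { name: string }> {
  readonly indexName = "Users/ByName";
  map(u: User): { name: string } {
    return { name: u.name };
  }
}

class UsersByAge extends IndexTask<User, { age: number }> {
  readonly indexName = "Users/ByAge";
  map(u: User): { age: number } {
    return { age: u.age };
  }
}

// Hypothetical "create all indexes" call: registers every index task in the
// in-memory database and returns how many were deployed.
function createIndexes(
  tasks: IndexTask<any, any>[],
  target: Record<string, IndexTask<any, any>>
): number {
  tasks.forEach((task) => {
    target[task.indexName] = task;
  });
  return tasks.length;
}

const deployedIndexes: Record<string, IndexTask<any, any>> = {};
const deployedCount = createIndexes([new UsersByName(), new UsersByAge()], deployedIndexes);
```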
What I have described so far are the mechanics of working with indexes, and you can read all about them in the documentation. What I want to talk about are the implications of this design approach:
- Your indexes live in the same repository as your code. Whenever you check out a branch, the index definitions you use will always match the code that queries them.
- Your indexes are strongly typed and checked by the compiler. I mentioned this earlier, but it is a huge advantage and worth mentioning twice.
- You can track changes to your indexes using traditional source control tools. That makes reviewing index changes a standard part of code review, instead of something you need to do separately.
During development, it is standard to deploy your indexes whenever the application starts. This way, you can change an index, hit F5, and immediately work against the latest index definition without taking any other action.
For production, however, we don’t recommend this approach. For example, two versions of the application running side by side with different index definitions would “fight” over which one is the “right” version of the index, causing version bounce. RavenDB has features such as index locking, but those are a safety net to catch you when you fall, not a tool for day-to-day operations.
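The following toy simulation shows why deploy-on-startup is fine on a developer machine but risky in production. It is a sketch under simplifying assumptions, not a model of RavenDB internals: `IndexStore`, `applyDefinition`, `onStartup` and the `Env` values are hypothetical, and "re-indexing" is reduced to a counter that increments whenever the stored definition changes. In a real application the environment would come from your configuration.

```typescript
type Env = "development" | "production";

// Hypothetical in-memory model of one database's index definitions.
interface IndexStore {
  definitions: Record<string, string>;
  reindexCount: number; // simulated: counts every definition change
}

// Applying a definition that differs from the stored one forces a (simulated) re-index.
function applyDefinition(store: IndexStore, indexName: string, definition: string): void {
  if (store.definitions[indexName] !== definition) {
    store.definitions[indexName] = definition;
    store.reindexCount++;
  }
}

// Startup hook: deploy indexes automatically only in development.
function onStartup(env: Env, store: IndexStore, indexName: string, definition: string): void {
  if (env === "development") {
    applyDefinition(store, indexName, definition);
  }
}

// Two app versions restarting alternately, each deploying its own definition at startup.
const bouncing: IndexStore = { definitions: {}, reindexCount: 0 };
["v1", "v2", "v1", "v2"].forEach((version) => applyDefinition(bouncing, "Orders/Totals", version));
// bouncing.reindexCount === 4: every restart flips the definition and re-indexes

// With the guard, production restarts leave the index alone; a separate deployment step owns it.
const guarded: IndexStore = { definitions: { "Orders/Totals": "v1" }, reindexCount: 0 };
["v1", "v2", "v1", "v2"].forEach((version) => onStartup("production", guarded, "Orders/Totals", version));
// guarded.reindexCount === 0
```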
You should have a dedicated endpoint or tool that you invoke explicitly to deploy the indexes from your code to your RavenDB instances. The question is, what should that look like? Before I answer it, I want to discuss another aspect of indexing in RavenDB: automatic indexing.
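One possible shape for such a tool, sketched under assumptions: the deployment pipeline calls it explicitly, it pushes every index definition from code to each configured instance, and it keeps going when one instance fails so you get a complete report. `IndexTarget`, `putIndexes`, `deployIndexes` and the URLs are hypothetical; in a real tool, `putIndexes` would wrap your RavenDB client's index-creation call.

```typescript
// Hypothetical index definition as it lives in the codebase.
interface IndexDefinition {
  name: string;
  definition: string;
}

// Hypothetical abstraction over one RavenDB instance / database.
interface IndexTarget {
  url: string;
  putIndexes(indexes: IndexDefinition[]): void; // throws on failure
}

interface DeployResult {
  url: string;
  ok: boolean;
  error?: string;
}

// Deploys all indexes to every target; one failing target does not stop the others.
function deployIndexes(targets: IndexTarget[], indexes: IndexDefinition[]): DeployResult[] {
  return targets.map((target): DeployResult => {
    try {
      target.putIndexes(indexes);
      return { url: target.url, ok: true };
    } catch (e) {
      return { url: target.url, ok: false, error: e instanceof Error ? e.message : String(e) };
    }
  });
}

// Usage with in-memory fakes standing in for real instances.
const received: Record<string, number> = {};
const fakeTarget = (url: string, fail: boolean): IndexTarget => ({
  url,
  putIndexes(indexes) {
    if (fail) throw new Error("connection refused");
    received[url] = indexes.length;
  },
});

const results = deployIndexes(
  [fakeTarget("https://node-a.example", false), fakeTarget("https://node-b.example", true)],
  [{ name: "Users/ByName", definition: "from u in docs.Users select new { u.Name }" }]
);
// results[0].ok === true; results[1].ok === false with error "connection refused"
```

Continuing past a failed instance is a design choice: the pipeline sees every instance's status in one run instead of stopping at the first error.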
So far, we have discussed static indexes, the ones you define manually in your code. But RavenDB also allows you to run queries without specifying which index they should use. In that case, the query optimizer creates the appropriate indexes for your needs. This is an excellent feature, but how does it play out in production?
If you deploy a new version of your application, it will likely query the database in new ways. If you just push it to production blindly, RavenDB will adjust quickly enough, but it still needs to learn every new way you query your data. That can take some time and will likely put a higher load on the system. Rather than doing all the learning and adjusting in production, there is a better way.
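A toy model of that learning process helps show why the first run of new queries is the expensive part. This is a simplification, not RavenDB's query optimizer: `AutoIndexer`, `Query` and the index naming scheme are hypothetical, and the real server handles many details this sketch ignores. A query is described only by its collection and the fields it filters on; the model reuses an existing auto index that covers those fields and otherwise creates a new one. In production, each new index means indexing a whole collection under live traffic.

```typescript
// Hypothetical description of a query: which collection and which fields it filters on.
interface Query {
  collection: string;
  fields: string[];
}

// Toy model of automatic indexing (NOT RavenDB's actual optimizer).
class AutoIndexer {
  // index name -> fields it covers
  readonly indexes: Record<string, string[]> = {};
  createdCount = 0;

  // Returns the name of the index that answers the query, creating one if needed.
  resolve(query: Query): string {
    const prefix = `Auto/${query.collection}/`;
    const covering = Object.keys(this.indexes).find(
      (indexName) =>
        indexName.startsWith(prefix) &&
        query.fields.every((field) => this.indexes[indexName].indexOf(field) !== -1)
    );
    if (covering !== undefined) return covering;

    // No covering index: create one (in a real database, this means indexing the collection).
    const fields = query.fields.slice().sort();
    const newName = `${prefix}By${fields.join("And")}`;
    this.indexes[newName] = fields;
    this.createdCount++;
    return newName;
  }
}

const indexer = new AutoIndexer();
indexer.resolve({ collection: "Users", fields: ["Name"] });              // creates Auto/Users/ByName
indexer.resolve({ collection: "Users", fields: ["Name"] });              // reuses Auto/Users/ByName
indexer.resolve({ collection: "Orders", fields: ["Total", "Company"] }); // creates Auto/Orders/ByCompanyAndTotal
// indexer.createdCount === 2
```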
Run the new version of your system on a QA / UAT instance and put it through its paces. The QA instance will have the newest static indexes, and RavenDB will learn what sort of queries you are issuing and which indexes it needs to create. Once you have completed this work, export the indexes from the QA instance and import them into production. Let the new indexes run and process all their data, and only then push the new version of your application out. By that point, the production database is already aware of the new behavior and has adjusted to it.
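The promotion steps can be written down as an ordered check. This sketch assumes a hypothetical `EnvironmentDb` interface with `exportIndexes`, `importIndexes` and `isStale` methods; it is not a RavenDB API, just a way to encode the ordering: export from QA, import into production, and allow the application release only once every promoted index in production has caught up. A real pipeline would poll this check with a timeout rather than call it once.

```typescript
// Hypothetical index definition (static or auto) as exported from a database.
interface IndexDefinition {
  name: string;
  definition: string;
}

// Hypothetical view of one environment's database; not a real RavenDB API.
interface EnvironmentDb {
  exportIndexes(): IndexDefinition[];
  importIndexes(indexes: IndexDefinition[]): void;
  isStale(indexName: string): boolean;
}

// Promotes QA's indexes to production. Returns true only when it is safe to release the app.
function promoteIndexes(qa: EnvironmentDb, production: EnvironmentDb): boolean {
  const indexes = qa.exportIndexes(); // 1. static indexes plus the auto indexes QA learned
  production.importIndexes(indexes);  // 2. production starts building them
  // 3. release only once production has caught up on every promoted index
  return indexes.every((index) => !production.isStale(index.name));
}

// In-memory fake used only to exercise the sketch.
function fakeDb(initial: IndexDefinition[], staleNames: string[]): EnvironmentDb {
  const stored = initial.slice();
  return {
    exportIndexes: () => stored.slice(),
    importIndexes: (incoming) => {
      incoming.forEach((index) => {
        stored.push(index);
      });
    },
    isStale: (indexName) => staleNames.indexOf(indexName) !== -1,
  };
}

const qaDb = fakeDb(
  [
    { name: "Users/ByName", definition: "static" },
    { name: "Auto/Orders/ByCompany", definition: "auto" },
  ],
  []
);
const stillIndexing = fakeDb([], ["Auto/Orders/ByCompany"]);
const caughtUp = fakeDb([], []);
const readyEarly = promoteIndexes(qaDb, stillIndexing); // false: one index is still stale
const readyLater = promoteIndexes(qaDb, caughtUp);      // true: safe to release
```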
As a final note, RavenDB index deployment is idempotent. You can deploy the same set of indexes twice, and it will not trigger re-indexing. That reduces the operational overhead you have to worry about.
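A minimal sketch of why an idempotent deployment is cheap to re-run: compare what is already deployed with what the code wants, and only create or update the indexes whose definitions actually differ. `planDeployment`, `applyPlan` and the definition strings are hypothetical; in RavenDB the server makes this decision for you, so this only illustrates the idea. Running the same deployment a second time produces no work.

```typescript
// Hypothetical: index name -> definition text.
type Definitions = Record<string, string>;

interface DeploymentPlan {
  create: string[];
  update: string[];    // definition changed -> would trigger re-indexing
  unchanged: string[]; // identical definition -> no re-indexing
}

// Compares deployed definitions against the ones from code.
function planDeployment(deployed: Definitions, desired: Definitions): DeploymentPlan {
  const plan: DeploymentPlan = { create: [], update: [], unchanged: [] };
  Object.keys(desired).forEach((indexName) => {
    if (!(indexName in deployed)) plan.create.push(indexName);
    else if (deployed[indexName] !== desired[indexName]) plan.update.push(indexName);
    else plan.unchanged.push(indexName);
  });
  return plan;
}

// Applies a plan in memory and returns the resulting deployed state.
function applyPlan(deployed: Definitions, desired: Definitions, plan: DeploymentPlan): Definitions {
  const next: Definitions = { ...deployed };
  plan.create.concat(plan.update).forEach((indexName) => {
    next[indexName] = desired[indexName];
  });
  return next;
}

const fromCode: Definitions = {
  "Users/ByName": "map users -> name",
  "Orders/Totals": "map-reduce orders -> total",
};

const firstPlan = planDeployment({}, fromCode);
const afterFirst = applyPlan({}, fromCode, firstPlan);
const secondPlan = planDeployment(afterFirst, fromCode);
// firstPlan.create holds both indexes; secondPlan has nothing to create or update.
```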