Inside RavenDB 4.0

Working with Indexes

We've spent the last three chapters examining how querying works in RavenDB, as well as what indexes are, how they operate and what they can do. We looked at everything from simple map indexes to spatial queries, from performing full text queries to aggregating large amounts of data using MapReduce. What we haven't talked about is how you'll work with indexes in a typical business application.

This chapter will focus on that, discussing how to create and manage indexes from the client side, how to perform queries and what options are available for us on the client API. We explored some basic queries in Chapter 4, but we only touched on queries briefly — just enough to get by while we learn more about RavenDB. Now we're going to dig deep and see everything we can do with indexes and queries in RavenDB.

Creating and managing indexes

RavenDB is schemaless. You can have documents in any shape, way or form that you like. However, indexes are one of the ways to bring back structure to such a system. An index will take the documents as input and then output the index entries in a fixed format. Queries on this index must use the fields defined on the index (unless the index is doing dynamic field generation), and there's typically a strong tie between the structure of the index, the output of queries and the client code using it.

That's interesting because it means that changing the index might cause client code to break, which brings to mind the usual issues you run into with a fixed schema. This often leads to complexity during development because versioning schemas, deploying them and keeping them in sync with your code is a hassle.

RavenDB allows you to define your indexes directly in your code, which in turn allows you to version the indexes as a single unit with the rest of your system. In order to see how this works, we'll use a C# application. Open PowerShell and run the commands shown in Listing 12.1.

Listing 12.1 Creating a new RavenDB project


dotnet new console -n Northwind
dotnet add .\Northwind\ package RavenDB.Client
dotnet restore .\Northwind\

The commands in Listing 12.1 just create a new console application and add the RavenDB client package to the project. Now, go to RavenDB and create a new database named Northwind. Go to Settings and then Create Sample Data and click the Create button. Click the View C# Classes link, copy the code to a file called Entities.cs and save it in the Northwind app folder.

We're now ready to start working with real indexes from the client side.

Working with indexes from the client

Before we get to defining new indexes, let's start with an easier step: querying on an existing index. Your Program.cs file should be similar to Listing 12.2.

Listing 12.2 This console application queries RavenDB for all London-based employees


using System;
using System.Linq;
using Raven.Client.Documents;
using Orders;

namespace Northwind
{
    class Program
    {
        static void Main(string[] args)
        {
            var store = new DocumentStore
            {
                Urls = new []
                {
                    "http://localhost:8080"
                },
                Database = "Northwind"
            };
            store.Initialize();
      
            using (var session = store.OpenSession())
            {
                var londonEmployees = 
                    from emp in session.Query<Employee>()
                    where emp.Address.City == "London"
                    select emp;
      
                foreach (var emp in londonEmployees)
                {
                    Console.WriteLine(emp.FirstName);
                }
            }
        }
    }
}

The code in Listing 12.2 is the equivalent of "Hello World," but it will serve as our basic structure for the rest of this chapter.

The query we have in Listing 12.2 is a pretty simple dynamic query, the likes of which we already saw in Chapter 4. This is translated to the following RQL: FROM Employees WHERE Address.City = $p0. So far, there are no surprises, and if you check the indexes on the database, you should find that the Auto/Employees/ByAddress.City index was automatically created to satisfy the query. How can we select the index we want to use for a query from the client side? You can see the answer in Listing 12.3.

Listing 12.3 Specifying the index to use for a query (using strings)


var ordersForEmployee1A = 
    from order in session.Query<Order>("Orders/Totals")
    where order.Employee == "employees/1-A"
    select order;

foreach (var order in ordersForEmployee1A)
{
    Console.WriteLine(order.Id);
}

As you can see, the major difference is that we're now querying on the Orders/Totals index and we pass that as a string to the Query method. Using this method means that we need to define the index somewhere, which leads to the deployment and versioning issues that I already discussed. RavenDB has a better solution.

Defining simple indexes via client code

When using a strongly typed language, we can often do better than just passing strings. We can use the features of the language itself to provide a strongly typed answer for that. We'll recreate the Orders/Totals index in our C# code, as shown in Listing 12.4. (You'll need to add a using Raven.Client.Documents.Indexes; directive to the file.)

Listing 12.4 The index is defined using a strongly typed class


public class My_Orders_Totals : AbstractIndexCreationTask<Order>
{
    public My_Orders_Totals()
    {
        Map = orders =>
          from o in orders  
          select new
          {
              o.Employee,
              o.Company,
              Total = o.Lines.Sum(l =>
                  (l.Quantity * l.PricePerUnit) * (1 - l.Discount))
          };
    }
}

We use My/Orders/Totals as the index name in Listing 12.4 to avoid overwriting the existing index. This way, we can compare the new index to the existing one. There are a few interesting features shown in Listing 12.4. First, we have a class definition that inherits from AbstractIndexCreationTask<T>. This is how we let RavenDB know this is actually an index definition and what type it will be working on.

The generic parameter for the My_Orders_Totals class is quite important. That's the source collection for this index. In the class constructor, we set the Map property to a Linq expression, transforming the documents into the index entries. The orders variable is of type IEnumerable<Order>, using the same generic parameter as was passed to the index class. Now we just need to actually create this index. There are two ways of doing that. Both are shown in Listing 12.5.

Listing 12.5 Creating indexes from the client side


// create a single index
new My_Orders_Totals().Execute(store);

// scan the assembly and create all the indexes in 
// the assembly as a single operation
var indexesAssembly = typeof(My_Orders_Totals).Assembly;
IndexCreation.CreateIndexes(indexesAssembly, store);

The first option in Listing 12.5 shows how we can create a single index. The second tells RavenDB to scan the assembly provided and create all the indexes defined there.

Automatically creating indexes

The IndexCreation.CreateIndexes option is a good way to avoid managing indexes manually. You can stick this call somewhere in your application's startup during development and as an admin action in production. This way, you can muck about with the index definitions as you wish, and they'll always match what the code is expecting.

In other words, you can check out your code and run the application, and the appropriate indexes for this version of the code will be there for you, without you really having to think about it. For production, you might want to avoid automatic index creation on application startup and put that behind an admin screen or something similar. But you'll still have the option of ensuring the expected indexes are actually there. This makes deployments much easier because you don't have to manage the "schema" outside of your code.
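
In code, such a startup hook could be as simple as the following sketch. The helper class, its name and the isDevelopment flag are assumptions made for this example; the only RavenDB API involved is IndexCreation.CreateIndexes.

using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;

public static class IndexDeployment
{
    // Hypothetical startup helper: in development we always sync the
    // indexes with the code; in production this would be triggered
    // from an admin screen instead.
    public static void EnsureIndexes(IDocumentStore store, bool isDevelopment)
    {
        if (isDevelopment == false)
            return;

        IndexCreation.CreateIndexes(
            typeof(IndexDeployment).Assembly, store);
    }
}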

After running the code in Listing 12.5, you'll see that there is an index named My/Orders/Totals in the database. By convention, we replace _ with / in the index name. Now is the time to try to query this index, in a strongly typed manner, as you can see in Listing 12.6.

Listing 12.6 Specifying the index to use for a query (strongly typed)


var ordersForEmployee1A = 
    from order in session.Query<Order, My_Orders_Totals>()
    where order.Employee == "employees/1-A"
    select order;

The second generic parameter to Query is the index we want to use, and the first one is the item we're querying on. Note that in this case, what we query on and what we're getting back is the same thing, so we can use Order as both the item we query on and the return type. But that isn't always the case.
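
If you want to verify from code which index actually answered a query, rather than checking the Studio, you can ask the session for query statistics. The following is a minimal sketch, assuming the document store from Listing 12.2 and the index created in Listing 12.5:

using (var session = store.OpenSession())
{
    var orders = session.Query<Order, My_Orders_Totals>()
        .Statistics(out var stats) // ask the server to include query metadata
        .Where(o => o.Employee == "employees/1-A")
        .ToList();

    // Expected to print "My/Orders/Totals"
    Console.WriteLine(stats.IndexName);
}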

Working with complex indexes using strongly typed code

As we've seen in previous chapters, there isn't any required correlation between the shape of the document being indexed and the output of the index entry. In fact, there can't be if we want to support dynamic data and schemaless documents. That means that when we're talking about indexing, we're actually talking about several models that are usually either the same or very similar, but they don't have to be.

There are the following models to consider:

  • The documents to be indexed.
  • The index entry that was outputted from the index.
  • The actual queryable fields in the index.
  • The result of the query.

Consider the case of the following query: from Orders where ShipTo.City = 'London'. In this case, all four models appear to behave as if we're querying on the Orders collection directly. But even in such a simple scenario, that isn't actually the case.

The documents to be indexed are the documents in the Orders collection, but what is actually being indexed here? In the simplest case, it's an index entry such as {"ShipTo.City": "London", "@id": "orders/42-A"}. When we query, we actually try to find a match for ShipTo.City = 'London', and from there we fetch the document and return it.

Consider the query in Listing 12.7, on the other hand, which adds a couple of interesting wrinkles.

Listing 12.7 Using projections and query on array to show the difference between various models


from Orders as o
where o.Lines[].Product == "products/3-A"
select {
    Company: o.Company,
    Quantity: o.Lines
        .reduce((sum, l) => sum + l.Quantity, 0)
}

The Lines[].Product is a field that's used differently during indexing and querying. In the index entry generated from the documents, Lines[].Product is an array. But during queries, we use it in an equality comparison as if it were a normal value. This is because the array in the index entry was flattened to allow us to query any of the values in it.

The shape of the results of the query in Listing 12.7 is very different than the shape of the documents. That's because of the projection in the select. As long as we're working with RQL directly, we don't really notice, but how do we deal with such different shapes on the client side?

When using a strongly typed language such as C#, for example, we need some way to convey the differences. We can do that using explicit and implicit types. Consider the index My/Orders/Totals that we defined in Listing 12.4. Look at the Total field that we computed in the index. How are we going to be able to query on that?

We need to introduce a type, just for querying, to satisfy the compiler. An example of such a query is shown in Listing 12.8.

Listing 12.8 Using a dedicated type for strongly typed queries


public class My_Orders_Totals : 
    AbstractIndexCreationTask<Order, My_Orders_Totals.Result>
{
    public class Result
    {
        public string Employee;
        public string Company;
        public double Total;
    }

    // class constructor shown in Listing 12.4
}

var bigOrdersForEmployee1A = 
(
    from  o in session.Query<My_Orders_Totals.Result, My_Orders_Totals>()
    where o.Employee == "employees/1-A" && 
          o.Total > 1000
    select o
).OfType<Order>().ToList();

The code in Listing 12.8 shows a common usage pattern in RavenDB. First, we define a nested type inside the index class to represent the result of the index. This is commonly called Result, IndexEntry or Entry. There's no real requirement for this to be a nested class, by the way. It can be any type that simply has the required fields. The idea here is that we just need the compiler to be happy with us.

The problem with using the My_Orders_Totals.Result class is that, while we can now use it in the where clause, we aren't actually going to get this class in the results. We'll get the full Order document. We can tell the compiler that we'll be getting a list of Order by calling OfType<Order>(). This is a client-side-only behavior, which only converts the type being used in the query and has no effect on the server-side query that will be generated.

Calling OfType doesn't close the query. We can still continue to add behavior to the query to project the relevant data or to select the sort order for the results, as you can see in Listing 12.9.

Listing 12.9 Adding projection and sorting after calling OfType


var bigOrdersForEmployee1A = 
(
    from  o in session.Query<My_Orders_Totals.Result, My_Orders_Totals>()
    where o.Employee == "employees/1-A" && 
          o.Total > 1000
    orderby o.Total descending
    select o
).OfType<Order>();

var results = 
    from o in bigOrdersForEmployee1A
    orderby o.Employee
    select new 
    {
        o.Company,
        Total = o.Lines.Sum(x => x.Quantity)
    };

The RQL generated by the query in Listing 12.9 is shown in Listing 12.10.

Listing 12.10 The RQL generated by the query in Listing 12.9


from index 'My/Orders/Totals' as o
where o.Employee = $p0 and o.Total > $p1 
order by Total as double desc, Employee 
select { 
    Company : o.Company, 
    Total : o.Lines.map(function(x) { return x.Quantity; })
        .reduce(function(a, b) { return a + b; }, 0) 
}

As you can see, even though Listing 12.9 has the orderby clauses in different locations and operating on different types, the RQL generated doesn't care about that and has the proper sorting.

The last part is important. It's easy to get caught up with the code you have sitting in front of you while forgetting that, underneath it all, what's sent to the server is RQL. In many respects, we torture the type system in these cases to get it to both agree on the right types and to allow us to generate the right queries to the server.

Listing 12.4 shows how we can create a simple index from the client. But we're still missing a few bits. This kind of approach only lets us create very simple indexes. How are we going to handle the creation of a MapReduce index?

Defining MapReduce indexes via client code

On the client side, a MapReduce index is very similar to the simple indexes that we've already seen. The only difference is that we have an issue with the strongly typed nature of the language. In Listing 12.4, we defined index My_Orders_Totals and used a generic parameter to indicate that the source collection (and the type this index is operating on) is Order.

However, with a MapReduce index, we have two types. One is the type that the Map will operate on, just the same as we had before. But there's another type, which is the type that the Reduce is going to work on. As you probably expected, we'll also pass the second type as a generic argument to the index. Listing 12.11 shows such a MapReduce index using strongly typed code.

Listing 12.11 Defining a map-reduce index from code


public class My_Products_Sales : 
    AbstractIndexCreationTask<Order, My_Products_Sales.Result>
{
    public class Result
    {
        public string Product;
        public int Count;
        public double Total;
    }

    public My_Products_Sales()
    {
        Map = orders =>
            from order in orders
            from line in order.Lines
            select new
            {
                Product = line.Product,
                Count = 1,
                Total = (line.Quantity * line.PricePerUnit)
            };

        Reduce = results =>
            from result in results
            group result by result.Product
            into g
            select new
            {
                Product = g.Key,
                Count = g.Sum(x => x.Count),
                Total = g.Sum(x => x.Total)
            };
    }
}

The My_Products_Sales index in Listing 12.11 defines its Map just as we previously saw in the My_Orders_Totals index. We also have another nested class called Result. (Again, using a nested class is merely a convention; it keeps the Result class near the index that uses it.) However, we're also using this nested class as the generic argument for the base type.

This might look strange at first, but it's actually quite a natural way to specify a few things: first, that this index's Map's source collection is Order, and second, that the output of the Map and the input (and output) of the Reduce are in the shape of the Result class. Note that I'm using the phrase "in the shape of" and not "of type." This is because, as you can see in the select new clauses, we aren't actually returning the types there. We're returning an anonymous type.

As long as the shape matches (and the server will verify that), RavenDB doesn't care. The actual execution of the index is done on the server side and is not subject to any of the type rules that you saw in the code listing so far. It's important to remember that the purpose of all the index classes and Linq queries is to generate code that will be sent to the server. And as long as the server understands what's expected, it doesn't matter what's actually being sent.

You can create the index in Listing 12.11 either by calling new My_Products_Sales().Execute(store) or by running IndexCreation.CreateIndexes(assembly, store) again. Go ahead and inspect the new index in the Studio. Remember, the index name in the RavenDB Studio is My/Products/Sales.

With the index on the server, we can now see how we can query a MapReduce index from the client side. This turns out to be pretty much the same as we've already seen. Listing 12.12 has the full details.

Listing 12.12 Querying a map-reduce index using Linq


var salesSummaryForProduct1A = 
    from s in session.Query<My_Products_Sales.Result, My_Products_Sales>()
    where s.Product == "products/1-A"
    select s;

The Query method in Listing 12.12 takes two generic parameters. The first is the type of the query — in this case, the Result class, which we also used in the index itself as the input (and output) of the Reduce function. The second generic parameter is the index that we'll use. In the case of the query in Listing 12.12, the output of the query is also the value emitted by the MapReduce index, so we don't have to play any more games with types.
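
To make the shape of those results concrete, here's a small usage sketch for the query in Listing 12.12. Each item is an instance of the Result class, carrying the aggregated values computed by the Reduce function:

foreach (var summary in salesSummaryForProduct1A)
{
    // One entry per product, with the reduced totals
    Console.WriteLine(
        $"{summary.Product}: {summary.Count} lines, {summary.Total} total");
}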

Strong types and weak lies

RavenDB goes to great lengths to pretend that it actually cares about types when you're writing code in a strongly typed language. The idea is that from the client code, you'll gain the benefits of strongly typed languages, including IntelliSense, compiler checks of types, etc.

That isn't what's actually being sent to the server, though. And while the vast majority of the cases are covered with a strongly typed API, there are some things that either cannot be done or are awkward to do. For such scenarios, you can drop down a level in the API and use the string-based APIs that give you maximum flexibility.

We've seen how to create simple indexes and MapReduce indexes, but we also have multimap indexes in RavenDB. How are we going to work with those from the client side?

Multimap indexes from the client

An index can have any number of Map functions defined on it, but the code we explored so far in Listing 12.4 and Listing 12.11 only shows us how to define a single map. This is because the AbstractIndexCreationTask<T> base class is meant for the common case where you have only a single Map in your index. If you want to define a multimap index from the client, you need to use the appropriate base class, AbstractMultiMapIndexCreationTask<T>, as you can see in Listing 12.13.

Listing 12.13 Defining a multimap index from the client side


public class People_Search : 
    AbstractMultiMapIndexCreationTask<People_Search.Result>
{
    public class Result
    {
        public string Name;
    }

    public People_Search()
    {
        AddMap<Employee>(employees =>
            from e in employees
            select new
            {
                Name = e.FirstName + " " + e.LastName
            }
        );
        AddMap<Company>(companies =>
            from c in companies
            select new
            {
                c.Contact.Name
            }
        );
        AddMap<Supplier>(suppliers =>
            from s in suppliers
            select new
            {
                s.Contact.Name
            }
        );
    }
}

The index in Listing 12.13 is using multimap to index Employees, Companies and Suppliers. We already ran into this index before, in Listing 10.12. At the time, I commented that dealing with a heterogeneous result set can be challenging — not for RavenDB or the client API, but for your code.

You can see that, in Listing 12.13, we also have a Result class that's used as a generic parameter. Technically, since we don't have a Reduce function in this index, we don't actually need it. But it's useful to have because the shape the index entry will take is explicit. We call AddMap<T> for each collection that we want to index, and all of the AddMap<T> calls must have the output in the same shape.

How about actually using such an index? Before we look at the client code, let's first consider a use case for this. The index allows me to query across multiple collections and fetch results from any of the matches. Consider the case of querying for all the results where the name starts with Mar. You can see a mockup of how this will look in the UI in Figure 12.1.

Figure 12.1 Getting the UI to display heterogeneous results from the People/Search index

Getting the UI to display heterogeneous results from the People/Search index

To query this successfully from the client, we need to specify both the type of the index and the shape we're querying on. Luckily for us, we already defined that shape: the People_Search.Result nested class. You can see the query in Listing 12.14.

Listing 12.14 Querying a multimap index with heterogeneous results from the client


var results = session.Query<People_Search.Result, People_Search>()
    .Where(item => item.Name.StartsWith("Mar"))
    .OfType<object>();

foreach (var result in results)
{
    switch(result)
    {
        case Employee e:
            RenderEmployee(e);
            break;
        case Supplier s:
            RenderSupplier(s);
            break;
        case Company c:
            RenderCompany(c);
            break;
    }
}

In Listing 12.14, we're issuing the query on results in the shape of People_Search.Result and then telling the compiler that the result can be of any type. If we had a shared interface or base class, we could have used that as the common type for the query. The rest of the code just does an in-memory type check and routes each result to the relevant rendering code.

Linq isn't the only game in town

The RavenDB query API is built in layers. At the top of the stack, you have Linq, which gives you strongly typed queries with full support from the compiler. Below Linq, you have the DocumentQuery API, which is a bit lower level and gives the user a programmatic way to build queries.

You can access the DocumentQuery API through session.Advanced.DocumentQuery<T>, as shown in the following query:

var results = session.Advanced
    .DocumentQuery<object>("People/Search")
    .WhereStartsWith("Name", "Mar")
    .ToList();

This query is functionally identical to the one in Listing 12.14, except that we're weakly typed here. This kind of API is meant for programmatically building queries, working with users' input, and the like. It's often easier to build such scenarios without the constraints of the type system. The DocumentQuery API is capable of any query that Linq can perform. Indeed, since Linq is implemented on top of DocumentQuery, that's fairly obvious.

You can read more about the options available to you with DocumentQuery in the online documentation.

We could have also projected fields from the index and gotten all the results in the same shape from the server. Writing such a query using C# is possible, but it's awkward and full of type trickery. In most cases, it's better to use RQL directly for such a scenario.

Using RQL from the client

Within RavenDB, RQL queries give you the most flexibility and power. Any other API ends up being translated to RQL, after all. Features such as Language Integrated Query make most queries a joy to build, and the DocumentQuery API gives us much better control over programmatically building queries. But at some point, you'll want to just write raw RQL and get things done.

There are several levels at which you can use RQL from your code. You can just write the full query in RQL, you can add RQL snippets to a query and you can define the projection manually. Let's look at each of these in turn. All of the queries we'll use in this section will use the SearchResult class defined in Listing 12.15.

Listing 12.15 A simple data class to hold the results of queries


public class SearchResult
{
    public string ContactName;
    public string Collection;
}

Listing 12.16 shows how we can work directly with RQL. This is very similar to the query we used in Listing 10.13 a few chapters ago.

Listing 12.16 Querying using raw RQL


List<SearchResult> results = session.Advanced
    .RawQuery<SearchResults>(@"
        from index 'People/Search' as p
        where StartsWith(Name, $name)
        select
        {
            Collection: p['@metadata']['@collection'],
            ContactName: (
                p.Contact || { Name: p.FirstName + ' ' + p.LastName }
            ).Name
        }
    ")
  .AddParameter("$name", "Mar")
  .ToList();

There are a few items of interest in the query in Listing 12.16. First, you can see that we specify the entire query using a single string. The generic parameter that's used in the RawQuery method is the type of the results for the query. This is because we're actually specifying the query as a string, so we don't need to play hard to get with the type system and can just specify what we want in an upfront manner.

The query itself is something we've already encountered before. The only surprising part there is the projection that checks if there's a Contact property on the object or creates a new object for the Employees documents (which don't have this property).

Query parameters and RQL

In Listing 12.16, there's something that's both obvious and important to call out, and it's the use of query parameters. We use the $name parameter and add it to the query using the AddParameter method.

It's strongly recommended that you only use parameters and don't build queries using string concatenation (especially when it involves users' input). If you need to dynamically build queries, using the DocumentQuery API is preferred. And users' input should always be sent using AddParameter so it can be properly processed and never becomes part of the query text itself.

See also SQL Injection Attacks in your favorite search engine.

Listing 12.16 required us to write the full query as a string, which means that it's opaque to the compiler. We don't have to go full bore with RQL strings; we can ask the RavenDB Linq provider to do most of the heavy lifting and just plug in our custom extension when it's needed.

Consider the code in Listing 12.17, which uses RavenQuery.Raw to carefully inject an RQL snippet into the Linq query.

Listing 12.17 Using Linq queries with a bit of RQL sprinkled in


List<SearchResult> results = 
(
    from item in session.Query<People_Search.Result, People_Search>()
    where item.Name.StartsWith("Mar")
    select new SearchResult 
    {
        Collection = RavenQuery.Raw("item['@metadata']['@collection']"),
        ContactName = RavenQuery.Raw(@"(
              item.Contact || { Name: item.FirstName + ' ' + item.LastName }
          ).Name")
    }
).ToList();

Listing 12.17 isn't a representative example, mostly because it's probably easier to write it as an RQL query directly. But it serves as a good example of a non-trivial query and how you can utilize advanced techniques in your queries.

It's more likely that you'll want to use a variant of this technique when using the DocumentQuery API. This is because you'll typically compose queries programmatically using this API and then want to do complex projections from the query. This is easy to do, as you can see in Listing 12.18.

Listing 12.18 Using a custom projection with the 'DocumentQuery' API


List<SearchResult> results = 
    session.Advanced.DocumentQuery<People_Search.Result, People_Search>()
    .WhereStartsWith(x => x.Name, "Mar")
    .SelectFields<SearchResult>(QueryData.CustomFunction(
        alias: "item",
        func: @"{
            Collection: item['@metadata']['@collection'],
            ContactName: (
                item.Contact || { Name: item.FirstName + ' ' + item.LastName }
            ).Name
        }")
    ).ToList();                   

The queries in Listing 12.16, 12.17 and 12.18 produce the exact same query and the same results, so it's your choice when to use either option. Myself, I tend to use RQL for complex queries where I need the full power of RQL behind me and when I can't express the query that I want to write in a natural manner using Linq.

I use the DocumentQuery API mostly when I want to build queries programmatically, such as search pages or queries that are composed dynamically.
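
As an illustration of that kind of programmatic composition, consider a search page that filters orders based on optional user input. This is a sketch rather than code from the book; the method name and filter parameters are made up, while WhereEquals, WhereGreaterThan and AndAlso are the string-based clauses of the DocumentQuery API, in the same family as the WhereStartsWith call shown earlier.

using System.Collections.Generic;
using System.Linq;
using Orders;
using Raven.Client.Documents.Session;

public static class OrderSearch
{
    // Hypothetical search-page handler: each filter is optional
    public static List<Order> SearchOrders(
        IDocumentSession session, string company, decimal? minFreight)
    {
        var query = session.Advanced.DocumentQuery<Order>();
        var hasClause = false;

        if (company != null)
        {
            query = query.WhereEquals("Company", company);
            hasClause = true;
        }

        if (minFreight != null)
        {
            if (hasClause)
                query = query.AndAlso();
            query = query.WhereGreaterThan("Freight", minFreight.Value);
        }

        return query.ToList();
    }
}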

Controlling advanced indexing options from the client side

In the previous section, we explored a lot of ways to project data from the People/Search index, but our query was a simple StartsWith(Name, $name). So if $name is equal to "Mar", we'll find an employee named Margaret Peacock. However, what would happen if we tried to search for "Pea"?

If you try it, you'll find there are no results. You can check the index's terms to explore why this is the case, as shown in Figure 12.2.

Figure 12.2 The terms list for the People/Search index near Margaret Peacock

The terms list for the People/Search index near Margaret Peacock

When you look at the terms in Figure 12.2, it's obvious why we haven't been able to find anything when searching for the "Pea" prefix. There's no term that starts with it. Our index is simply indexing the terms as is, with minimal work done on them. (It's just lowercasing them so we can run a case-insensitive search.)

We already looked at this in Chapter 10, in the section about full text indexes, so this shouldn't come as a great surprise. We need to mark the Name field as a full text search field. But how can we do that from the client side? Listing 12.19 shows the way to do it.

Listing 12.19 Configuring index fields options via code


public class People_Search : 
  AbstractMultiMapIndexCreationTask<People_Search.Result>
{
    public People_Search()
    {
        // AddMap calls from Listing 12.13
        // removed for brevity

        Index(x => x.Name, FieldIndexing.Search);
        Suggestion(x => x.Name);
    }
} 

In Listing 12.19, you can see the Index method. It configures the indexing option for the Name field to full text search mode. And the Suggestion method is used, unsurprisingly enough, to indicate that this field should have suggestions applied to it.
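
Once the Name field is marked for full text search and the updated index is deployed, the "Pea" lookup from earlier becomes possible. The following is a small sketch using the client's Search extension with a prefix wildcard; it assumes the People_Search index from Listing 12.19 is on the server:

// Matches any person whose Name contains a term starting with "Pea",
// such as "Margaret Peacock"
var matches = session.Query<People_Search.Result, People_Search>()
    .Search(x => x.Name, "Pea*")
    .OfType<object>()
    .ToList();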

Creating weakly typed indexes

In addition to the strongly typed API exposed by AbstractMultiMapIndexCreationTask and AbstractIndexCreationTask, you can also use the weakly typed API to control every aspect of the index creation, such as with the following code:

public class People_Search : AbstractIndexCreationTask
{
    public override IndexDefinition CreateIndexDefinition()
    {
        return new IndexDefinition()
        {
            Maps = { 
                @"from e in docs.Employees select new {
                    Name = e.FirstName + ' ' + e.LastName
                }",
                @"from c in docs.Companies select new {
                    c.Contact.Name
                }",
                @"from s in docs.Suppliers select new {
                    s.Contact.Name
                }" 
            },
            Fields = { 
                ["Name"] = new IndexFieldOptions {
                      Indexing = FieldIndexing.Search
                } 
            }
        };
    }
}

You're probably sick of the People/Search index by now, with all its permutations. The index definition above behaves just the same as all the other People/Search indexes we looked at, including being picked up by IndexCreation automatically. It just gives us the maximum amount of flexibility in all aspects of the index.

There are other options available, such as using Store to store the fields, Spatial for geographical indexing and a few more advanced options that you can read more about in the online documentation. Anything that can be configured through the Studio can also be configured from code.

MultimapReduce indexes from the client

The last task we have to do with building indexes from client code is to build a MultimapReduce index. This is pretty straightforward, given what we've done so far. We need to define an index class inheriting from AbstractMultiMapIndexCreationTask, define the Maps using the AddMap methods and finally define the Reduce function. Listing 12.20 shows how this is done.

Listing 12.20 MultimapReduce index to compute details about each city


public class Cities_Details : 
  AbstractMultiMapIndexCreationTask<Cities_Details.Result>
{
    public class Result
    {
        public string City;
        public int Companies, Employees, Suppliers;
    }
  
    public Cities_Details()
    {
        AddMap<Employee>(emps => 
            from e in emps
            select new Result
            {
                City = e.Address.City,
                Companies = 0,
                Suppliers = 0,
                Employees = 1
            }
        );
    
        AddMap<Company>(companies => 
            from c in companies
            select new Result
            {
                City = c.Address.City,
                Companies = 1,
                Suppliers = 0,
                Employees = 0
            }
        );
    
        AddMap<Supplier>(suppliers =>
            from s in suppliers
            select new Result
            {
                City = s.Address.City,
                Companies = 0,
                Suppliers = 1,
                Employees = 0
            }
        );

        Reduce = results =>
            from result in results
            group result by result.City
            into g
            select new Result
            {
                City = g.Key,
                Companies = g.Sum(x => x.Companies),
                Suppliers = g.Sum(x => x.Suppliers),
                Employees = g.Sum(x => x.Employees)
            };
    }
}

Listing 12.20 is a bit long, but it matches up to the index we defined in the previous chapter, in Listing 11.13. And the only new thing in the Cities_Details index class is the use of select new Result instead of using select new to create an anonymous class. This can be helpful when you want to ensure that all the Maps and the Reduce are using the same output. RavenDB strips the Result class when it creates the index, so the server doesn't care about it. This is simply here to make our lives easier.

Deploying indexes

I briefly mentioned earlier that it's typical to deploy indexes using IndexCreation.CreateIndexes or its async equivalent IndexCreation.CreateIndexesAsync. These methods take an assembly and the document store you're using and scan the assembly for all the index classes. Then they create all the indexes they found in the database.

During development, it's often best to call one of these methods in your application startup. This way, you can modify an index and run the application, and the index definition is automatically updated for you. It also works great when you pull changes from another developer. You don't have to do anything to get the right environment set up.

Attempting to create an index that already exists on the server (same name and index definition) is ignored and has no effect on the server or the cluster. So if nothing changed in your indexes, the entire IndexCreation.CreateIndexes call does nothing at all. Only when there are changes to the indexes will it actually take effect.

Locking indexes

Sometimes you need to make a change to your index definition directly on your server. That's possible, of course, but you have to be aware that if you're using IndexCreation to automatically generate your indexes, the next time your application starts, it will reset the index definition to the original.

That can be somewhat annoying because changing the index definition on the server can be a hotfix to solve a problem or introduce a new behavior, and the index reset will just make it go away, apparently randomly.

In order to handle this, RavenDB allows the option of locking an index. An index can be unlocked, locked (ignore) or locked (error). In the unlocked mode, any change to the index is accepted, and if the new index definition is different from the one stored on the server, the index is updated and the data re-indexed using the new definition. In the locked (ignore) mode, creating a new index definition returns successfully but doesn't actually change anything on the server. And in the locked (error) mode, trying to change the index raises an error.

Usually you'll just mark the index as locked (ignore), which will make the server ignore any changes to the index. The idea is that we don't want to break your calls to IndexCreation by throwing an error.

Note that this is not a security measure. It's a way for the operations team to make a change in the index and prevent the application from mindlessly setting it back. Any user that can create an index can also modify the lock mode on the index.
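
Lock modes can be changed from the Studio, and the client API also exposes them as a maintenance operation. The following is a sketch of what that looks like, assuming the SetIndexesLockOperation available in the 4.0 client:

// Ignore any further changes pushed to this index definition
// (for example, from IndexCreation calls at application startup).
// SetIndexesLockOperation lives in Raven.Client.Documents.Operations.Indexes.
store.Maintenance.Send(
    new SetIndexesLockOperation("My/Orders/Totals", IndexLockMode.LockedIgnore));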

Index creation is a cluster operation, which means that you can run the command against any node in the database group and RavenDB will make sure that it's created in all the database's nodes. The same also applies for automatic indexes. If the query optimizer decides that a query requires an index to be created, that index is going to be created in all the database instances, not just the one that processed this query.

This means that the experience of each node can be shared among all of them and you don't have to worry about a failover from a node that has already created the indexes you're using to one that didn't accept any queries yet. All of the nodes in a database group will have the same indexes.

Failure modes and external replication

Being a cluster operation means that index creation is reliable; it goes through the Raft protocol and a majority of the nodes must agree to accept it before it's acknowledged to the client. If, however, a majority of the nodes in the cluster are not reachable, the index creation will fail. This applies to both manual and automatic index creation in the case of network partition or majority failure. Index creation is rare, though, so even if there's a failure of this magnitude, it will not typically affect day-to-day operations.

External replication allows us to replicate data (documents, attachments, revisions, etc.) to another node that may or may not be in the same cluster. This is often used as a separate hot spare, offsite backup, etc. It's important to remember that external replication does not replicate indexes. Indexes are only sent as a cluster operation for the database group. This allows you to have the data replicated to different databases and potentially run different indexes on the documents.

There are other considerations to deploying indexes, especially in production. In the next section, we'll explore another side of indexing: how indexes actually work.

How do indexes do their work?

This section is the equivalent of popping the hood on a car and examining the engine. For the most part, you shouldn't have to do that, but it can be helpful to understand what is actually going on.

An index in RavenDB is composed of

  • Index definition and configuration options (Maps and Reduce, fields, spatial, full text, etc.).
  • Data on disk (where we store the results of the indexing operation).
  • Various caches for portions of the data, to make it faster to process queries.
  • A dedicated index thread that does all the work for the index.

What's probably the most important from a user perspective is to understand how this all plays together. An index in RavenDB has a dedicated thread for all indexing work. This allows us to isolate any work being done to this thread and give the admin better accountability and control. In the Studio, you can go to Manage Server and then click Advanced and you'll see the Threads Runtime Info. You can see a sample of that in Figure 12.3.

Figure 12.3 Showing the details on the Orders/ByCompany index's thread

Showing the details on the Orders/ByCompany index's thread

In Figure 12.3, you can see how much processing time is taken by the Orders/ByCompany indexing thread.

A dedicated thread per index greatly simplifies operational behaviors and allows us to apply several important optimizations. It means that no index can interfere with any other index. A slow index can only affect itself, instead of having a global effect on the system. It also simplifies the code and algorithms required for indexing because there's no need to write thread-safe code.

This design decision also allows RavenDB to prioritize tasks more easily. RavenDB uses thread priorities at the operating system level to hint what should be done first. Setting the index priority will affect the indexing thread priority at the operating system level. You can see how to change the index priority in Figure 12.4.

Figure 12.4 Changing the indexing priority will update the indexing thread priority

Changing the indexing priority will update the indexing thread priority

By default, RavenDB prioritizes request processing over indexing, so indexing threads start with a lower priority than request-processing threads. That means requests complete faster and indexes get to run when there's capacity for them (with the usual starvation prevention mechanisms). You can increase or lower the index priority and RavenDB will update the indexing thread accordingly.
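
The same priority can also be set from client code via a maintenance operation; the sketch below assumes the SetIndexesPriorityOperation exposed by the 4.0 client:

// Lower the priority of the Orders/ByCompany indexing thread.
// SetIndexesPriorityOperation lives in Raven.Client.Documents.Operations.Indexes.
store.Maintenance.Send(
    new SetIndexesPriorityOperation("Orders/ByCompany", IndexPriority.Low));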

RavenDB also uses this to set the I/O priority for requests generated by indexing. In this way, we can be sure that indexing will not overwhelm the system by taking too much CPU or saturating the I/O bandwidth we have available.

The last point is important because RavenDB's indexes are always built online, in conjunction with normal operations on the server. And this CPU and I/O priority scheme applies both to the initial creation of an index and to the updates made with each change to the data. We don't make a distinction between the two modes.

What keeps the indexing thread up at night?

When you create a new index, RavenDB will spawn a thread that will start indexing all the documents covered by the index. It will go through the documents in batches, trying to index as many of them in one go as it can, until all are indexed. What happens then?

The index will then go to sleep, waiting for a new or updated document in one of the collections that this index cares about. In other words, until there's such a document, the thread is not going to be runnable. It isn't going to compete for CPU time and takes very few resources from the system.

If the indexing thread detects that it's been idle for a while, it will actively work to release any resources it currently holds and then go back to sleep until it's woken by a new document showing up.

Indexing in batches

RavenDB typically needs to balance throughput vs. freshness when it comes to indexing. The bigger the batch, the faster documents get indexed. But we only see the updates to the index when we complete the batch. During initial creation, RavenDB typically favors bigger batches (as much as the available resources allow) and will attempt to index as many documents as it can at once.

After the index completes indexing all the documents it covers, it will watch for any new or updated documents and index them as soon as possible, without waiting for more updates to come. The typical indexing latency (the time between when a document updates and when the index has committed the batch including this document) is measured in milliseconds on most systems.

The query optimizer's capability to create new indexes on the fly depends on making sure the new index isn't breaking things while it's being built. Because of this requirement, RavenDB is very careful about resource allocations to indexing. We talked about CPU and I/O priorities, but there's also a memory budget applied. All in all, this has been tested in production for many years and has proven to be an extremely valuable feature.

The ability to deploy, in production, a new index (or a set of indexes) is key for operational agility. Otherwise, you'll have to schedule downtime whenever your application changes even the most minor of queries. This kind of flexibility is offered not just for new indexes but also for when you're updating existing ones.

Side by side

During development, you'll likely not notice the indexing update times. The amount of data you have in a development database is typically quite small, and the machine is usually not busy handling production traffic. In production, the opposite is true. There's a lot of data, and your machines are already busy doing their normal routine. This means that an index deploy can take a while.

Let's assume we have an index deploy duration (from the time it's created to the time it's done indexing all the relevant documents) of five minutes. An updated index definition can't just pick up from where the old index definition left off. For example, we might have added new fields to the index, so in addition to indexing new documents, we need to re-index all the documents that are already indexed. But if we have a five-minute period in which we're busy indexing, what will happen to queries made to the index during that time frame?

All index updates in RavenDB are done using the side-by-side strategy. Go to the Studio and update the Orders/Totals index by changing the Total field computation and save the document. Then immediately go to the indexes page. You should see something similar to what's shown in Figure 12.5.

Figure 12.5 Updating an index keeps the old definition alive until the new index is caught up.

Updating an index keeps the old definition alive until the new index is caught up.

Figure 12.5 shows an index midway through an update. But instead of deleting the old index and starting the indexing from scratch (which will impact queries), RavenDB keeps the old index around (for answering queries and indexing new documents) until the new version of the index has caught up and indexed everything.

This way, you can minimize the effects of updating an index in production. Once the updated version of the index has completed its work, it will automatically replace the old version. Of course, you can also force an immediate replacement if you really need to. (Swap now will do it.)

Auto indexes and the query optimizer

We talked about the query optimizer creating indexes on the fly several times, but now I want to shine a light on the kind of heuristics that the query optimizer uses and the logic that guides it.

At the most basic level, the query optimizer analyzes all the queries that don't specify an explicit index to use (anything that doesn't start with from index ... is fair game for the query optimizer). The query optimizer will attempt to find an index that can answer the query being asked, but if it fails to find any appropriate indexes, it will go ahead and create a new one.

One very important aspect is that the query optimizer isn't going to create an index blindly. Instead of only considering the current query when it's time to create a new index, the query optimizer is also going to weigh the history of the queries that were made against the database.

In other words, the logic that guides the query optimizer looks something like this:

  1. Is there an index that can match this query? If so, use that.
  2. If there's no such index, we need to create one.
  3. Let's take a look at all the queries that were made against the same collection as the one that's now being queried and see what would be the optimal index to answer all of these queries, including the new query.
  4. We need to create this new optimal index and wait for it to complete indexing.
  5. We should retire all the automatic indexes that have been created so far that are now covered by the new index.

The idea here is that RavenDB uses your queries as a learning opportunity to figure out more about the operational environment, and the query optimizer is able to use that knowledge when it needs to create a new index.

Over time, this means that we'll generate the optimal set of indexes to answer any query that doesn't use an explicit index. Furthermore, it means that operational changes, such as deploying a new version of your application with slightly different queries, will be met with equanimity by RavenDB. The query optimizer will recognize the new queries and figure out if they can use the existing indexes. If they can't, the optimizer will create a new index for them. All existing queries will continue to use the existing indexes until the new indexes are ready. Then they'll switch.

All of this will be done for you, without anyone needing to tell RavenDB what needs to be done or babysit it. The fact that index modifications are cluster-wide also means that all the nodes in the cluster will be able to benefit from this knowledge.

Importing and exporting indexes

RavenDB's ability to learn as it goes is valuable, but even so, you don't always want to do that kind of operation directly in production. If you have a large amount of data, you don't want to wait until you already deployed your application to production for RavenDB to start learning about the kind of queries that it's going to generate. During the learning process, there might be several paths taken that you want to skip.

You can run your application in a test environment, running a set of load tests and making the application issue all its queries to your test RavenDB instance. That instance will apply the same logic and create the optimal set of indexes to answer the kind of queries it saw.

You can now export that knowledge from the test machine and import it into the production cluster. The new indexes will be built, and by the time you're ready to actually deploy your application to production, all the work has already been done and the indexes are ready for the new queries in the updated version of your application.

Let's see how that can work, shall we? In the Studio, go to Settings and then to Export Database. Ensure that only the Include Indexes option is selected and click the Export Database button. You can see what this looks like in Figure 12.6.

Figure 12.6 Exporting just the indexes from our database.

Exporting just the indexes from our database.

You can then take the resulting file and import that into the production instance (Settings and then Import Database) and the new indexes will be created. The query optimizer will then take them into account when it needs to decide which index is going to handle which query.

Indexing and querying performance

When it comes time to understand what's going on with your indexes, you won't face a black box. RavenDB tracks and externalizes a lot of information about the indexing processes and makes it available to you, mostly via the Studio in Indexes and then the Indexing Performance page. You can see a sample of what it looks like when the system is indexing in Figure 12.7.

Figure 12.7 Visually exploring details about the indexing actions (and their costs)

Visually exploring details about the indexing actions (and their costs)

The timeline view in Figure 12.7 shows several indexes running concurrently (and independently). (Solid colors mean the index batch is complete, stripes mean this is an actively executing index.) And you can hover over each of the steps to get more information, such as the number of documents indexed or the indexing rate, as shown in Figure 12.8.

This graph can be very useful for investigating what exactly is going on inside RavenDB without having to look through a pile of log files. For example, you might look at the thread details we previously discussed (see Figure 12.3 for what this looks like in the Studio) and notice that a particular indexing thread is using a lot of CPU time.

You can go into the Indexing Performance window and simply look at what's taking so much time. For example, you may be using the "Suggestions" feature, which can be fairly compute-intensive with high update rates. An example of this is shown in Figure 12.8, where you can see the exact costs of suggestions during indexing.

Figure 12.8 Drilling down into particular operations (such as "Suggestions") can provide insight into what's costly

Drilling down into particular operations (such as "Suggestions") can provide insight into what's costly

Figure 12.8 shows a fairly simple example, but the kind of details exposed in the timeline can give you a better idea of what exactly is going on inside RavenDB. As part of ongoing efforts to be a database that's actively trying to help the operations team, RavenDB is externalizing all such decisions explicitly. I encourage you to look at each of these boxes. The tooltips reveal a lot of what's going on, and this kind of view should quickly give you a feeling about how much things should cost. That way, you can recognize when things are out of whack if you are exploring some issue.

Having a good idea of what's going on during indexing is just half the job. We also need to be able to monitor the other side: what's going on when we query the database. RavenDB actively monitors such actions and will bring it to the operator's attention when there are issues, as shown in Figure 12.9.

Figure 12.9 RavenDB will generate operational alerts for slow queries and very large result sets.

RavenDB will generate operational alerts for slow queries and very large result sets.

Figure 12.9 shows the large result set alert, generated when a query returns a very large number of results while not using streaming. (Streaming queries were discussed in Chapter 4.) This can lead to higher memory utilization on both client and server and is considered bad practice. RavenDB will alert you to this issue and provide the exact time and the query that caused it so you can fix the problem.

In the same vein, very slow queries are also made explicitly visible to the operators because they're something they probably need to investigate. There are other operational conditions that RavenDB monitors and will bring to your attention — anything from slow disk I/O to running out of disk space to network latency issues. We'll discuss alerts and monitoring in RavenDB in much more depth in the next part of the book, so I'll save it till then.

Error handling in indexing

Sometimes, your index runs into an error. RavenDB actually goes to great lengths to avoid that. Property access inside the index will propagate nulls transitively. In other words, you can write the index shown in Listing 12.21 and you won't get a NullReferenceException.

Listing 12.21 Accessing a null 'manager' instance will not throw an exception


public class Employees_Managers
  : AbstractIndexCreationTask<Employee>
{
    public Employees_Managers()
    {
        Map = emps =>
            from e in emps
            let manager = LoadDocument<Employee>(e.ReportsTo)
            select new
            {
                Name = e.FirstName + " " + e.LastName,
                HasManager = manager != null,
                Manager = manager.FirstName + " " + manager.LastName
            };
    }
}

The employees/2-A document has null as the value of ReportsTo. What do you think will happen when the index shown in Listing 12.21 is busy indexing this document? LoadDocument will return null (because the document ID it got was null) and the value of HasManager is going to be false because there's no manager for employees/2-A. However, just one line below, we access the manager instance, which we know is null.

Usually, such an operation will throw a NullReferenceException. RavenDB, however, rewrites all references so they use null propagation. The actual mechanism by which this is done is a bit complex and out of scope for this topic, but you can imagine that RavenDB actually uses Manager = manager?.FirstName + " " + manager?.LastName everywhere. Did you notice the ?. usage? This means "if the value is null, return null; otherwise, access the property."

In this way, a whole class of common issues is simply averted. On the other hand, the index will still contain a Manager name for employees/2-A. It will be " ", because the space is always concatenated between the values, and concatenating null with a string just yields the string.
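If you'd rather not rely on that rewriting (for example, to avoid indexing a lone space as the manager's name), you can make the null handling explicit in the map itself. Here's a minimal variation on Listing 12.21; treat it as a sketch, not part of the sample code:


public class Employees_Managers
  : AbstractIndexCreationTask<Employee>
{
    public Employees_Managers()
    {
        Map = emps =>
            from e in emps
            let manager = LoadDocument<Employee>(e.ReportsTo)
            select new
            {
                Name = e.FirstName + " " + e.LastName,
                HasManager = manager != null,
                // Index null instead of " " when there is no manager
                Manager = manager == null
                    ? null
                    : manager.FirstName + " " + manager.LastName
            };
    }
}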

Some kinds of errors don't really let us recover. Consider the code in Listing 12.22. The index itself isn't very interesting, but we have an int.Parse call there on the PostalCode property.

Listing 12.22 Parsing UK PostalCode as int will throw an exception


public class Employees_PostalCode
  : AbstractIndexCreationTask<Employee>
{
    public Employees_PostalCode()
    {
        Map = emps =>
            from e in emps
            select new
            {
                Name = e.FirstName + " " + e.LastName,
                Postal = int.Parse(e.Address.PostalCode)
            };
    }
}

The PostalCode property in the sample data set is numeric for employees from Seattle and alphanumeric for employees in London. This means that for about half of the documents in the relevant collection, this index is going to fail. Let's see how RavenDB behaves in such a case. Figure 12.10 shows how this looks in the Studio.

Figure 12.10 Indexing errors and an errored index in the Studio

We can see that the index as a whole is marked as errored. We'll ignore that for the moment and focus on the Index Errors page. If you click on it, you'll see a list of the errors that happened during indexing. You can click on the eye icon to see the full details. In this case, the error is "Failed to execute mapping function on employees/5-A. Exception: System.FormatException: Input string was not in a correct format. ... System.Number.ParseInt32 ..."

There are two important details in that error message: we know which document caused the error and we know what the issue is. That makes it easy to diagnose the problem. Indeed, looking at employees/5-A, we can see that the value of the PostalCode property is "SW1 8JR". It's not really something that int.Parse can deal with.
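If you wanted this index to tolerate non-numeric postal codes instead of failing, one option is to guard the parse inside the map. The following is only a sketch; whether the Linq-to-index translation accepts this exact expression is an assumption on my part, and you may well prefer a different guard:


public class Employees_PostalCode
  : AbstractIndexCreationTask<Employee>
{
    public Employees_PostalCode()
    {
        Map = emps =>
            from e in emps
            select new
            {
                Name = e.FirstName + " " + e.LastName,
                // Only parse purely numeric postal codes; index null otherwise
                Postal = e.Address.PostalCode != null
                         && e.Address.PostalCode.All(c => c >= '0' && c <= '9')
                    ? int.Parse(e.Address.PostalCode)
                    : (int?)null
            };
    }
}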

The indexing errors give us enough information to figure out what happened. That's great. But what about the state of the index? Why is it marked as errored? The easiest way to answer that question is to query the index and see what kind of error RavenDB returns. Executing the RQL query from index 'Employees/PostalCode' will give us this error: "Index 'Employees/PostalCode' is marked as errored. Index Employees/PostalCode is invalid, out of 9 map attempts, 4 has failed. Error rate of 44.44% exceeds allowed 15% error rate."

Now things become much clearer. An index is allowed to fail processing only some documents. Because of the dynamic nature of documents in RavenDB, you may get such failures. However, allowing such failures to go unattended is dangerous. An error in indexing a document means that this particular document is not indexed. That may seem like a tautology, but it has important operational implications. If the document isn't indexed, you aren't going to see it in the results. It is "gone".

While the indexing error is intentionally very visible, if you're running in an unattended mode (which is common), it may be a while before your users' complaints of "I can't find that record" make you check the database. Worse, a change in the application or its behavior could cause all new documents to fail to index. Because of that, an index is only allowed a certain failure rate, and RavenDB will mark the entire index as errored when that rate is exceeded.

An index in an error state cannot be queried and will return an immediate error (similar to the error text above) with an explanation of what's going on. With an explicit error, it's much easier to figure out what's wrong and then fix it.
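From the client's perspective, that error surfaces as an exception when the query executes. The sketch below catches the client's general RavenException base type; the exact exception type and message you'll see are an assumption here, so treat it as illustrative:


using (var session = store.OpenSession())
{
    try
    {
        var byPostal = session
            .Query<Employee, Employees_PostalCode>()
            .ToList();
    }
    catch (Raven.Client.Exceptions.RavenException e)
    {
        // e.Message will include the "marked as errored" explanation,
        // similar to the error text above
        Console.WriteLine(e.Message);
    }
}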

Summary

We started this chapter by discussing index deployments, from the baseline of defining indexes using strongly typed classes to the ease of use of IndexCreation.CreateIndexes to create all those indexes on the database.

We revisited many features and scenarios that we had already encountered, but this time from the perspective of the client's code. Building indexes using Linq queries is an interesting experience. We started from simple indexes and MapReduce indexes with AbstractIndexCreationTask<T> and then moved to multimap and multimap-reduce indexes with the AbstractMultiMapIndexCreationTask<T> base class.

We explored how to query RavenDB from the client side, starting with the simplest of Linq queries and building toward more flexibility with some more complex queries. With both Linq queries and strongly typed indexes, we talked about the fact that RavenDB isn't actually aware of your client-side types, nor does it really care about them.

All the work done to make strongly typed indexes and queries on the client side is purely there so you'll have good compiler, IntelliSense and refactoring support inside your application. In the end, both queries and indexes are turned into RQL strings and sent to the server.

We looked at how we can directly control the IndexDefinition sent to the server, giving us absolute power to play with and modify any option that we wish. This can be done by using the non-generic AbstractIndexCreationTask class and implementing the CreateIndexDefinition() method.

In a similar sense, all the queries we run are just fancy ways to generate RQL queries. We looked into all sorts of ways of using RQL in your applications: using RQL directly by calling RawQuery (and remembering to pass parameters only through AddParameter), poking holes in Linq queries with the RavenQuery.Raw method and using a CustomFunction to take complete control over the projection when using DocumentQuery.

Following the discussion on managing the indexes, we looked into how indexes are deployed on the cluster (as a reliable cluster operation, with a majority consensus using Raft) and what this means (they're not available for external replication and they require a majority of the nodes to be reachable to create/modify an index).

We dived into the execution environment of an index and the dedicated thread that RavenDB assigns to it. Such a thread makes managing an index simpler because it gives us a scope for prioritizing CPU and I/O operations and defines a memory budget to control how much RAM the index will use. This is one of the key ways RavenDB is able to implement online index building. Being able to limit the amount of system resources an index uses is crucial to ensure that we aren't overwhelming the system and hurting ongoing operations.

The process of updating an index definition got particular attention since this can be of critical importance in production systems. RavenDB updates indexes in a side-by-side manner. The old index is retained (and can even index new updates) while the new index is being built. Once the building process is done, the old index is removed in favor of the new one in an atomic fashion.

We briefly looked at the query optimizer, not so much to understand what it's doing but to understand what it means. The query optimizer routinely analyzes all queries and is able to create indexes on the fly, but the key aspect is that it uses that information to continuously optimize the set of indexes you have. After a while, the query optimizer will produce the optimal set of indexes for the queries your application generates.

You can even run a test instance of your application to teach a RavenDB node about the kind of queries it should expect and then export that knowledge to production ahead of your application deployment. In this way, RavenDB will prepare itself ahead of time for the new version and any changes in behavior that it might have.

We then moved to performance and monitoring. RavenDB exposes a lot of details about how it indexes documents in an easy-to-consume manner via the Indexing Performance page, and it actively monitors queries for bad practices, such as queries that return too many results or are very slow. The result of this level of monitoring is that the operations team is made aware of issues they might want to address, even if those issues aren't yet critical.

We want to head things off as soon as possible, after all, and not wait until the sky has fallen to discover that there were warning signs all along the way. At the same time, these alerts aren't going to spam your operations team; that kind of behavior only builds tolerance to alerts because they effectively become noise.

We closed the chapter with a discussion of error handling. How does RavenDB handle indexing errors? How are they made visible to the operators, and what kind of behavior can you expect from the system? RavenDB will tolerate some level of errors from the index, but if there are too many indexing issues, it will decide that's not acceptable and mark the whole index as failing, resulting in any query against this index throwing an exception.

This chapter has marked the end of theory and high-level discussion and moved toward a more practical discussion on how to operate RavenDB. In the next part of the book, we're going to focus on exactly that: the care and feeding of a RavenDB cluster in production. In other words, operations, here we come.