
Map-Reduce indexes

Map-Reduce indexes allow you to perform complex aggregations of data. The first stage, called the map, runs over documents and extracts portions of data according to the defined mapping function(s). Once the map stage completes, the reduce stage is applied to the map results to produce the final outcome.

The idea behind map-reduce indexing is that aggregation queries using such indexes are very cheap. The aggregation is performed once, at indexing time, and the results are stored inside the index. When new data comes into the database or existing documents are modified, the map-reduce index keeps the aggregation results up to date. Aggregations are never performed at query time, which avoids expensive calculations that could severely degrade performance. When you make a query, RavenDB returns the matching results directly from the index.
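
The following is a toy in-memory model of that idea, not RavenDB's actual implementation: the aggregate is adjusted whenever a document is stored or modified, so answering a query is only a dictionary lookup. The `ToyCountIndex` class and its `Put` and `Query` members are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;

// Toy model (hypothetical, not RavenDB internals): counts per category are
// maintained at write time, so reading them never re-aggregates.
public class ToyCountIndex
{
    // Category of each stored document, keyed by document ID.
    private readonly Dictionary<string, string> _categoryByDocId =
        new Dictionary<string, string>();

    // Stored aggregation results: category -> number of documents.
    private readonly Dictionary<string, int> _countByCategory =
        new Dictionary<string, int>();

    // Called when a document is created or modified.
    public void Put(string docId, string category)
    {
        // If the document already existed, retract its old contribution first.
        if (_categoryByDocId.TryGetValue(docId, out string oldCategory))
            _countByCategory[oldCategory]--;

        _categoryByDocId[docId] = category;

        // 'current' is 0 when the category has not been seen yet.
        _countByCategory.TryGetValue(category, out int current);
        _countByCategory[category] = current + 1;
    }

    // Querying is a plain lookup of the stored result.
    public int Query(string category)
    {
        return _countByCategory.TryGetValue(category, out int count) ? count : 0;
    }
}
```

In this model the cost of keeping the result current is paid on each write, which is the trade-off the paragraph above describes.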

For a more in-depth look at how map-reduce works, see the post RavenDB 4.0 Unsung Heroes: Map/reduce.

Creating

When it comes to index creation, the only difference between simple indexes and map-reduce indexes is an additional reduce function in the index definition. To deploy an index, create its definition and deploy it using one of the methods described in the creating and deploying article.

Example I - Count

Let's assume that we want to count the number of products in each category. To do so, we can create the following index, which uses LoadDocument in its map function to resolve each product's category name:

public class Products_ByCategory : AbstractIndexCreationTask<Product, Products_ByCategory.Result>
{
    public class Result
    {
        public string Category { get; set; }

        public int Count { get; set; }
    }

    public Products_ByCategory()
    {
        Map = products => from product in products
                          let categoryName = LoadDocument<Category>(product.Category).Name
                          select new
                          {
                              Category = categoryName,
                              Count = 1
                          };

        Reduce = results => from result in results
                            group result by result.Category into g
                            select new
                            {
                                Category = g.Key,
                                Count = g.Sum(x => x.Count)
                            };
    }
}

and issue the query:

Query:

IList<Products_ByCategory.Result> results = session
    .Query<Products_ByCategory.Result, Products_ByCategory>()
    .Where(x => x.Category == "Seafood")
    .ToList();

DocumentQuery:

IList<Products_ByCategory.Result> results = session
    .Advanced
    .DocumentQuery<Products_ByCategory.Result, Products_ByCategory>()
    .WhereEquals(x => x.Category, "Seafood")
    .ToList();

RQL:

from 'Products/ByCategory'
where Category == 'Seafood'

The above query returns a single result for _Seafood_, containing the number of products in that category.
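
To make the two stages concrete, here is a minimal in-memory sketch using plain LINQ to Objects, with no RavenDB involved. A dictionary lookup stands in for LoadDocument, and the `CountSketch` class, the tuple shapes, and any IDs passed to it are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountSketch
{
    // products: (product ID, category ID); categoryNames: category ID -> name.
    public static Dictionary<string, int> CountByCategory(
        IEnumerable<(string Id, string CategoryId)> products,
        IReadOnlyDictionary<string, string> categoryNames)
    {
        // Map: one entry per product, with the category name resolved
        // (the dictionary lookup plays the role of LoadDocument here).
        var mapped = products.Select(p => new
        {
            Category = categoryNames[p.CategoryId],
            Count = 1
        });

        // Reduce: group the map entries by category and sum the counts.
        return mapped
            .GroupBy(m => m.Category)
            .ToDictionary(g => g.Key, g => g.Sum(m => m.Count));
    }
}
```

Filtering on a category then corresponds to reading a single entry from the returned dictionary.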

Example II - Average

In this example, we will calculate the average product price for each category. The index definition:

public class Products_Average_ByCategory : AbstractIndexCreationTask<Product, Products_Average_ByCategory.Result>
{
    public class Result
    {
        public string Category { get; set; }

        public decimal PriceSum { get; set; }

        public double PriceAverage { get; set; }

        public int ProductCount { get; set; }
    }

    public Products_Average_ByCategory()
    {
        Map = products => from product in products
                          let categoryName = LoadDocument<Category>(product.Category).Name
                          select new
                          {
                              Category = categoryName,
                              PriceSum = product.PricePerUnit,
                              PriceAverage = 0,
                              ProductCount = 1
                          };

        Reduce = results => from result in results
                            group result by result.Category into g
                            let productCount = g.Sum(x => x.ProductCount)
                            let priceSum = g.Sum(x => x.PriceSum)
                            select new
                            {
                                Category = g.Key,
                                PriceSum = priceSum,
                                PriceAverage = priceSum / productCount,
                                ProductCount = productCount
                            };
    }
}

and the query:

Query:

IList<Products_Average_ByCategory.Result> results = session
    .Query<Products_Average_ByCategory.Result, Products_Average_ByCategory>()
    .Where(x => x.Category == "Seafood")
    .ToList();

DocumentQuery:

IList<Products_Average_ByCategory.Result> results = session
    .Advanced
    .DocumentQuery<Products_Average_ByCategory.Result, Products_Average_ByCategory>()
    .WhereEquals(x => x.Category, "Seafood")
    .ToList();

RQL:

from 'Products/Average/ByCategory'
where Category == 'Seafood'
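
The index carries PriceSum and ProductCount through the reduce step instead of only the average. A likely reason is that a reduce function may be applied to its own output, and an average of partial averages is generally not the overall average. The sketch below illustrates that reasoning in memory with plain LINQ; `Partial` and `AverageSketch` are hypothetical names, and the prices in the note that follows are made up.

```csharp
using System.Collections.Generic;
using System.Linq;

// A partial aggregation result for one batch of products.
public class Partial
{
    public decimal PriceSum;
    public int ProductCount;

    // Derived from the sum and the count, never combined directly.
    public decimal PriceAverage => ProductCount == 0 ? 0 : PriceSum / ProductCount;
}

public static class AverageSketch
{
    // Map step: each price becomes an entry with a count of one.
    public static Partial Map(decimal price)
    {
        return new Partial { PriceSum = price, ProductCount = 1 };
    }

    // Reduce step: works on map entries and on already reduced partials
    // alike, because it only ever adds sums and counts.
    public static Partial Reduce(IEnumerable<Partial> items)
    {
        var list = items.ToList();
        return new Partial
        {
            PriceSum = list.Sum(x => x.PriceSum),
            ProductCount = list.Sum(x => x.ProductCount)
        };
    }
}
```

For example, reducing prices 10 and 20 in one batch gives a sum of 30 over 2 products (average 15), and a second batch containing only 60 gives an average of 60. Reducing the two partials yields 90 over 3 products, an average of 30, whereas averaging the two partial averages would give 37.5.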

Example III - Calculations

This example illustrates how we can put calculations inside an index, based on one of the indexes available in the sample database (Product/Sales).

We want to know how many times each product was ordered and how much we earned from it. To extract that information, we define the following index:

public class Product_Sales : AbstractIndexCreationTask<Order, Product_Sales.Result>
{
    public class Result
    {
        public string Product { get; set; }

        public int Count { get; set; }

        public decimal Total { get; set; }
    }

    public Product_Sales()
    {
        Map = orders => from order in orders
                        from line in order.Lines
                        select new
                        {
                            Product = line.Product,
                            Count = 1,
                            Total = ((line.Quantity * line.PricePerUnit) * (1 - line.Discount))
                        };

        Reduce = results => from result in results
                            group result by result.Product into g
                            select new
                            {
                                Product = g.Key,
                                Count = g.Sum(x => x.Count),
                                Total = g.Sum(x => x.Total)
                            };
    }
}

and send the query:

Query:

IList<Product_Sales.Result> results = session
    .Query<Product_Sales.Result, Product_Sales>()
    .ToList();

DocumentQuery:

IList<Product_Sales.Result> results = session
    .Advanced
    .DocumentQuery<Product_Sales.Result, Product_Sales>()
    .ToList();

RQL:

from 'Product/Sales'
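
As a worked check of the Total expression, a line with quantity 10, unit price 4.50 and discount 0.2 contributes 10 * 4.50 * (1 - 0.2) = 36.00. The sketch below reproduces the map and reduce stages in memory with plain LINQ. `SalesSketch` and `Line` are illustrative stand-ins rather than the sample database's types, and the field types are assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an order line.
public class Line
{
    public string Product;
    public int Quantity;
    public decimal PricePerUnit;
    public decimal Discount; // assumed to be a fraction between 0 and 1
}

public static class SalesSketch
{
    // Each order is modelled as a sequence of lines.
    // Returns product -> (number of lines, total earned).
    public static Dictionary<string, (int Count, decimal Total)> Aggregate(
        IEnumerable<IEnumerable<Line>> orders)
    {
        return orders
            // Map: fan out, producing one entry per order line.
            .SelectMany(lines => lines)
            .Select(l => new
            {
                l.Product,
                Count = 1,
                Total = l.Quantity * l.PricePerUnit * (1 - l.Discount)
            })
            // Reduce: group by product, then sum counts and totals.
            .GroupBy(e => e.Product)
            .ToDictionary(
                g => g.Key,
                g => (Count: g.Sum(e => e.Count), Total: g.Sum(e => e.Total)));
    }
}
```

Note that Count here is the number of order lines mentioning the product, not the quantity ordered, which mirrors the Count = 1 emitted per line in the index above.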