
Indexes: Map-Reduce Indexes

Map-Reduce indexes allow you to perform complex aggregations of data. The first stage, called the map, runs over documents and extracts portions of data according to the defined mapping function(s). Upon completion of the first phase, reduction is applied to the map results and the final outcome is produced.

The idea behind map-reduce indexing is that aggregation queries using such indexes are very cheap. The aggregation is performed only once and the results are stored inside the index. Once new data comes into the database or existing documents are modified, the map-reduce index will keep the aggregation results up-to-date. The aggregations are never done during querying to avoid expensive calculations that could result in severe performance degradation. When you make the query, RavenDB immediately returns the matching results directly from the index.
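
As a rough mental model (not RavenDB's actual implementation), the two phases can be imitated with in-memory LINQ. The documents, names, and values below are made up for illustration. The point of the sketch is that the reduce function accepts its own output, so stored results can be combined with newly mapped entries instead of recomputing everything:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory "documents"; in RavenDB these would live in a collection.
var documents = new[]
{
    (Id: "products/1", Category: "Seafood"),
    (Id: "products/2", Category: "Beverages"),
    (Id: "products/3", Category: "Seafood"),
};

// Map: project every document to an entry with the grouping key and a count of 1.
IEnumerable<(string Category, int Count)> Map(IEnumerable<(string Id, string Category)> docs) =>
    docs.Select(d => (d.Category, Count: 1));

// Reduce: group entries by key and sum the counts.
// Input and output have the same shape, so reduce can consume its own output.
IEnumerable<(string Category, int Count)> Reduce(IEnumerable<(string Category, int Count)> entries) =>
    entries.GroupBy(e => e.Category)
           .Select(g => (Category: g.Key, Count: g.Sum(e => e.Count)));

// Initial aggregation, computed once and kept around like index entries.
var stored = Reduce(Map(documents)).ToList();

// A new document arrives: only its map entry is combined with the stored results.
var newDocs = new[] { (Id: "products/4", Category: "Seafood") };
var updated = Reduce(stored.Concat(Map(newDocs))).ToList();

foreach (var (category, count) in updated)
    Console.WriteLine($"{category}: {count}");
```

A query against such a structure only has to look up the stored row for a key; no summing happens at query time.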

For a more in-depth look at how map-reduce works, you can read this post: RavenDB 4.0 Unsung Heroes: Map/reduce.

Creating

When it comes to index creation, the only difference between simple indexes and map-reduce ones is the additional reduce function in the index definition. To deploy an index, create a definition and deploy it using one of the methods described in the creating and deploying article.

Example I - Count

Let's assume that we want to count the number of products in each category. To do so, we can create the following index, which uses LoadDocument in its map function:

public class Products_ByCategory : AbstractIndexCreationTask<Product, Products_ByCategory.Result>
{
    public class Result
    {
        public string Category { get; set; }

        public int Count { get; set; }
    }

    public Products_ByCategory()
    {
        Map = products => from product in products
                          let categoryName = LoadDocument<Category>(product.Category).Name
                          select new
                          {
                              Category = categoryName,
                              Count = 1
                          };

        Reduce = results => from result in results
                            group result by result.Category into g
                            select new
                            {
                                Category = g.Key,
                                Count = g.Sum(x => x.Count)
                            };
    }
}
public class Products_ByCategory : AbstractJavaScriptIndexCreationTask
{
    public class Result
    {
        public string Category { get; set; }

        public int Count { get; set; }
    }

    public Products_ByCategory()
    {
        Maps = new HashSet<string>()
        {
            @"map('products', function(p){
                return {
                    Category: load(p.Category, 'Categories').Name,
                    Count: 1
                }
            })"
        };

        Reduce = @"groupBy(x => x.Category)
                    .aggregate(g => {
                        return {
                            Category: g.key,
                            Count: g.values.reduce((count, val) => val.Count + count, 0)
                        };
                    })";
    }
}

and issue the query:

IList<Products_ByCategory.Result> results = session
    .Query<Products_ByCategory.Result, Products_ByCategory>()
    .Where(x => x.Category == "Seafood")
    .ToList();
IList<Products_ByCategory.Result> results = session
    .Advanced
    .DocumentQuery<Products_ByCategory.Result, Products_ByCategory>()
    .WhereEquals(x => x.Category, "Seafood")
    .ToList();
from 'Products/ByCategory'
where Category == 'Seafood'

The above query will return one result for Seafood with the appropriate number of products from that category.

Example II - Average

In this example, we will calculate the average product price for each category. The index definition:

public class Products_Average_ByCategory :
    AbstractIndexCreationTask<Product, Products_Average_ByCategory.Result>
{
    public class Result
    {
        public string Category { get; set; }

        public decimal PriceSum { get; set; }

        public double PriceAverage { get; set; }

        public int ProductCount { get; set; }
    }

    public Products_Average_ByCategory()
    {
        Map = products => from product in products
                          let categoryName = LoadDocument<Category>(product.Category).Name
                          select new
                          {
                              Category = categoryName,
                              PriceSum = product.PricePerUnit,
                              PriceAverage = 0,
                              ProductCount = 1
                          };

        Reduce = results => from result in results
                            group result by result.Category into g
                            let productCount = g.Sum(x => x.ProductCount)
                            let priceSum = g.Sum(x => x.PriceSum)
                            select new
                            {
                                Category = g.Key,
                                PriceSum = priceSum,
                                PriceAverage = priceSum / productCount,
                                ProductCount = productCount
                            };
    }
}
public class Products_Average_ByCategory :
                        AbstractJavaScriptIndexCreationTask
{
    public class Result
    {
        public string Category { get; set; }

        public decimal PriceSum { get; set; }

        public double PriceAverage { get; set; }

        public int ProductCount { get; set; }
    }

    public Products_Average_ByCategory()
    {
        Maps = new HashSet<string>()
        {
            @"map('products', function(product){
                return {
                    Category: load(product.Category, 'Categories').Name,
                    PriceSum: product.PricePerUnit,
                    PriceAverage: 0,
                    ProductCount: 1
                }
            })"
        };

        Reduce = @"groupBy(x => x.Category)
                    .aggregate(g => {
                        var pricesum = g.values.reduce((sum,x) => x.PriceSum + sum,0);
                        var productcount = g.values.reduce((sum,x) => x.ProductCount + sum,0);
                        return {
                            Category: g.key,
                            PriceSum: pricesum,
                            ProductCount: productcount,
                            PriceAverage: pricesum / productcount
                        }
                    })";
    }
}

and the query:

IList<Products_Average_ByCategory.Result> results = session
    .Query<Products_Average_ByCategory.Result, Products_Average_ByCategory>()
    .Where(x => x.Category == "Seafood")
    .ToList();
IList<Products_Average_ByCategory.Result> results = session
    .Advanced
    .DocumentQuery<Products_Average_ByCategory.Result, Products_Average_ByCategory>()
    .WhereEquals(x => x.Category, "Seafood")
    .ToList();
from 'Products/Average/ByCategory'
where Category == 'Seafood'
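
The index above carries PriceSum and ProductCount alongside PriceAverage rather than storing the average alone. One reason, sketched below with plain in-memory LINQ and made-up numbers, is that a reduce function may be applied to partial results that are later reduced again, and averaging partial averages gives the wrong answer when the batches differ in size. This illustrates the arithmetic only, not RavenDB internals:

```csharp
using System;
using System.Linq;

// Made-up prices for one category, split into two unevenly sized batches.
decimal[] batchA = { 10m, 20m, 30m };
decimal[] batchB = { 100m };

// Partial reduce results that keep the sum and the count.
var partials = new[] { batchA, batchB }
    .Select(b => (PriceSum: b.Sum(), ProductCount: b.Length))
    .ToArray();

// Re-reducing sums and counts gives the true average: 160 / 4 = 40.
decimal priceSum = partials.Sum(p => p.PriceSum);
int productCount = partials.Sum(p => p.ProductCount);
decimal correctAverage = priceSum / productCount;

// Averaging the per-batch averages (20 and 100) gives 60, which is wrong.
decimal averageOfAverages = partials.Average(p => p.PriceSum / p.ProductCount);

Console.WriteLine($"correct: {correctAverage}, average of averages: {averageOfAverages}");
```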

Example III - Calculations

This example illustrates how we can put some calculations inside an index using one of the indexes available in the sample database (Product/Sales).

We want to know how many times each product was ordered and how much we earned for it. In order to extract that information, we need to define the following index:

public class Product_Sales : AbstractIndexCreationTask<Order, Product_Sales.Result>
{
    public class Result
    {
        public string Product { get; set; }

        public int Count { get; set; }

        public decimal Total { get; set; }
    }

    public Product_Sales()
    {
        Map = orders => from order in orders
                        from line in order.Lines
                        select new
                        {
                            Product = line.Product,
                            Count = 1,
                            Total = ((line.Quantity * line.PricePerUnit) * (1 - line.Discount))
                        };

        Reduce = results => from result in results
                            group result by result.Product into g
                            select new
                            {
                                Product = g.Key,
                                Count = g.Sum(x => x.Count),
                                Total = g.Sum(x => x.Total)
                            };
    }
}
public class Product_Sales : AbstractJavaScriptIndexCreationTask
{
    public class Result
    {
        public string Product { get; set; }

        public int Count { get; set; }

        public decimal Total { get; set; }
    }

    public Product_Sales()
    {
        Maps = new HashSet<string>()
        {
            @"map('orders', function(order){
                    var res = [];
                    order.Lines.forEach(l => {
                        res.push({
                            Product: l.Product,
                            Count: 1,
                            Total:  (l.Quantity * l.PricePerUnit) * (1- l.Discount)
                        })
                    });
                    return res;
                })"
        };

        Reduce = @"groupBy(x => x.Product)
            .aggregate(g => {
                return {
                    Product : g.key,
                    Count: g.values.reduce((sum, x) => x.Count + sum, 0),
                    Total: g.values.reduce((sum, x) => x.Total + sum, 0)
                }
            })";
    }
}

and send the query:

IList<Product_Sales.Result> results = session
    .Query<Product_Sales.Result, Product_Sales>()
    .ToList();
IList<Product_Sales.Result> results = session
    .Advanced
    .DocumentQuery<Product_Sales.Result, Product_Sales>()
    .ToList();
from 'Product/Sales'
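
To make the Total expression concrete, here is the same arithmetic applied to two made-up order lines with in-memory LINQ. The values are assumptions chosen for easy checking; Discount is treated as a fraction, as the `1 - line.Discount` term in the map implies:

```csharp
using System;
using System.Linq;

// Made-up order lines for a single product.
var lines = new[]
{
    (Product: "products/1", Quantity: 10, PricePerUnit: 4.5m, Discount: 0.2m),
    (Product: "products/1", Quantity: 2,  PricePerUnit: 4.5m, Discount: 0m),
};

// Same expression as the map: quantity * unit price, less the discount fraction.
var mapped = lines.Select(l => (
    l.Product,
    Count: 1,
    Total: l.Quantity * l.PricePerUnit * (1 - l.Discount)));

// Same grouping as the reduce: one row per product.
var sales = mapped
    .GroupBy(m => m.Product)
    .Select(g => (Product: g.Key, Count: g.Sum(m => m.Count), Total: g.Sum(m => m.Total)))
    .Single();

// First line: 10 * 4.5 * 0.8 = 36. Second line: 2 * 4.5 * 1 = 9. Total: 45.
Console.WriteLine($"{sales.Product}: ordered {sales.Count} times, total {sales.Total}");
```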

Reduce Results as Artificial Documents

Map-Reduce Output Documents

In addition to storing the aggregation results in the index, the map-reduce index can also output those reduce results as documents to a specified collection. In order to create these documents, called "artificial", you need to define the target collection using the OutputReduceToCollection property in the index definition.

Writing map-reduce outputs into documents allows you to define additional indexes on top of them that give you the option to create recursive map-reduce operations. This makes it cheap and easy to, for example, recursively create daily, monthly, and yearly summaries on the same data.

In addition, you can also apply the usual operations on artificial documents (e.g. data subscriptions or ETL).

If the aggregation value for a given reduce key changes, we overwrite the output document. If the given reduce key no longer has a result, the output document will be removed.
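
The overwrite-and-remove behavior can be pictured with a dictionary standing in for the output collection. This is a simplified sketch under that assumption (one entry per reduce key, made-up values), not a description of how the server stores documents:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the output collection, keyed by reduce key.
var outputDocs = new Dictionary<string, int>
{
    ["Seafood"] = 12,
    ["Beverages"] = 7,
};

// Latest reduce results: Seafood changed, Beverages no longer has any entries.
var reduceResults = new Dictionary<string, int> { ["Seafood"] = 13 };

// Overwrite (or create) the output document for every key that has a result.
foreach (var (key, count) in reduceResults)
    outputDocs[key] = count;

// Remove output documents whose reduce key no longer has a result.
foreach (var key in outputDocs.Keys.Where(k => !reduceResults.ContainsKey(k)).ToList())
    outputDocs.Remove(key);

Console.WriteLine(string.Join(", ", outputDocs.Select(d => $"{d.Key}={d.Value}")));
```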

Reference Documents

To help organize these output documents, the map-reduce index can also create an additional collection of artificial reference documents. These documents aggregate the output documents and store their document IDs in an array field ReduceOutputs.

The document IDs of reference documents follow a pattern that you define. The format you give to their document ID also determines how the output documents are grouped.

Because reference documents have well-known, predictable IDs, they are easier to plug into indexes and other operations, and can serve as an intermediary for the output documents, whose IDs are less predictable. This allows you to chain map-reduce indexes in a recursive fashion; see Example II in the Examples section below.

Learn more about how to configure output and reference documents in the Studio: Create Map-Reduce Index.

Artificial Document Properties

IDs

The identifiers of map-reduce output documents have three components in this format:

<Output collection name>/<incrementing value>/<hash of reduce key values>

The Product_Sales_ByDate index (Example I in the Examples section below) might generate an output document ID like this:

DailyProductSales/35/14369232530304891504

  • "DailyProductSales" is the collection name specified for the output documents.
  • The middle part is an incrementing integer assigned by the server. This number grows by some amount whenever the index definition is modified. This can be useful because when an index definition changes, there is a brief transition phase when the new output documents are being created, but the old output documents haven't been deleted yet (this phase is called "side-by-side indexing"). During this phase, the output collection contains output documents created both by the old version and the new version of the index, and they can be distinguished by this value: the new output documents will always have a higher value (by 1 or more).
  • The last part of the document ID (the unique part) is the hash of the reduce key values - in this case: hash(Product, Date).
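
As a small illustration of the three components, the sketch below splits the example ID from the text with plain string operations. The variable names are made up, and the hash is kept as a string because nothing in the text says it must fit a particular numeric type:

```csharp
using System;

// Example ID taken from the text above.
string id = "DailyProductSales/35/14369232530304891504";

// Split from the right so a collection name containing '/' stays intact.
int lastSlash = id.LastIndexOf('/');
int middleSlash = id.LastIndexOf('/', lastSlash - 1);

string collection = id.Substring(0, middleSlash);
long incrementingValue = long.Parse(id.Substring(middleSlash + 1, lastSlash - middleSlash - 1));
string keyHash = id.Substring(lastSlash + 1);

Console.WriteLine($"collection={collection}, value={incrementingValue}, hash={keyHash}");
```

During side-by-side indexing, comparing the middle value of two IDs from the same collection tells you which one came from the newer index definition.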

The identifiers of reference documents follow some pattern you choose, and this pattern determines which output documents are held by a given reference document.

The same Product_Sales_ByDate index has this pattern for reference documents:

sales/daily/{Date:yyyy-MM-dd}

And this produces reference document IDs like this:

sales/daily/1998-05-06

The pattern is built using the same syntax as the StringBuilder.AppendFormat method. Date fields are formatted with the standard .NET date and time format strings (for example, yyyy-MM-dd).
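
The sketch below shows what the format part of such a pattern does, using StringBuilder.AppendFormat directly. Note the difference: AppendFormat uses positional placeholders such as {0:...}, while the index pattern names a field of the reduce result. The date value and the "sales/monthly/" pattern are made up for illustration:

```csharp
using System;
using System.Globalization;
using System.Text;

// A reduce result's Date field (made-up value, including a time of day).
var date = new DateTime(1998, 5, 6, 14, 30, 0);

// The part after the colon is an ordinary .NET date format string.
var sb = new StringBuilder();
sb.AppendFormat(CultureInfo.InvariantCulture, "sales/daily/{0:yyyy-MM-dd}", date);
string dailyId = sb.ToString();

// A coarser (hypothetical) format makes every day of a month resolve to the same ID.
string monthlyId = string.Format(CultureInfo.InvariantCulture, "sales/monthly/{0:yyyy-MM}", date);

Console.WriteLine(dailyId);
Console.WriteLine(monthlyId);
```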

Metadata

Artificial documents generated by map-reduce indexes get the following @flags in their metadata:

"@flags": "Artificial, FromIndex"

These flags are used internally by the database to filter out artificial documents during replication.

Syntax

The map-reduce output documents are configured with these properties of IndexDefinition:

string OutputReduceToCollection;

string PatternReferencesCollectionName;

// Using IndexDefinition
string PatternForOutputReduceToCollectionReferences;

// Inheriting from AbstractGenericIndexCreationTask<TReduceResult>
Expression<Func<TReduceResult, string>> PatternForOutputReduceToCollectionReferences;
Parameter | Type | Description
OutputReduceToCollection | string | Collection name for the output documents.
PatternReferencesCollectionName | string | Optional collection name for the reference documents. The default is <OutputReduceToCollection>/References.
PatternForOutputReduceToCollectionReferences | string / Expression<Func<TReduceResult, string>> | Document ID format for reference documents. The pattern references fields of the reduce function output, which determines how the output documents are grouped. The type of this parameter depends on whether the index is created using IndexDefinition or AbstractIndexCreationTask.

To index artificial documents in strongly typed syntax (LINQ), you will need the type of reference documents:

public class OutputReduceToCollectionReference
{
    public string Id { get; set; }
    public List<string> ReduceOutputs { get; set; }
}
Property | Type | Description
Id | string | The reference document's ID.
ReduceOutputs | List<string> | IDs of the map-reduce output documents that this reference document aggregates. Determined by the pattern of the reference document ID.

Examples

Example I

Here is a map-reduce index with output documents and reference documents:

public Product_Sales_ByDate()
{
    Map = orders => from order in orders
                    from line in order.Lines
                    select new
                    {
                        Product = line.Product,
                        Date = new DateTime(order.OrderedAt.Year,
                                            order.OrderedAt.Month,
                                            order.OrderedAt.Day),
                        Count = 1,
                        Total = ((line.Quantity * line.PricePerUnit) * (1 - line.Discount))
                    };

    Reduce = results => from result in results
                        group result by new { result.Product, result.Date } into g
                        select new
                        {
                            Product = g.Key.Product,
                            Date = g.Key.Date,
                            Count = g.Sum(x => x.Count),
                            Total = g.Sum(x => x.Total)
                        };

    OutputReduceToCollection = "DailyProductSales";
    PatternReferencesCollectionName = "DailyProductSales/References";
    PatternForOutputReduceToCollectionReferences = x => $"sales/daily/{x.Date:yyyy-MM-dd}";
}
public class Product_Sales_ByDate : AbstractIndexCreationTask
{
    public override IndexDefinition CreateIndexDefinition()
    {
        return new IndexDefinition
        {
            Maps =
            {
                @"from order in docs.Orders
                  from line in order.Lines
                  select new {
                      line.Product, 
                      Date = order.OrderedAt,
                      Profit = line.Quantity * line.PricePerUnit * (1 - line.Discount)
                  };"
            },
            Reduce = 
                @"from r in results
                  group r by new { r.Date, r.Product }
                  into g
                  select new { 
                      Product = g.Key.Product,
                      Date = g.Key.Date,
                      Profit = g.Sum(r => r.Profit)
                  };",

            OutputReduceToCollection = "DailyProductSales",
            PatternReferencesCollectionName = "DailyProductSales/References",
            PatternForOutputReduceToCollectionReferences = "sales/daily/{Date:yyyy-MM-dd}"
        };
    }
}

In the LINQ index example above (which inherits AbstractIndexCreationTask), the reference document ID pattern is set with a lambda expression:

PatternForOutputReduceToCollectionReferences = x => $"sales/daily/{x.Date:yyyy-MM-dd}";

This gives the reference documents IDs in this general format: sales/daily/1998-05-06. The reference document with that ID contains the IDs of all the output documents for May 6th, 1998.

In the second example (which uses IndexDefinition), the reference document ID pattern is set with a string:

PatternForOutputReduceToCollectionReferences = "sales/daily/{Date:yyyy-MM-dd}"

This gives the reference documents IDs in this general format: sales/daily/1998-05-06. The reference document with that ID contains the IDs of all the output documents from May 6th 1998.
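
To see how the pattern groups output documents, the following in-memory sketch formats each output document's Date with the same yyyy-MM-dd format and collects the output IDs per resulting reference ID. The output document IDs are shortened, made-up values, and the grouping is an illustration of the described behavior rather than the server's code:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Made-up output documents: ID plus the Date field of the reduce result.
var outputs = new[]
{
    (Id: "DailyProductSales/35/111", Date: new DateTime(1998, 5, 6)),
    (Id: "DailyProductSales/35/222", Date: new DateTime(1998, 5, 6)),
    (Id: "DailyProductSales/35/333", Date: new DateTime(1998, 5, 7)),
};

// Each distinct formatted ID becomes one reference document whose
// ReduceOutputs lists the output documents that formatted to that ID.
var references = outputs
    .GroupBy(o => "sales/daily/" + o.Date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
    .Select(g => (Id: g.Key, ReduceOutputs: g.Select(o => o.Id).ToList()))
    .ToList();

foreach (var r in references)
    Console.WriteLine($"{r.Id} -> [{string.Join(", ", r.ReduceOutputs)}]");
```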

Example II

This is an example of a "recursive" map-reduce index - it indexes the output documents of the index above, using the reference documents.

public MapReduce_Output_OrderProduct_ByCount()
{
    Map = orders => from order in orders
                    // Format the date so the ID matches the "sales/daily/{Date:yyyy-MM-dd}" pattern
                    let referenceDocuments = LoadDocument<OutputReduceToCollectionReference>(
                                             $"sales/daily/{order.OrderedAt:yyyy-MM-dd}", 
                                             "DailyProductSales/References")
                    from refDoc in referenceDocuments.ReduceOutputs
                    let outputDoc = LoadDocument<OutputDocument>(refDoc)
                    select new Result {
                        Product = outputDoc.Product,
                        Count = outputDoc.Count,
                        NumOrders = 1
                    };

    Reduce = results => from r in results
                        group r by new { r.Count, r.Product }
                        into g
                        select new { 
                            Product = g.Key.Product,
                            Count = g.Key.Count,
                            NumOrders = g.Sum(x => x.NumOrders)
                        };
}

Remarks

Saving documents

Artificial documents are stored immediately after the indexing transaction completes.

Recursive indexing loop

It is forbidden to output reduce results to a collection that:

  • the current index is already working on (e.g. an index on the DailyInvoices collection outputs to DailyInvoices),
  • the current index loads documents from (e.g. an index that calls LoadDocument(id, "Invoices") outputs to Invoices),
  • is processed by another map-reduce index that outputs results to a collection the current index is working on (e.g. one index on the Invoices collection outputs to DailyInvoices, and another index on DailyInvoices outputs to Invoices).

Any of these would result in an infinite indexing loop (the index writes an artificial document, which triggers indexing, and so on), so an attempt to create such a definition fails with a detailed error.
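
The forbidden cases can be thought of as cycles in a graph where each index points from the collections it reads to the collection it writes. The sketch below is a simplified illustration of that idea, not RavenDB's actual validation: the index descriptions and helper names are hypothetical, and the check is generalized to chains of any length, which is an assumption beyond the three cases listed above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical existing indexes: collections read (mapped or loaded) and the output collection.
var indexes = new[]
{
    (Name: "Invoices/Daily",   Reads: new[] { "Invoices" },      Output: "DailyInvoices"),
    (Name: "Invoices/Monthly", Reads: new[] { "DailyInvoices" }, Output: "MonthlyInvoices"),
};

// True if writing to 'target' can, directly or through other indexes, feed back into 'source'.
bool Reaches(string target, string source, HashSet<string> seen) =>
    target == source ||
    (seen.Add(target) &&
     indexes.Where(i => i.Reads.Contains(target))
            .Any(i => Reaches(i.Output, source, seen)));

// A candidate index is rejected if its output collection leads back to anything it reads.
bool WouldLoop(string[] reads, string output) =>
    reads.Any(r => Reaches(output, r, new HashSet<string>()));

bool selfLoop = WouldLoop(new[] { "DailyInvoices" }, "DailyInvoices");        // outputs to its own source
bool chainLoop = WouldLoop(new[] { "MonthlyInvoices" }, "Invoices");          // closes a loop via other indexes
bool noLoop = WouldLoop(new[] { "MonthlyInvoices" }, "YearlyInvoices");       // no path back to its source

Console.WriteLine($"{selfLoop}, {chainLoop}, {noLoop}");
```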

Existing collection

Creating a map-reduce index whose output collection already exists and contains documents will result in an error. Delete all documents from that collection before creating the index, or output the results to a different collection.