You are currently browsing the legacy 4.0 version of the documentation.

Indexes: Map-Reduce Indexes

Map-Reduce indexes allow you to perform complex aggregations of data. The first stage, called the map, runs over documents and extracts portions of data according to the defined mapping function(s). Upon completion of the first phase, reduction is applied to the map results and the final outcome is produced.

The idea behind map-reduce indexing is that aggregation queries using such indexes are very cheap. The aggregation is performed only once, and the results are stored inside the index. When new documents are added or existing documents are modified, the map-reduce index keeps the aggregation results up to date. Aggregations are never computed during querying, which avoids expensive calculations that could severely degrade performance. When you query, RavenDB returns the matching results directly from the index.
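
To illustrate why stored results stay cheap to maintain, here is a minimal in-memory Java sketch of a count-per-category map-reduce that re-reduces only the reduce keys affected by a document change. It is a conceptual model under our own assumptions: the IncrementalCountIndex class and its methods are hypothetical, and RavenDB's actual index storage is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of an incrementally maintained count-per-category index.
// Map: every document (id -> category) emits one entry (category, 1).
// Reduce: the entries of one category are summed into a stored result.
class IncrementalCountIndex {
    // Map results grouped by reduce key: category -> (document id -> emitted count)
    private final Map<String, Map<String, Integer>> mapResults = new HashMap<>();
    // The reduce key each document currently contributes to
    private final Map<String, String> keyOfDocument = new HashMap<>();
    // Stored reduce results; queries only read from here
    private final Map<String, Integer> reduced = new HashMap<>();
    // Counts reduce operations, to show that work is done per affected key
    int reduceCalls = 0;

    // Insert or update a document, then re-reduce only the keys it touched.
    void put(String docId, String category) {
        String oldKey = keyOfDocument.put(docId, category);
        if (oldKey != null) {
            mapResults.get(oldKey).remove(docId);
        }
        mapResults.computeIfAbsent(category, k -> new HashMap<>()).put(docId, 1);
        if (oldKey != null && !oldKey.equals(category)) {
            reduceKey(oldKey);
        }
        reduceKey(category);
    }

    // Delete a document and re-reduce the key it contributed to.
    void delete(String docId) {
        String oldKey = keyOfDocument.remove(docId);
        if (oldKey != null) {
            mapResults.get(oldKey).remove(docId);
            reduceKey(oldKey);
        }
    }

    // Reduce one key: sum its map entries, or drop the result if none remain.
    private void reduceKey(String key) {
        reduceCalls++;
        Map<String, Integer> entries = mapResults.getOrDefault(key, Map.of());
        if (entries.isEmpty()) {
            reduced.remove(key);
            mapResults.remove(key);
        } else {
            int sum = 0;
            for (int value : entries.values()) {
                sum += value;
            }
            reduced.put(key, sum);
        }
    }

    // Query time: a plain lookup, no aggregation.
    Integer query(String category) {
        return reduced.get(category);
    }
}
```

Moving one product from one category to another triggers exactly two reduce operations (the old key and the new key), no matter how many other documents the index covers, and a query never aggregates anything.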

For a more in-depth look at how map-reduce works, you can read this post: RavenDB 4.0 Unsung Heroes: Map/reduce.

Creating

When it comes to index creation, the only difference between simple indexes and map-reduce ones is an additional reduce function in the index definition. To deploy an index, create its definition and deploy it using one of the methods described in the creating and deploying article.

Example I - Count

Let's assume that we want to count the number of products in each category. To do so, we can create the following index, which uses LoadDocument to get each product's category name:

public static class Products_ByCategory extends AbstractIndexCreationTask {
    public static class Result {
        private String category;
        // Number of products in the category (the reduce sums it as an int)
        private int count;

        public String getCategory() {
            return category;
        }

        public void setCategory(String category) {
            this.category = category;
        }

        public int getCount() {
            return count;
        }

        public void setCount(int count) {
            this.count = count;
        }
    }

    public Products_ByCategory() {
        // Map: load each product's category and emit one entry per product
        map = "docs.Products.Select(product => new { " +
            "    Product = product, " +
            "    CategoryName = (this.LoadDocument(product.Category, \"Categories\")).Name " +
            "}).Select(this0 => new { " +
            "    Category = this0.CategoryName, " +
            "    Count = 1 " +
            "})";

        // Reduce: group the map results by category and sum the counts
        reduce = "results.GroupBy(result => result.Category).Select(g => new { " +
            "    Category = g.Key, " +
            "    Count = Enumerable.Sum(g, x => ((int) x.Count)) " +
            "})";
    }
}

and issue the query:

List<Products_ByCategory.Result> results = session
    .query(Products_ByCategory.Result.class, Products_ByCategory.class)
    .whereEquals("Category", "Seafood")
    .toList();

The same query in RQL:

from 'Products/ByCategory'
where Category == 'Seafood'

The above query returns a single result for Seafood, containing the number of products in that category.
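
The following plain-Java sketch mirrors what the Products/ByCategory index computes, assuming simplified, hypothetical ProductDoc and CategoryDoc records (Java 16+): the map step resolves each product's category ID to its name, which is the role LoadDocument plays in the index, and the reduce step sums the emitted 1s per category. It runs on in-memory data only and does not use the RavenDB client.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified document shapes (not the sample database classes).
record CategoryDoc(String id, String name) {}
record ProductDoc(String id, String categoryId) {}

class CountByCategorySketch {
    // Assumes every product references a category that exists.
    static Map<String, Integer> countByCategory(List<ProductDoc> products, List<CategoryDoc> categories) {
        // Lookup table standing in for LoadDocument(product.Category, "Categories")
        Map<String, String> categoryNames = categories.stream()
            .collect(Collectors.toMap(CategoryDoc::id, CategoryDoc::name));

        return products.stream()
            // Map: emit (CategoryName, Count = 1) for every product
            .map(p -> Map.entry(categoryNames.get(p.categoryId()), 1))
            // Reduce: group by category name and sum the counts
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }
}
```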

Example II - Average

In this example, we will calculate the average product price for each category. The index definition:

public static class Products_Average_ByCategory extends AbstractIndexCreationTask {
    public static class Result {
        private String category;
        private double priceSum;
        private double priceAverage;
        private int productCount;

        public String getCategory() {
            return category;
        }

        public void setCategory(String category) {
            this.category = category;
        }

        public double getPriceSum() {
            return priceSum;
        }

        public void setPriceSum(double priceSum) {
            this.priceSum = priceSum;
        }

        public double getPriceAverage() {
            return priceAverage;
        }

        public void setPriceAverage(double priceAverage) {
            this.priceAverage = priceAverage;
        }

        public int getProductCount() {
            return productCount;
        }

        public void setProductCount(int productCount) {
            this.productCount = productCount;
        }
    }

    public Products_Average_ByCategory() {
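        // Map: emit the product's price and a count of 1 under its category name;
        // PriceAverage is a placeholder so that map and reduce outputs share one shape
        map = "docs.Products.Select(product => new { " +
            "    Product = product, " +
            "    CategoryName = (this.LoadDocument(product.Category, \"Categories\")).Name " +
            "}).Select(this0 => new { " +
            "    Category = this0.CategoryName, " +
            "    PriceSum = this0.Product.PricePerUnit, " +
            "    PriceAverage = 0, " +
            "    ProductCount = 1 " +
            "})";

        // Reduce: sum prices and counts per category, then derive the average from the sums
        reduce = "results.GroupBy(result => result.Category).Select(g => new { " +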
            "    g = g, " +
            "    ProductCount = Enumerable.Sum(g, x => ((int) x.ProductCount)) " +
            "}).Select(this0 => new { " +
            "    this0 = this0, " +
            "    PriceSum = Enumerable.Sum(this0.g, x0 => ((decimal) x0.PriceSum)) " +
            "}).Select(this1 => new { " +
            "    Category = this1.this0.g.Key, " +
            "    PriceSum = this1.PriceSum, " +
            "    PriceAverage = this1.PriceSum / ((decimal) this1.this0.ProductCount), " +
            "    ProductCount = this1.this0.ProductCount " +
            "})";
    }
}

and the query:

List<Products_Average_ByCategory.Result> results = session
    .query(Products_Average_ByCategory.Result.class, Products_Average_ByCategory.class)
    .whereEquals("Category", "Seafood")
    .toList();

The same query in RQL:

from 'Products/Average/ByCategory'
where Category == 'Seafood'
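
The index above carries PriceSum and ProductCount through the reduce step instead of averaging averages. A reduce function can be applied again to its own outputs (for example, when new products are added to a category), so its output must have the same shape as its input and must combine correctly. The sketch below, with hypothetical names, shows why: re-reducing partial (sum, count) pairs yields the true average, while averaging the partial averages does not when the partial groups differ in size.

```java
import java.util.List;

// Hypothetical partial result with the same shape as the index's map/reduce output.
record PricePartial(double priceSum, int productCount, double priceAverage) {}

class AverageReReduceSketch {
    // Reduce: sum the sums and the counts, then derive the average.
    // The output has the same shape as the input, so it can be reduced again.
    static PricePartial reduce(List<PricePartial> partials) {
        double sum = 0;
        int count = 0;
        for (PricePartial p : partials) {
            sum += p.priceSum();
            count += p.productCount();
        }
        return new PricePartial(sum, count, sum / count);
    }

    // Incorrect alternative: averaging the partial averages ignores group sizes.
    static double averageOfAverages(List<PricePartial> partials) {
        double total = 0;
        for (PricePartial p : partials) {
            total += p.priceAverage();
        }
        return total / partials.size();
    }
}
```

With products priced 10, 20 and 30 reduced in two batches ([10, 20] and [30]), re-reducing the two partials gives an average of 20, whereas averaging the partial averages (15 and 30) gives 22.5. This also explains why the map function emits PriceAverage = 0: the field exists so that map and reduce outputs share one shape, and its value is recomputed from the sums on every reduce.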

Example III - Calculations

This example illustrates how we can put calculations inside an index, using one of the indexes available in the sample database (Product/Sales).

We want to know how many times each product was ordered and how much revenue it generated. To extract that information, we define the following index:

public static class Product_Sales extends AbstractIndexCreationTask {
    public static class Result {
        private String product;
        private int count;
        private double total;

        public String getProduct() {
            return product;
        }

        public void setProduct(String product) {
            this.product = product;
        }

        public int getCount() {
            return count;
        }

        public void setCount(int count) {
            this.count = count;
        }

        public double getTotal() {
            return total;
        }

        public void setTotal(double total) {
            this.total = total;
        }
    }

    public Product_Sales() {
        // Map: emit one entry per order line with Count = 1 and the discounted line total
        map = "docs.Orders.SelectMany(order => order.Lines, (order, line) => new { " +
            "    Product = line.Product, " +
            "    Count = 1, " +
            "    Total = (((decimal) line.Quantity) * line.PricePerUnit) * (1M - line.Discount) " +
            "})";

        // Reduce: sum counts and totals per product
        reduce = "results.GroupBy(result => result.Product).Select(g => new { " +
            "    Product = g.Key, " +
            "    Count = Enumerable.Sum(g, x => ((int) x.Count)), " +
            "    Total = Enumerable.Sum(g, x0 => ((decimal) x0.Total)) " +
            "})";
    }
}

and send the query:

List<Product_Sales.Result> results = session
    .query(Product_Sales.Result.class, Product_Sales.class)
    .toList();

The same query in RQL:

from 'Product/Sales'
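
For a sense of the arithmetic, this sketch computes the same per-product Count and Total as the Product/Sales index over hypothetical in-memory order lines, using BigDecimal to mirror the decimal casts in the index. The OrderLineItem and ProductSalesTotal records and their field names are assumptions modeled on the fields the map function reads.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical order line, modeled on the fields the map function reads.
record OrderLineItem(String product, int quantity, BigDecimal pricePerUnit, BigDecimal discount) {}

// Aggregated result per product, mirroring Product_Sales.Result.
record ProductSalesTotal(int count, BigDecimal total) {}

class ProductSalesSketch {
    static Map<String, ProductSalesTotal> aggregate(List<OrderLineItem> lines) {
        Map<String, ProductSalesTotal> results = new TreeMap<>();
        for (OrderLineItem line : lines) {
            // Map: Total = Quantity * PricePerUnit * (1 - Discount), Count = 1
            BigDecimal lineTotal = BigDecimal.valueOf(line.quantity())
                .multiply(line.pricePerUnit())
                .multiply(BigDecimal.ONE.subtract(line.discount()));
            // Reduce: sum Count and Total per product
            results.merge(line.product(), new ProductSalesTotal(1, lineTotal),
                (a, b) -> new ProductSalesTotal(a.count() + b.count(), a.total().add(b.total())));
        }
        return results;
    }
}
```

For example, a line of 5 units at 18.00 with a 0.2 discount contributes 5 × 18.00 × 0.8 = 72.00 to its product's Total.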

Reduce Results as Artificial Documents

In addition to storing the aggregation results in the index, map-reduce indexes can also output reduce results as documents to a specified collection. To create such documents, called artificial documents, define the target collection in the index definition using the OutputReduceToCollection property (the outputReduceToCollection field in the Java client).

public static class Product_Sales_ByMonth extends AbstractIndexCreationTask {
    public static class Result {
        private String product;
        private Date month;
        private int count;
        private double total;

        public String getProduct() {
            return product;
        }

        public void setProduct(String product) {
            this.product = product;
        }

        public Date getMonth() {
            return month;
        }

        public void setMonth(Date month) {
            this.month = month;
        }

        public int getCount() {
            return count;
        }

        public void setCount(int count) {
            this.count = count;
        }

        public double getTotal() {
            return total;
        }

        public void setTotal(double total) {
            this.total = total;
        }
    }

    public Product_Sales_ByMonth() {
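        // Map: emit one entry per order line, keyed by product and the first day of the order's month
        map = "docs.Orders.SelectMany(order => order.Lines, (order, line) => new { " +
            "    Product = line.Product, " +
            "    Month = new DateTime(order.OrderedAt.Year, order.OrderedAt.Month, 1), " +
            "    Count = 1, " +
            "    Total = (((decimal) line.Quantity) * line.PricePerUnit) * (1M - line.Discount) " +
            "})";

        // Reduce: group by the composite (Product, Month) key and sum counts and totals
        reduce = "results.GroupBy(result => new { " +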
            "    Product = result.Product, " +
            "    Month = result.Month " +
            "}).Select(g => new { " +
            "    Product = g.Key.Product, " +
            "    Month = g.Key.Month, " +
            "    Count = Enumerable.Sum(g, x => ((int) x.Count)), " +
            "    Total = Enumerable.Sum(g, x0 => ((decimal) x0.Total)) " +
            "})";

        // Write each reduce result as an artificial document into this collection
        outputReduceToCollection = "MonthlyProductSales";
    }
}

Writing map-reduce outputs into documents allows you to define additional indexes on top of them, which lets you chain map-reduce operations recursively. This way, you can compute daily, monthly, and yearly summaries cheaply and easily.
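
To make the chaining concrete, the sketch below runs two aggregation passes in memory: the first groups sales by (Product, Month), as Product_Sales_ByMonth does, and the second treats those monthly results as its input documents and rolls them up per (Product, Year). The records and method names are hypothetical, totals are whole numbers for simplicity, and dates use java.time.LocalDate rather than the index's DateTime.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input: one sale line with its product, order date and total.
record SaleLine(String product, LocalDate orderedAt, long total) {}

// Composite reduce keys; records provide equals/hashCode for grouping.
record ProductMonthKey(String product, LocalDate month) {}
record ProductYearKey(String product, int year) {}

class ChainedSummariesSketch {
    // First pass: reduce key is (Product, first day of the order's month),
    // the same truncation as new DateTime(OrderedAt.Year, OrderedAt.Month, 1).
    static Map<ProductMonthKey, Long> byMonth(List<SaleLine> lines) {
        Map<ProductMonthKey, Long> totals = new HashMap<>();
        for (SaleLine line : lines) {
            ProductMonthKey key = new ProductMonthKey(line.product(), line.orderedAt().withDayOfMonth(1));
            totals.merge(key, line.total(), Long::sum);
        }
        return totals;
    }

    // Second pass: consume the monthly results (the "artificial documents")
    // and reduce them again per (Product, Year).
    static Map<ProductYearKey, Long> byYear(Map<ProductMonthKey, Long> monthly) {
        Map<ProductYearKey, Long> totals = new HashMap<>();
        monthly.forEach((key, total) ->
            totals.merge(new ProductYearKey(key.product(), key.month().getYear()), total, Long::sum));
        return totals;
    }
}
```

The yearly pass never looks at individual order lines; it reads only the already-aggregated monthly results, which is what makes stacked summaries cheap.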

You can also apply the usual document operations to them (e.g. data subscriptions or ETL).

Saving documents

Artificial documents are stored immediately after the indexing transaction completes.

Recursive indexing loop

It is forbidden to output reduce results to a collection that:

  • the current index already processes (e.g. an index on the DailyInvoices collection outputs to DailyInvoices),
  • the current index loads documents from (e.g. an index that calls LoadDocument(id, "Invoices") outputs to Invoices),
  • is processed by another map-reduce index that outputs its results to a collection the current index processes (e.g. one index on the Invoices collection outputs to DailyInvoices, while another index on DailyInvoices outputs to Invoices).

Any of these would result in an infinite indexing loop (the index writes an artificial document, which triggers indexing, which writes another artificial document, and so on), so an attempt to create such an index fails with a detailed error.
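
The three forbidden cases all amount to a cycle in a graph whose nodes are collections, where each index adds an edge from every collection it maps over or loads from to its output collection. The sketch below, using hypothetical names and a plain depth-first search, shows one way such a check could be expressed; it is not RavenDB's actual validation code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical description of an index: the collections it reads (mapped or
// loaded via LoadDocument) and its output collection (null if none).
record IndexInfo(String name, Set<String> inputs, String outputCollection) {}

class OutputLoopCheckSketch {
    // Returns true if adding 'candidate' to the existing indexes would create a loop.
    static boolean createsLoop(List<IndexInfo> existing, IndexInfo candidate) {
        if (candidate.outputCollection() == null) return false;
        // Edges: input collection -> output collection, for every index
        Map<String, Set<String>> edges = new HashMap<>();
        for (IndexInfo idx : existing) addEdges(edges, idx);
        addEdges(edges, candidate);

        // A loop exists if the candidate's output can flow back into one of its inputs
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(candidate.outputCollection());
        while (!stack.isEmpty()) {
            String collection = stack.pop();
            if (candidate.inputs().contains(collection)) return true;
            if (!visited.add(collection)) continue;
            for (String next : edges.getOrDefault(collection, Set.of())) stack.push(next);
        }
        return false;
    }

    private static void addEdges(Map<String, Set<String>> edges, IndexInfo idx) {
        if (idx.outputCollection() == null) return;
        for (String input : idx.inputs()) {
            edges.computeIfAbsent(input, k -> new HashSet<>()).add(idx.outputCollection());
        }
    }
}
```

The first two cases are a collection pointing at itself; the third is the two-step cycle Invoices to DailyInvoices to Invoices.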

Existing collection

Creating a map-reduce index whose output collection already exists and contains documents results in an error. Either delete all documents from that collection before creating the index, or output the results to a different collection.

Artificial Document IDs

The identifiers of artificial documents are generated as:

  • <OutputCollectionName>/<hash-of-reduce-key>

For the above sample index, the document ID can be:

  • MonthlyProductSales/13770576973199715021

The numeric part is the hash of the reduce key values, in this case: hash(Product, Month).

If the aggregation value for a given reduce key changes, the artificial document is overwritten. It is removed once there is no longer any result for that reduce key.
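
The ID scheme and lifecycle above can be modeled roughly as follows. The sketch keeps a map of artificial documents keyed by <OutputCollectionName>/<hash-of-reduce-key>: a changed result overwrites the document with the same ID, and a reduce key with no result deletes it. The hash function used here (64-bit FNV-1a over the key values) is a hypothetical stand-in chosen for illustration; this page does not describe RavenDB's actual hash, so the numbers will not match real document IDs such as the one shown above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArtificialDocsSketch {
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    final String outputCollection;
    // Stand-in for the output collection: document ID -> reduce result
    final Map<String, Long> documents = new HashMap<>();

    ArtificialDocsSketch(String outputCollection) {
        this.outputCollection = outputCollection;
    }

    // Hypothetical hash: 64-bit FNV-1a over the reduce key values (not RavenDB's hash).
    static long hashReduceKey(List<String> keyValues) {
        long hash = FNV_OFFSET;
        for (String value : keyValues) {
            for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
                hash ^= (b & 0xff);
                hash *= FNV_PRIME;
            }
            // Separator byte so ("ab", "c") and ("a", "bc") hash differently
            hash ^= 0x1f;
            hash *= FNV_PRIME;
        }
        return hash;
    }

    // <OutputCollectionName>/<hash-of-reduce-key>, hash printed as unsigned
    String documentId(List<String> keyValues) {
        return outputCollection + "/" + Long.toUnsignedString(hashReduceKey(keyValues));
    }

    // After a reduce key is re-reduced: overwrite the document if there is a
    // result, or delete it if the key no longer has any result (null).
    void onReduceResult(List<String> keyValues, Long result) {
        String id = documentId(keyValues);
        if (result == null) {
            documents.remove(id);
        } else {
            documents.put(id, result);
        }
    }
}
```

Because the ID depends only on the reduce key values, the same (Product, Month) pair always maps to the same document, which is what allows an updated aggregate to overwrite the previous one in place.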

Artificial Document Flags

Documents generated by map-reduce indexes get the following @flags metadata:

"@flags": "Artificial, FromIndex"

Those flags are used internally by the database to filter out artificial documents during replication.
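
As a rough illustration only, this sketch shows how code could recognize artificial documents by the @flags metadata value shown above and skip them. The metadata is modeled as a plain Map, and the class and method names are hypothetical; RavenDB performs this filtering internally during replication and does not expose such a helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ArtificialFlagFilterSketch {
    // True if the "@flags" metadata value (e.g. "Artificial, FromIndex") contains Artificial.
    static boolean isArtificial(Map<String, Object> metadata) {
        Object flags = metadata.get("@flags");
        if (!(flags instanceof String)) return false;
        return Arrays.stream(((String) flags).split(","))
            .map(String::trim)
            .anyMatch("Artificial"::equals);
    }

    // Keep only the documents that are not artificial, i.e. the ones to replicate.
    static List<Map<String, Object>> replicable(List<Map<String, Object>> metadataList) {
        return metadataList.stream()
            .filter(m -> !isArtificial(m))
            .collect(Collectors.toList());
    }
}
```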