Indexes: Map-Reduce Indexes
-
Map-Reduce indexes allow you to perform complex data aggregation that can be queried on with very little cost, regardless of the data size.
-
To expedite queries and prevent performance degradation during queries, the aggregation is done during the indexing phase, not at query time.
-
Once new data comes into the database, or existing documents are modified,
the Map-Reduce index will re-calculate the aggregated data
so that the aggregation results are always available and up-to-date! -
The aggregation computation is done in two separate consecutive actions: the
Map
and theReduce
.- The Map stage:
This first stage runs the defined Map function(s) on each document, indexing the specified fields. - The Reduce stage:
This second stage groups the specified requested fields that were indexed in the Map stage,
and then runs the Reduce function to get a final aggregation result per field value.
- The Map stage:
For a more in-depth look at how map-reduce works, you can read this post: RavenDB 4.0 Unsung Heroes: Map/reduce.
Creating Map Reduce Indexes
When it comes to index creation, the only difference between simple indexes and the map-reduce ones is an additional reduce function defined in the index definition. To deploy an index we need to create a definition and deploy it using one of the ways described in the creating and deploying article.
Example I - Count
Let's assume that we want to count the number of products for each category. To do it, we can create the following index using LoadDocument
inside:
public class Products_ByCategory : AbstractIndexCreationTask<Product, Products_ByCategory.Result>
{
public class Result
{
public string Category { get; set; }
public int Count { get; set; }
}
public Products_ByCategory()
{
Map = products => from product in products
let categoryName = LoadDocument<Category>(product.Category).Name
select new
{
Category = categoryName,
Count = 1
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
Count = g.Sum(x => x.Count)
};
}
}
public class Products_ByCategory : AbstractJavaScriptIndexCreationTask
{
public class Result
{
public string Category { get; set; }
public int Count { get; set; }
}
public Products_ByCategory()
{
Maps = new HashSet<string>()
{
@"map('products', function(p){
return {
Category: load(p.Category, 'Categories').Name,
Count: 1
}
})"
};
Reduce = @"groupBy(x => x.Category)
.aggregate(g => {
return {
Category: g.key,
Count: g.values.reduce((count, val) => val.Count + count, 0)
};
})";
}
}
and issue the query:
IList<Products_ByCategory.Result> results = session
.Query<Products_ByCategory.Result, Products_ByCategory>()
.Where(x => x.Category == "Seafood")
.ToList();
IList<Products_ByCategory.Result> results = session
.Advanced
.DocumentQuery<Products_ByCategory.Result, Products_ByCategory>()
.WhereEquals(x => x.Category, "Seafood")
.ToList();
from 'Products/ByCategory'
where Category == 'Seafood'
The above query will return one result for Seafood with the appropriate number of products from that category.
Example II - Average
In this example, we will count an average product price for each category. The index definition:
public class Products_Average_ByCategory :
AbstractIndexCreationTask<Product, Products_Average_ByCategory.Result>
{
public class Result
{
public string Category { get; set; }
public decimal PriceSum { get; set; }
public double PriceAverage { get; set; }
public int ProductCount { get; set; }
}
public Products_Average_ByCategory()
{
Map = products => from product in products
let categoryName = LoadDocument<Category>(product.Category).Name
select new
{
Category = categoryName,
PriceSum = product.PricePerUnit,
PriceAverage = 0,
ProductCount = 1
};
Reduce = results => from result in results
group result by result.Category into g
let productCount = g.Sum(x => x.ProductCount)
let priceSum = g.Sum(x => x.PriceSum)
select new
{
Category = g.Key,
PriceSum = priceSum,
PriceAverage = priceSum / productCount,
ProductCount = productCount
};
}
}
public class Products_Average_ByCategory :
AbstractJavaScriptIndexCreationTask
{
public class Result
{
public string Category { get; set; }
public decimal PriceSum { get; set; }
public double PriceAverage { get; set; }
public int ProductCount { get; set; }
}
public Products_Average_ByCategory()
{
Maps = new HashSet<string>()
{
@"map('products', function(product){
return {
Category: load(product.Category, 'Categories').Name,
PriceSum: product.PricePerUnit,
PriceAverage: 0,
ProductCount: 1
}
})"
};
Reduce = @"groupBy(x => x.Category)
.aggregate(g => {
var pricesum = g.values.reduce((sum,x) => x.PriceSum + sum,0);
var productcount = g.values.reduce((sum,x) => x.ProductCount + sum,0);
return {
Category: g.key,
PriceSum: pricesum,
ProductCount: productcount,
PriceAverage: pricesum / productcount
}
})";
}
}
and the query:
IList<Products_Average_ByCategory.Result> results = session
.Query<Products_Average_ByCategory.Result, Products_Average_ByCategory>()
.Where(x => x.Category == "Seafood")
.ToList();
IList<Products_Average_ByCategory.Result> results = session
.Advanced
.DocumentQuery<Products_Average_ByCategory.Result, Products_Average_ByCategory>()
.WhereEquals(x => x.Category, "Seafood")
.ToList();
from 'Products/Average/ByCategory'
where Category == 'Seafood'
Example III - Calculations
This example illustrates how we can put some calculations inside an index using on one of the indexes available in the sample database (Product/Sales
).
We want to know how many times each product was ordered and how much we earned for it. In order to extract that information, we need to define the following index:
public class Product_Sales : AbstractIndexCreationTask<Order, Product_Sales.Result>
{
public class Result
{
public string Product { get; set; }
public int Count { get; set; }
public decimal Total { get; set; }
}
public Product_Sales()
{
Map = orders => from order in orders
from line in order.Lines
select new
{
Product = line.Product,
Count = 1,
Total = ((line.Quantity * line.PricePerUnit) * (1 - line.Discount))
};
Reduce = results => from result in results
group result by result.Product into g
select new
{
Product = g.Key,
Count = g.Sum(x => x.Count),
Total = g.Sum(x => x.Total)
};
}
}
public class Product_Sales : AbstractJavaScriptIndexCreationTask
{
public class Result
{
public string Product { get; set; }
public int Count { get; set; }
public decimal Total { get; set; }
}
public Product_Sales()
{
Maps = new HashSet<string>()
{
@"map('orders', function(order){
var res = [];
order.Lines.forEach(l => {
res.push({
Product: l.Product,
Count: 1,
Total: (l.Quantity * l.PricePerUnit) * (1- l.Discount)
})
});
return res;
})"
};
Reduce = @"groupBy(x => x.Product)
.aggregate(g => {
return {
Product : g.key,
Count: g.values.reduce((sum, x) => x.Count + sum, 0),
Total: g.values.reduce((sum, x) => x.Total + sum, 0)
}
})";
}
}
and send the query:
IList<Product_Sales.Result> results = session
.Query<Product_Sales.Result, Product_Sales>()
.ToList();
IList<Product_Sales.Result> results = session
.Advanced
.DocumentQuery<Product_Sales.Result, Product_Sales>()
.ToList();
from 'Product/Sales'
Creating Multi-Map-Reduce Indexes
A Multi-Map-Reduce index allows aggregating (or 'reducing') data from several collections.
They can be created and edited via Studio, or with API as shown below.
See samples about counting, calculating average, and a more advanced calculation.
In the following code sample, we want the number of companies, suppliers, and employees per city.
We define the map phase on collections 'Employees', 'Companies', and 'Suppliers'.
We then define the reduce phase.
public class Cities_Details :
AbstractMultiMapIndexCreationTask<Cities_Details.IndexEntry>
{
public class IndexEntry
{
public string City;
public int Companies, Employees, Suppliers;
}
public Cities_Details()
{
// Map employees collection.
AddMap<Employee>(employees =>
from e in employees
select new IndexEntry
{
City = e.Address.City,
Companies = 0,
Suppliers = 0,
Employees = 1
}
);
// Map companies collection.
AddMap<Company>(companies =>
from c in companies
select new IndexEntry
{
City = c.Address.City,
Companies = 1,
Suppliers = 0,
Employees = 0
}
);
// Map suppliers collection.
AddMap<Supplier>(suppliers =>
from s in suppliers
select new IndexEntry
{
City = s.Address.City,
Companies = 0,
Suppliers = 1,
Employees = 0
}
);
// Apply reduction/aggregation on multi-map results.
Reduce = results =>
from result in results
group result by result.City
into g
select new IndexEntry
{
City = g.Key,
Companies = g.Sum(x => x.Companies),
Suppliers = g.Sum(x => x.Suppliers),
Employees = g.Sum(x => x.Employees)
};
}
}
A query on the index:
// Queries the index "Cities_Details" - filters "Companies" results and orders by "City".
IList<Cities_Details.IndexEntry> commerceDetails = session
.Query<Cities_Details.IndexEntry, Cities_Details>()
.Where(doc => doc.Companies > 5)
.OrderBy(x => x.City)
.ToList();
You can see this sample described in detail in Inside RavenDB - Multi-Map-Reduce Indexes.
Reduce Results as Artificial Documents
In addition to storing the aggregation results in the index, the map-reduce indexes can also output reduce results as documents to a specified collection.
In order to create such documents, called artificial, you need to define the target collection using the OutputReduceToCollection
property in the index definition.
public Product_Sales_ByDate()
{
Map = orders => from order in orders
from line in order.Lines
select new
{
Product = line.Product,
Date = new DateTime(order.OrderedAt.Year,
order.OrderedAt.Month,
order.OrderedAt.Day),
Count = 1,
Total = ((line.Quantity * line.PricePerUnit) * (1 - line.Discount))
};
Reduce = results => from result in results
group result by new { result.Product, result.Date } into g
select new
{
Product = g.Key.Product,
Date = g.Key.Date,
Count = g.Sum(x => x.Count),
Total = g.Sum(x => x.Total)
};
OutputReduceToCollection = "DailyProductSales";
PatternReferencesCollectionName = "DailyProductSales/References";
PatternForOutputReduceToCollectionReferences = x => $"sales/daily/{x.Date:yyyy-MM-dd}";
}
public class Product_Sales_ByDate : AbstractIndexCreationTask
{
public override IndexDefinition CreateIndexDefinition()
{
return new IndexDefinition
{
Maps =
{
@"from order in docs.Orders
from line in order.Lines
select new {
line.Product,
Date = order.OrderedAt,
Profit = line.Quantity * line.PricePerUnit * (1 - line.Discount)
};"
},
Reduce =
@"from r in results
group r by new { r.OrderedAt, r.Product }
into g
select new {
Product = g.Key.Product,
Date = g.Key.Date,
Profit = g.Sum(r => r.Profit)
};",
OutputReduceToCollection = "DailyProductSales",
PatternReferencesCollectionName = "DailyProductSales/References",
PatternForOutputReduceToCollectionReferences = "sales/daily/{Date:yyyy-MM-dd}"
};
}
}
Writing map-reduce outputs into documents allows you to define additional indexes on top of them that give you the option to create recursive map-reduce operations. This way, you can do daily/monthly/yearly summaries very cheaply and easy.
In addition, you can also apply the usual operations on documents (e.g. data subscriptions or ETL).
Saving documents
Artificial documents are stored immediately after the indexing transaction completes.
Recursive indexing loop
It's forbidden to output reduce results to the collection that:
- the current index is already working on (e.g. index on
DailyInvoices
collections outputs toDailyInvoices
), - the current index is loading a document from it (e.g. index has
LoadDocument(id, "Invoices")
outputs toInvoices
), - it is processed by another map-reduce index that outputs results to a collection that the current index is working on (e.g. one index on
Invoices
collection outputs toDailyInvoices
, another index onDailyInvoices
outputs toInvoices
)
Since that would result in the infinite indexing loop (the index puts an artificial document what triggers the indexing and so on), you will get the detailed error on attempt to create such invalid construction.
Existing collection
Creating a map-reduce index which defines the output collection that already exists and it contains documents will result in an error. You need to delete all documents from the relevant collection before creating the index or output the results to a different one.
Artificial Document IDs
The identifiers of artificial documents are generated as:
<OutputCollectionName>/<hash-of-reduce-key>
For the above sample index, the document ID can be:
MonthlyProductSales/13770576973199715021
The numeric part is the hash of the reduce key values, in this case: hash(Product, Month)
.
If the aggregation value for a given reduce key changes then we overwrite the artificial document. It will get removed once there is no result for a given reduce key.
Artificial Document Flags
Documents generated by map-reduce indexes get the following @flags
metadata:
"@flags": "Artificial, FromIndex"
Those flags are used internally by the database to filter out artificial documents during replication.