Map-Reduce indexes allow you to perform complex aggregations of data. The first stage, called the map, runs over documents and extracts portions of data according to the defined mapping function(s).
Upon completion of the first phase, reduction is applied to the map results and the final outcome is produced.
The idea behind map-reduce indexing is that aggregation queries using such indexes are very cheap. The aggregation is performed only once and the results are stored inside the index.
When new data comes into the database or existing documents are modified, the map-reduce index keeps the aggregation results up to date. Aggregation is never performed at
query time, which avoids expensive calculations that could severely degrade performance. When you run a query, RavenDB immediately returns the matching results directly from the index.
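The flow described above can be illustrated with a small, self-contained sketch in plain Python. It does not use the RavenDB API; the `orders` data, the field names, and the helper functions are all hypothetical. The sketch recomputes everything when a document changes, which is only for brevity and is not meant to show how the index is maintained internally.

```python
from collections import defaultdict

def map_order(doc):
    # Map stage: extract only the fields needed for the aggregation.
    return {"company": doc["company"], "count": 1, "total": doc["total"]}

def reduce_entries(entries):
    # Reduce stage: group the map results by the reduce key and aggregate them.
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for entry in entries:
        group = groups[entry["company"]]
        group["count"] += entry["count"]
        group["total"] += entry["total"]
    return dict(groups)

# Hypothetical documents, keyed by document ID.
orders = {
    "orders/1": {"company": "companies/1", "total": 10.0},
    "orders/2": {"company": "companies/1", "total": 5.0},
    "orders/3": {"company": "companies/2", "total": 7.5},
}

# Indexing time: the aggregation runs once and its results are stored.
index = reduce_entries(map_order(doc) for doc in orders.values())

def query(company):
    # Query time: a plain lookup of stored results, with no aggregation work.
    return index.get(company)

# A new document arrives, so the stored results are refreshed at indexing time
# (naive full recompute here, purely to keep the sketch short).
orders["orders/4"] = {"company": "companies/1", "total": 20.0}
index = reduce_entries(map_order(doc) for doc in orders.values())
```

Calling `query("companies/1")` after the refresh reads the already aggregated entry; the cost of aggregation was paid when the document was written, not when it was queried.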
When it comes to index creation, the only difference between simple indexes and map-reduce indexes is an additional reduce function defined in the index definition.
To deploy an index we need to create a definition and deploy it using one of the ways described in the creating and deploying article.
Example I - Count
Let's assume that we want to count the number of products in each category. To do that, we can create the following index, which uses LoadDocument:
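As a conceptual illustration only, the counting logic of such an index can be sketched in plain Python. This is not a RavenDB index definition: `load_document` is a hypothetical stand-in for LoadDocument, and the sample products and categories are made up for the example.

```python
from collections import Counter

# Hypothetical sample data: categories keyed by document ID, and products
# that reference their category by ID.
categories = {
    "categories/1": {"Name": "Beverages"},
    "categories/2": {"Name": "Condiments"},
}
products = [
    {"Name": "Chai", "Category": "categories/1"},
    {"Name": "Chang", "Category": "categories/1"},
    {"Name": "Aniseed Syrup", "Category": "categories/2"},
]

def load_document(doc_id, collection):
    # Hypothetical stand-in for LoadDocument: fetch a related document by ID.
    return collection.get(doc_id)

def map_product(product):
    # Map: emit one entry per product, using the loaded category's name.
    category = load_document(product["Category"], categories)
    return {"Category": category["Name"], "Count": 1}

def reduce_counts(entries):
    # Reduce: group by category name and sum the counts.
    totals = Counter()
    for entry in entries:
        totals[entry["Category"]] += entry["Count"]
    return [{"Category": name, "Count": count} for name, count in sorted(totals.items())]

results = reduce_counts(map_product(p) for p in products)
```

Note that the map output and the reduce output share the same shape (`Category`, `Count`), which is what allows reduce results to be reduced again.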
In addition to storing the aggregation results in the index, the map-reduce indexes can also output reduce results as documents to a specified collection.
To create such documents, called artificial documents, define the target collection using the OutputReduceToCollection property in the index definition.
Writing map-reduce outputs into documents allows you to define additional indexes on top of them, which makes recursive map-reduce operations possible.
This way, you can produce daily, monthly, and yearly summaries cheaply and easily.
In addition, you can also apply the usual operations on documents (e.g. data subscriptions or ETL).
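The recursive idea can be sketched in plain Python as a chain of aggregations, where each stage consumes the output of the previous one rather than the raw documents. The `summarize` helper, the invoice data, and the `Period` field are assumptions made for this illustration and are not part of the RavenDB API.

```python
from collections import defaultdict

def summarize(docs, key_of):
    # One map-reduce stage: group documents by key_of(doc) and sum "Total".
    totals = defaultdict(float)
    for doc in docs:
        totals[key_of(doc)] += doc["Total"]
    return [{"Period": period, "Total": total} for period, total in sorted(totals.items())]

# Hypothetical source documents.
invoices = [
    {"Date": "2024-01-03", "Total": 10.0},
    {"Date": "2024-01-03", "Total": 5.0},
    {"Date": "2024-01-20", "Total": 2.5},
    {"Date": "2024-02-01", "Total": 4.0},
]

# Stage 1: invoices -> daily summaries (these play the role of output documents).
daily = summarize(invoices, lambda doc: doc["Date"])

# Stage 2: daily summaries -> monthly summaries ("YYYY-MM" prefix of the period).
monthly = summarize(daily, lambda doc: doc["Period"][:7])

# Stage 3: monthly summaries -> yearly summaries ("YYYY" prefix of the period).
yearly = summarize(monthly, lambda doc: doc["Period"][:4])
```

Each later stage works on far fewer inputs than the original collection, which is why the higher-level summaries stay cheap.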
Recursive Indexing Loop
It is forbidden to output reduce results to a collection when:
It is a collection that the current index is already working on
(e.g. an index on the DailyInvoices collection outputs to DailyInvoices)
It is a collection that the current index is loading a document from
(e.g. an index that calls LoadDocument(id, "Invoices") outputs to Invoices)
It is a collection that is processed by another map-reduce index that
outputs results to a collection that the current index is working on
(e.g. one index indexes the Invoices collection and outputs to the
DailyInvoices collection, and a second index indexes the DailyInvoices
collection and outputs to the Invoices collection)
These scenarios are forbidden because they result in an infinite
indexing loop. Attempting to create such indexes will produce a detailed error.
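To make the three rules concrete, here is a minimal plain-Python sketch of a check for them. It is not how RavenDB implements the validation: the dictionary shape (`name`, `sources`, `loads`, `output`) and the function name are hypothetical, and following chains longer than two indexes is an assumption that goes beyond the example above.

```python
def find_output_violation(index, all_indexes):
    """Return a description of the problem, or None if the output is allowed."""
    out = index["output"]
    if out is None:
        return None
    # Rule 1: the index outputs to a collection it is already working on.
    if out in index["sources"]:
        return f"{index['name']} outputs to {out}, which it also indexes"
    # Rule 2: the index outputs to a collection it loads documents from.
    if out in index["loads"]:
        return f"{index['name']} outputs to {out}, which it loads documents from"
    # Rule 3: another index processes the output collection and writes back
    # into a collection this index works on (checked transitively here).
    seen, pending = set(), [out]
    while pending:
        collection = pending.pop()
        if collection in seen:
            continue
        seen.add(collection)
        for other in all_indexes:
            if other is index or other["output"] is None:
                continue
            if collection in other["sources"]:
                if other["output"] in index["sources"]:
                    return f"{index['name']} and {other['name']} form an indexing loop"
                pending.append(other["output"])
    return None

# Hypothetical index descriptions matching the examples in the text.
daily = {"name": "Invoices/Daily", "sources": {"Invoices"}, "loads": set(), "output": "DailyInvoices"}
monthly = {"name": "Invoices/Monthly", "sources": {"DailyInvoices"}, "loads": set(), "output": "MonthlyInvoices"}
looping = {"name": "Invoices/Loop", "sources": {"DailyInvoices"}, "loads": set(), "output": "Invoices"}
```

With these definitions, `daily` combined with `monthly` is a valid recursive setup, while `daily` combined with `looping` is reported as a loop.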
Output to an Existing Collection
Creating a map-reduce index whose output collection already exists and
contains documents will result in an error. Either delete all documents
from the target collection before creating the index, or output the results to
a different collection.
Modification of Artificial Documents
Artificial documents can be loaded and queried just like regular documents.
However, editing artificial documents manually is not recommended, because
the next update of the index results will overwrite any manual modifications.
Artificial Document IDs
The identifiers of artificial documents are generated as:
For the above sample index, the document ID can be:
The numeric part is the hash of the reduce key values, in this case: hash(Product, Month).
If the aggregation value for a given reduce key changes, the artificial document is overwritten. The document is removed once there is no longer a result for that reduce key.
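The lifecycle described above (stable ID per reduce key, overwrite on change, removal when the result disappears) can be sketched in plain Python. Several things here are assumptions made for illustration: the ID layout of collection name followed by a number, the `MonthlyProductSales` collection name, and the use of SHA-256 as the hash. RavenDB's actual hash function is not shown in this article and is not reproduced here.

```python
import hashlib

def artificial_id(collection, reduce_key):
    # Assumed ID layout: "<collection>/<number>", where the number is derived
    # from the reduce key values. SHA-256 is a stand-in hash for this sketch.
    digest = hashlib.sha256(repr(reduce_key).encode("utf-8")).digest()
    return f"{collection}/{int.from_bytes(digest[:8], 'big')}"

def sync_output(stored_docs, collection, reduce_results):
    # reduce_results maps a reduce key, e.g. (Product, Month), to its aggregate.
    live_ids = set()
    for key, value in reduce_results.items():
        doc_id = artificial_id(collection, key)
        stored_docs[doc_id] = value  # create, or overwrite if the value changed
        live_ids.add(doc_id)
    # Remove documents whose reduce key no longer produces a result.
    for doc_id in list(stored_docs):
        if doc_id not in live_ids:
            del stored_docs[doc_id]

stored = {}
key = ("products/1", "2024-01")  # hypothetical (Product, Month) reduce key
sync_output(stored, "MonthlyProductSales", {key: {"Count": 3}})
sync_output(stored, "MonthlyProductSales", {key: {"Count": 4}})  # same ID, new value
```

Because the ID depends only on the reduce key values, the second call replaces the existing document instead of creating a new one; passing an empty result set would remove it.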
Artificial Document Flags
Documents generated by map-reduce indexes get the following @flags metadata:
"@flags": "Artificial, FromIndex"
These flags are used internally by the database to filter out artificial documents during replication.
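Client code can inspect the same metadata to tell artificial documents apart from regular ones. The following plain-Python sketch assumes documents are available as dictionaries with the flags stored under `@metadata`; the helper name and the sample documents are hypothetical, and this is not the database's internal replication filter.

```python
def is_artificial(doc):
    # "@flags" is a comma-separated string, e.g. "Artificial, FromIndex".
    flags = doc.get("@metadata", {}).get("@flags", "")
    return "Artificial" in {flag.strip() for flag in flags.split(",")}

# Hypothetical documents: one regular, one produced by a map-reduce index.
docs = [
    {"Name": "Chai", "@metadata": {"@collection": "Products"}},
    {"Count": 3, "@metadata": {"@flags": "Artificial, FromIndex"}},
]

regular_docs = [doc for doc in docs if not is_artificial(doc)]
```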