Create Map-Reduce Index
- Map-Reduce indexes allow you to perform complex data aggregations that can be queried at very little cost, regardless of the data size.
- The aggregation is done during the indexing phase, not at query time.
- Once new data comes into the database, or existing documents are modified, the Map-Reduce index re-calculates the aggregated data, so that the aggregation results are always available and up to date.
- The aggregation computation is done in two separate consecutive actions: the Map and the Reduce.
  - The Map stage: This first stage runs the defined Map function(s) on each document, indexing the specified fields.
  - The Reduce stage: This second stage groups the entries by the requested fields that were indexed in the Map stage, and then runs the Reduce function to get a final aggregation result per field value.
- The Map-Reduce results can be visualized in the Map-Reduce Visualizer.
The Map Stage
Define a Map Function:
- In the following example, we want to get the following aggregated values:
  - The number of orders each company makes
  - The cumulative amount spent on all orders by each company
- Let's define the following Map function on the Orders collection:
The Map Function
- Index Name - An index name can be composed of letters, digits, '.', '/', '-', and '_'. The name must be unique in the scope of the database.
  - Uniqueness is evaluated in a case-insensitive way - you can't create indexes named both usersbyname and UsersByName.
  - The characters '_' and '/' are treated as equivalent - you can't create indexes named both users/byname and users_byname.
  - If the index name contains the character '.', it must have some other character on both sides to be valid. '/./' is a valid index name, but './', '/.', and '/../' are all invalid.
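The naming rules above can be sketched as a small check. This is a plain-Python illustration of the rules as stated on this page, not RavenDB's own validation code. The helper names `is_valid_index_name` and `uniqueness_key` are hypothetical, "letters" is assumed to mean ASCII letters, and "some other character" is read as "a character other than '.'".

```python
import re

# Allowed characters per the text: letters, digits, '.', '/', '-', '_'.
# Assumption: "letters" is limited to ASCII here for simplicity.
_ALLOWED = re.compile(r"[A-Za-z0-9./\-_]+")


def is_valid_index_name(name):
    """Hypothetical helper: apply the character rules described above."""
    if not _ALLOWED.fullmatch(name):
        return False
    for i, ch in enumerate(name):
        if ch != ".":
            continue
        # A '.' needs a character other than '.' on both sides.
        left = name[i - 1] if i > 0 else ""
        right = name[i + 1] if i + 1 < len(name) else ""
        if left in ("", ".") or right in ("", "."):
            return False
    return True


def uniqueness_key(name):
    """Hypothetical helper: two names with the same key would collide."""
    # Case-insensitive comparison, with '/' and '_' treated as equivalent.
    return name.lower().replace("/", "_")
```

With this sketch, `uniqueness_key("users/byname")` and `uniqueness_key("users_byname")` give the same key, which mirrors the rule that both names cannot exist in one database.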
The Map function defines the following 3 fields that will be indexed:
- order.Company - The company that made the order.
- OrdersCount - In the Map stage, the value of this field is '1' for every single Order document, as each document in the Orders collection represents one order made by a single, specific company. This field is aggregated later in the Reduce stage, accumulating the data from all the Orders documents, per company. The cumulative value of this field represents the number of orders a company has made.
- TotalOrdersAmount - In the Map stage, the value of this field is the total order amount for that single Order document (the sum of all products in the document's 'Lines' field, taking the discount into account). This field is aggregated later in the Reduce stage, accumulating the data from all the Orders documents, per company. The cumulative value of this field represents the total amount spent by a company on all orders.

Next, click 'Add Reduction' to continue and add the 'Reduce' function. See The Reduce Stage.
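As a rough illustration of the three Map fields described in this section, here is a plain-Python sketch of what the Map stage produces for one Orders document. It is not the index definition itself, which is written in the Studio. The line-item field names `Quantity`, `PricePerUnit`, and `Discount` (a fraction between 0 and 1) are assumptions about the document shape, and `map_order` is a hypothetical name.

```python
def map_order(order):
    """Emit one index entry for a single Orders document."""
    # Sum every line in 'Lines', taking the discount into account.
    # Assumed line fields: Quantity, PricePerUnit, Discount (0..1).
    total = sum(
        line["Quantity"] * line["PricePerUnit"] * (1 - line["Discount"])
        for line in order["Lines"]
    )
    return {
        "Company": order["Company"],
        "OrdersCount": 1,  # each document is exactly one order
        "TotalOrdersAmount": total,
    }


# Sample document with made-up values.
order = {
    "Company": "companies/1",
    "Lines": [
        {"Quantity": 2, "PricePerUnit": 10.0, "Discount": 0.0},
        {"Quantity": 1, "PricePerUnit": 20.0, "Discount": 0.5},
    ],
}
entry = map_order(order)
```

Each order yields exactly one entry with `OrdersCount` set to 1, so the per-company totals only appear once the Reduce stage sums these entries.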
The Reduce Stage
Define a Reduce Function:
The Reduce Function
- In the Reduce function above, results are grouped by the Company field, so that we can get the data per company (group result by result.Company).
- The index results will show in the following format:
  - Company - the company for which we see the results.
  - OrdersCount - the aggregation of the orders count value from the Map stage (how many orders were made by each company).
  - TotalOrdersAmount - the aggregation of the total orders amount of each company (how much money the company has spent altogether, on all orders).
- Optional: The results of the Map-Reduce index can be saved in a new collection. Learn more in Saving Map-Reduce Results in a Collection (Artificial Documents).
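The grouping and summing performed by the Reduce stage can be sketched in plain Python as follows. This only illustrates the aggregation logic, it is not RavenDB code. The function name `reduce_entries` and the sample values are made up for the example.

```python
from collections import defaultdict


def reduce_entries(entries):
    """Group Map entries by Company and sum the two numeric fields."""
    groups = defaultdict(lambda: {"OrdersCount": 0, "TotalOrdersAmount": 0.0})
    for e in entries:
        g = groups[e["Company"]]
        g["OrdersCount"] += e["OrdersCount"]
        g["TotalOrdersAmount"] += e["TotalOrdersAmount"]
    # The output has the same three fields as the Map entries.
    return [{"Company": c, **totals} for c, totals in sorted(groups.items())]


# Entries as the Map stage would emit them (sample values).
map_entries = [
    {"Company": "companies/1", "OrdersCount": 1, "TotalOrdersAmount": 30.0},
    {"Company": "companies/2", "OrdersCount": 1, "TotalOrdersAmount": 8.0},
    {"Company": "companies/1", "OrdersCount": 1, "TotalOrdersAmount": 12.5},
]
results = reduce_entries(map_entries)
```

Here `results` holds one entry per company, in the Company / OrdersCount / TotalOrdersAmount format described above.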
Important Guidelines
- Both the Map and the Reduce functions must be pure functions: they should have no external input, and calling them with the same input must always return the same output. For example, usage of Random, DateTime.Now, or any similar call is not allowed.
- The Reduce output must have the same structure as the Map output. RavenDB will raise an error if the two functions produce different shapes.
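One way to see why the two guidelines matter is the sketch below. It assumes, as is common for map-reduce engines, that the Reduce step may be run again over its own partial results. That assumption is not spelled out on this page. When the function is pure and its output has the same shape as its input, reducing in two passes gives the same answer as reducing once. All names and values are illustrative.

```python
from collections import defaultdict


def reduce_entries(entries):
    """Pure reduce: same fields in and out, no external input."""
    groups = defaultdict(lambda: {"OrdersCount": 0, "TotalOrdersAmount": 0.0})
    for e in entries:
        g = groups[e["Company"]]
        g["OrdersCount"] += e["OrdersCount"]
        g["TotalOrdersAmount"] += e["TotalOrdersAmount"]
    return [{"Company": c, **totals} for c, totals in sorted(groups.items())]


batch_a = [
    {"Company": "companies/1", "OrdersCount": 1, "TotalOrdersAmount": 30.0},
    {"Company": "companies/2", "OrdersCount": 1, "TotalOrdersAmount": 8.0},
]
batch_b = [
    {"Company": "companies/1", "OrdersCount": 1, "TotalOrdersAmount": 12.5},
]

# Reduce each batch separately, then reduce the partial results again.
two_pass = reduce_entries(reduce_entries(batch_a) + reduce_entries(batch_b))
# Reduce everything in a single pass.
one_pass = reduce_entries(batch_a + batch_b)
```

If the Reduce output dropped or renamed a field, the second pass in `two_pass` would fail with a missing key.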
Map-Reduce Query Results
Map-Reduce Query Result
- In the query results, the number of orders per company is shown in the OrdersCount column. The total amount of all orders per company is shown in the TotalOrdersAmount column. The column names correspond to the field definitions of the Map-Reduce index.
- The Map-Reduce results can also be visualized in the Map-Reduce Visualizer.
Multi-Map-Reduce
- Multi-Map-Reduce indexes allow us to aggregate data from multiple collections.
- In the example below we define three maps, on the Companies, Suppliers, and Employees collections. In each map, we output a count for the type of the document we're mapping, as well as the relevant City.
Define Multi Maps
- In the Reduce part we group by City and then sum up the results from all the intermediate steps, to get the final count per city for each collection.
The Multi-Map-Reduce
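The Multi-Map-Reduce idea can be sketched in plain Python as three maps that emit entries of one shared shape, followed by a single reduce grouped by City. This is an illustration only, not the index definition. The output field names `Companies`, `Suppliers`, and `Employees`, the flat `City` property on each document, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

COUNT_FIELDS = ("Companies", "Suppliers", "Employees")


def make_map(count_field):
    """Build a map that counts 1 for its own collection and 0 for the others."""
    def map_doc(doc):
        entry = {"City": doc["City"], **{f: 0 for f in COUNT_FIELDS}}
        entry[count_field] = 1
        return entry
    return map_doc


# One map per collection; all three emit the same fields.
maps = {name: make_map(name) for name in COUNT_FIELDS}


def reduce_by_city(entries):
    """Group by City and sum each per-collection counter."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_FIELDS, 0))
    for e in entries:
        for f in COUNT_FIELDS:
            totals[e["City"]][f] += e[f]
    return [{"City": city, **counts} for city, counts in sorted(totals.items())]


# Made-up sample documents, keyed by collection name.
database = {
    "Companies": [{"City": "London"}, {"City": "Paris"}],
    "Suppliers": [{"City": "London"}],
    "Employees": [{"City": "London"}, {"City": "London"}],
}
entries = [maps[name](doc) for name, docs in database.items() for doc in docs]
result = reduce_by_city(entries)
```

Because every map emits all three counters, the reduce can sum entries without knowing which collection they came from.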
Saving Map-Reduce Results in a Collection (Artificial Documents)
- The results of the Map-Reduce index can be saved as output documents in a new output collection.
- These output documents can be further aggregated by reference documents: documents that contain the document IDs of output documents.
- The documents created by Map-Reduce indexes are called Artificial Documents.
- Learn more about using Artificial Documents from the client code in Map-Reduce Indexes: Reduce Results as Artificial Documents.
Save Map-Reduce Results into a Collection
- Specify the name of the collection you want the output documents to be saved in. Note: the collection specified must be empty (contain no documents).
- Specify a pattern for the reference document IDs. By including reduce function fields, this pattern determines which output documents will be included in each reference document.
- Specify the name of the collection for the reference documents. By default, this is <output collection name>/Reference.
An Artificial Document in the collection CompaniesOrders
A Reference Document in the collection CompaniesOrders/Reference
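The relationship between the pattern and the reference documents can be pictured with the conceptual sketch below. It is plain Python and uses Python's own `str.format` placeholders as a stand-in, not RavenDB's pattern syntax. The function `build_reference_docs`, the `Country` field, the pattern text, and all document IDs are hypothetical.

```python
from collections import defaultdict


def build_reference_docs(output_docs, id_pattern):
    """Map each reference document ID to the output document IDs it lists."""
    refs = defaultdict(list)
    for doc_id, fields in output_docs.items():
        # Output documents that fill the pattern with the same field
        # values end up in the same reference document.
        refs[id_pattern.format(**fields)].append(doc_id)
    return dict(refs)


# Hypothetical output documents: ID -> reduce function fields.
output_docs = {
    "CompaniesOrders/1": {"Company": "companies/1", "Country": "UK"},
    "CompaniesOrders/2": {"Company": "companies/2", "Country": "UK"},
    "CompaniesOrders/3": {"Company": "companies/3", "Country": "France"},
}
reference_docs = build_reference_docs(output_docs, "orders/by-country/{Country}")
```

In this sketch the two UK output documents share one reference document, while the France output document gets its own.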
Artificial Documents -vs- Regular Documents
- Artificial documents are created directly by the index.
- They behave just like standard documents, except that they are not replicated to other nodes in the database group.
- Artificial documents are updated whenever the index completes indexing a batch of documents. While artificial documents can be loaded and queried just like regular documents, editing them manually is not recommended, since any update of the index results would overwrite all manual modifications made to them.
Artificial Documents Usage
- You can set up indexes on top of the Artificial Documents collection, including additional Map-Reduce indexes, giving you the option to create recursive Map-Reduce operations.
- You can set up a RavenDB ETL Task on the Artificial Documents collection to a dedicated database on a separate cluster for further processing, as well as other ongoing tasks such as SQL ETL and Subscriptions.
Limitations
- RavenDB will detect and generate an error if you have a cycle of artificial documents. You can't define another Map-Reduce index that outputs artificial documents if that would trigger (directly or indirectly) the same index. Otherwise, you might set up a situation where the indexes run in an infinite loop.
- An empty collection must be used as the target collection for the artificial documents. This is mandatory since the Map-Reduce index overwrites any existing document in the collection.
- You have no control over the artificial documents' IDs. These identifiers are generated by RavenDB based on the hash of the reduce key.
- Artificial documents are not sent over replication; each node in the database group has its own (independent) copy of the index results. Therefore:
  - It is recommended to use artificial documents with Subscriptions only on a single node. A Subscription failover to another node may cause the subscription to send Artificial Documents that the subscription has already acknowledged.
  - Artificial documents cannot use Revisions or Attachments.
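The cycle limitation can be pictured as a graph problem: each index reads from some collections and writes artificial documents to an output collection, and a cycle exists when an output feeds back into one of its own sources. The sketch below is a generic depth-first search in plain Python, not RavenDB's actual detection logic. The function name and every index and collection name other than Orders and CompaniesOrders are hypothetical.

```python
def has_output_cycle(indexes):
    """indexes: index name -> (set of source collections, output collection)."""
    # Build edges: source collection -> output collections it feeds.
    edges = {}
    for sources, output in indexes.values():
        for src in sources:
            edges.setdefault(src, set()).add(output)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a collection already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


# A chain of indexes with no feedback.
chain = {
    "Orders/ByCompany": ({"Orders"}, "CompaniesOrders"),
    "CompaniesOrders/Totals": ({"CompaniesOrders"}, "Totals"),
}
# Adding an index that writes back into Orders closes the loop.
looped = dict(chain)
looped["Totals/Back"] = ({"Totals"}, "Orders")
```

In this sketch `has_output_cycle(chain)` is false and `has_output_cycle(looped)` is true, which corresponds to the kind of setup that is rejected with an error.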