Create Map-Reduce Index



The Map Stage

Define a Map Function:

  • In this example, we want to compute the following aggregated values:

    • The number of orders each company has made, and
    • The total amount each company has spent across all of its orders.
  • Let's define the following Map function on the Orders collection:

Figure 1. The Map Function

  1. Index Name - An index name can be composed of letters, digits, ., /, -, and _. The name must be unique in the scope of the database.

    • Uniqueness is evaluated in a case-insensitive way - you can't create indexes named both usersbyname and UsersByName.
    • The characters _ and / are treated as equivalent - you can't create indexes named both users/byname and users_byname.
    • If the index name contains the character ., it must have some other character on both sides to be valid. /./ is a valid index name, but ./, /., and /../ are all invalid.
  2. The Map function defines the following 3 fields to be indexed:

    • order.Company -
      The company that placed the order.

    • OrdersCount -
      In the Map stage, the value of this field is '1' for each Order document,
      since each order in the Orders collection was placed by a single, specific company.
      This field is aggregated later in the Reduce stage, which accumulates the data from all Order documents per company.
      The aggregated value of this field represents the number of orders a company has made.

    • TotalOrdersAmount -
      In the Map stage, the value of this field for each Order document is that order's total amount
      (the sum over all products in the document's 'Lines' field, taking the discount into account).
      This field is aggregated later in the Reduce stage, which accumulates the data from all Order documents per company.
      The aggregated value of this field represents the total amount a company has spent on all its orders.

  3. Next, click 'Add Reduction' to continue and add the 'Reduce' function. See The Reduce Stage.
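
The index-name rules in item 1 can be sketched as a small validator. This is a plain Python illustration of the rules as stated above, not RavenDB's own validation code; the helpers is_valid_index_name, normalized_name, and conflicts are hypothetical, and "letters" is approximated as ASCII letters, which is an assumption.

```python
import re
from typing import Iterable

# Assumption: "letters" means ASCII letters; RavenDB's actual check may differ.
_ALLOWED = re.compile(r"[A-Za-z0-9./_-]+")


def is_valid_index_name(name: str) -> bool:
    """Hypothetical helper: check allowed characters and '.' placement."""
    if not _ALLOWED.fullmatch(name):
        return False
    for i, ch in enumerate(name):
        if ch != ".":
            continue
        # A '.' needs a character other than '.' on both sides.
        if i == 0 or i == len(name) - 1:
            return False
        if name[i - 1] == "." or name[i + 1] == ".":
            return False
    return True


def normalized_name(name: str) -> str:
    """Key for the uniqueness check: case-insensitive, '_' treated as '/'."""
    return name.lower().replace("_", "/")


def conflicts(existing: Iterable[str], candidate: str) -> bool:
    """Hypothetical helper: True if candidate collides with an existing name."""
    key = normalized_name(candidate)
    return any(normalized_name(n) == key for n in existing)
```

For example, is_valid_index_name("/./") accepts the name while "/../" is rejected, and conflicts(["users/byname"], "users_byname") reports a collision.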

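To make the per-document output of the Map stage concrete, here is a plain Python simulation of what the Map function emits for each order. It is not RavenDB index syntax. The order shape (a Company ID plus Lines entries with Quantity, PricePerUnit, and Discount, where Discount is a fraction) is assumed from the Northwind sample data, and the company IDs and amounts are made up.

```python
from typing import Any, Dict, List


def map_order(order: Dict[str, Any]) -> Dict[str, Any]:
    """Emit one map entry per Order document (simulation, not RavenDB syntax)."""
    # Assumed line shape: Quantity, PricePerUnit, Discount (0.5 means 50% off).
    total = sum(
        line["Quantity"] * line["PricePerUnit"] * (1 - line["Discount"])
        for line in order["Lines"]
    )
    return {
        "Company": order["Company"],
        "OrdersCount": 1,            # each order belongs to exactly one company
        "TotalOrdersAmount": total,  # this order's amount after discounts
    }


# Hypothetical sample orders.
orders: List[Dict[str, Any]] = [
    {"Company": "companies/1-A", "Lines": [
        {"Quantity": 2, "PricePerUnit": 10.0, "Discount": 0.0},
        {"Quantity": 1, "PricePerUnit": 20.0, "Discount": 0.5},
    ]},
    {"Company": "companies/2-A", "Lines": [
        {"Quantity": 3, "PricePerUnit": 5.0, "Discount": 0.0},
    ]},
    {"Company": "companies/1-A", "Lines": [
        {"Quantity": 1, "PricePerUnit": 8.0, "Discount": 0.0},
    ]},
]

map_results = [map_order(o) for o in orders]
```

Each order yields exactly one entry, so companies/1-A appears twice in map_results; combining those entries is the job of the Reduce stage.
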
The Reduce Stage

Define a Reduce Function:

Figure 2. The Reduce Function

    • In the Reduce function above, the results are grouped by the Company field (group result by result.Company),
      so that we get the data per company.

    • The index results have the following format:

      • Company - the company the results refer to.

      • OrdersCount - the aggregation of the OrdersCount values from the Map stage
        (how many orders each company has made).

      • TotalOrdersAmount - the aggregation of the TotalOrdersAmount values from the Map stage
        (how much money each company has spent altogether, across all of its orders).

  1. Optional: The results of the Map-Reduce index can be saved in a new collection.
    Learn more in Saving Map-Reduce Results in a Collection (Artificial Documents)
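
The grouping and summing performed by the Reduce function can be simulated in plain Python. This sketches the semantics only and is not RavenDB index syntax; the input entries are hypothetical map results, one per order.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def reduce_by_company(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group entries by Company and sum both aggregated fields (simulation)."""
    counts: Dict[str, int] = defaultdict(int)
    amounts: Dict[str, float] = defaultdict(float)
    for entry in entries:
        counts[entry["Company"]] += entry["OrdersCount"]
        amounts[entry["Company"]] += entry["TotalOrdersAmount"]
    # Same fields as the map output: Company, OrdersCount, TotalOrdersAmount.
    return [
        {"Company": c, "OrdersCount": counts[c], "TotalOrdersAmount": amounts[c]}
        for c in counts
    ]


# Hypothetical map results (one entry per order).
map_results = [
    {"Company": "companies/1-A", "OrdersCount": 1, "TotalOrdersAmount": 30.0},
    {"Company": "companies/2-A", "OrdersCount": 1, "TotalOrdersAmount": 15.0},
    {"Company": "companies/1-A", "OrdersCount": 1, "TotalOrdersAmount": 8.0},
]

reduce_results = reduce_by_company(map_results)
```

Here companies/1-A ends up with OrdersCount 2 and TotalOrdersAmount 38.0, which matches the per-company rows described above.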

Important Guidelines

  • Both the Map and the Reduce functions must be pure functions: they must not depend on any external input.
    For example, calls to Random, DateTime.Now, or anything similar are not allowed.
    Calling them with the same input must always return the same output.

  • The output of the Reduce function must have the same structure as the output of the Map function.
    RavenDB raises an error if the two functions produce results of different shapes.

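One way to see why the shapes must match: a reduce result can be fed back into the Reduce function together with other results, for example when results are combined incrementally. The exact batching RavenDB uses is not modeled here. This plain Python sketch only shows that, because the reduce output has the same fields as the map output, reducing partial results again gives the same totals as reducing everything at once. The data is hypothetical.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]


def reduce_by_company(entries: List[Entry]) -> List[Entry]:
    """Sum OrdersCount and TotalOrdersAmount per Company (simulation)."""
    grouped: Dict[str, Entry] = {}
    for e in entries:
        acc = grouped.setdefault(
            e["Company"],
            {"Company": e["Company"], "OrdersCount": 0, "TotalOrdersAmount": 0.0},
        )
        acc["OrdersCount"] += e["OrdersCount"]
        acc["TotalOrdersAmount"] += e["TotalOrdersAmount"]
    return sorted(grouped.values(), key=lambda r: r["Company"])


map_results = [
    {"Company": "companies/1-A", "OrdersCount": 1, "TotalOrdersAmount": 30.0},
    {"Company": "companies/2-A", "OrdersCount": 1, "TotalOrdersAmount": 15.0},
    {"Company": "companies/1-A", "OrdersCount": 1, "TotalOrdersAmount": 8.0},
]

# Reduce everything in one pass.
all_at_once = reduce_by_company(map_results)

# Reduce two partial batches, then reduce their outputs again.
# This works only because the reduce output has the same fields as its input.
partial = reduce_by_company(map_results[:2]) + reduce_by_company(map_results[2:])
re_reduced = reduce_by_company(partial)

assert all_at_once == re_reduced
```

If the Reduce function returned, say, a field named Total instead of TotalOrdersAmount, the second pass would fail with a KeyError, which is the kind of mismatch the guideline rules out.
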
Map-Reduce Query Results

Figure 3. Map-Reduce Query Result

  • In the query results, the number of orders per company is represented in the OrdersCount column.
    The total amount of all orders per company is represented in the TotalOrdersAmount column.
    The column names correspond to the field names defined in the Map-Reduce index.

  • The Map-Reduce results can also be visualized in the Map-Reduce Visualizer.

Multi-Map-Reduce

  • Multi-Map-Reduce indexes allow us to aggregate data from multiple collections.

  • In the example below, we define three maps: on the Companies, Suppliers, and Employees collections.
    Each map outputs the relevant City, along with a count for the type of document being mapped.

Figure 4. Define Multi-Maps

  • In the Reduce function, we group the results by City and sum the counts produced by all the maps,
    to get the final count of documents from each collection per city.

Figure 4.1. The Multi-Map-Reduce

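The multi-map flow can be simulated in plain Python. Each map emits the same fields: City, plus a count of 1 in the field for its own document type and 0 in the others. The Reduce step then sums those counts per city. The output field names (Companies, Suppliers, Employees), the Address.City location of the city, and the use of zero counts to keep a single output shape are all assumptions for illustration; the index in Figure 4 may express this differently.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical output field names, one per mapped collection.
COUNT_FIELDS = ("Companies", "Suppliers", "Employees")


def map_doc(doc: Dict[str, Any], count_field: str) -> Dict[str, Any]:
    """One map per collection: same output shape, 1 only in its own count field."""
    entry: Dict[str, Any] = {"City": doc["Address"]["City"]}  # assumed location of City
    for field in COUNT_FIELDS:
        entry[field] = 1 if field == count_field else 0
    return entry


def reduce_by_city(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group by City and sum every count field."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(COUNT_FIELDS, 0))
    for e in entries:
        for field in COUNT_FIELDS:
            totals[e["City"]][field] += e[field]
    return [{"City": city, **counts} for city, counts in sorted(totals.items())]


# Hypothetical documents from the three collections.
companies = [{"Address": {"City": "London"}}, {"Address": {"City": "Berlin"}}]
suppliers = [{"Address": {"City": "London"}}]
employees = [{"Address": {"City": "London"}}, {"Address": {"City": "Seattle"}}]

entries = (
    [map_doc(d, "Companies") for d in companies]
    + [map_doc(d, "Suppliers") for d in suppliers]
    + [map_doc(d, "Employees") for d in employees]
)
city_counts = reduce_by_city(entries)
```

In this sample, London ends up with one company, one supplier, and one employee, while Berlin and Seattle each have a single non-zero count.
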
Saving Map-Reduce Results in a Collection (Artificial Documents)

  • The results of the Map-Reduce index can be saved as output documents in a new output collection.

  • These output documents can be further grouped by reference documents, which are documents that contain the IDs of related output documents.

  • These documents created by Map-Reduce Indexes are called Artificial Documents.

  • Learn more about using Artificial Documents from the client code in Map-Reduce Indexes: Reduce Results as Artificial Documents.

Figure 5. Save Map-Reduce Results into a Collection

  1. Specify the name of the collection you want the output documents to be saved in.
    Note: the collection specified must be empty.

  2. Specify a pattern for the reference document IDs. By including reduce function fields, this pattern determines which output documents will be included in each reference document.

  3. The name of the collection for the reference documents. By default, this is <output collection name>/Reference.
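
How a reference-ID pattern groups output documents can be sketched in Python: each output document's field values are substituted into the pattern, and output documents that produce the same ID land in the same reference document. This is an illustration under assumptions, not RavenDB's implementation. The pattern companies-by-orders-count/{OrdersCount} is hypothetical, the output document IDs are placeholders (RavenDB generates the real ones), the output_ids field stands in for however RavenDB actually lists the referenced documents, and only plain {Field} placeholders are handled.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def reference_id(pattern: str, result: Dict[str, Any]) -> str:
    """Fill {Field} placeholders in the pattern with values from one reduce result."""
    return _PLACEHOLDER.sub(lambda m: str(result[m.group(1)]), pattern)


def build_reference_docs(
    pattern: str, outputs: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, List[str]]]:
    """Map each reference ID to the IDs of the output documents that share it."""
    refs: Dict[str, List[str]] = defaultdict(list)
    for doc_id, result in outputs.items():
        refs[reference_id(pattern, result)].append(doc_id)
    # output_ids is a hypothetical field name for this sketch.
    return {ref_id: {"output_ids": ids} for ref_id, ids in refs.items()}


# Placeholder output documents; real IDs are generated by RavenDB.
outputs = {
    "CompaniesOrders/out-1": {"Company": "companies/1-A", "OrdersCount": 2, "TotalOrdersAmount": 38.0},
    "CompaniesOrders/out-2": {"Company": "companies/2-A", "OrdersCount": 1, "TotalOrdersAmount": 15.0},
    "CompaniesOrders/out-3": {"Company": "companies/3-A", "OrdersCount": 2, "TotalOrdersAmount": 12.5},
}

references = build_reference_docs("companies-by-orders-count/{OrdersCount}", outputs)
```

Because two companies share OrdersCount 2, the reference document companies-by-orders-count/2 lists two output documents. A pattern built only from the reduce key (here, Company) would instead produce one reference document per output document.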

Figure 6. An Artificial Document in the collection CompaniesOrders

Figure 7. A Reference Document in the collection CompaniesOrders/Reference

Artificial Documents -vs- Regular Documents

  • Artificial documents are created directly by the index.

  • They behave just like standard documents, except that they are not replicated to other nodes in the database group.

  • Artificial documents are updated whenever the index completes indexing a batch of documents.

Although artificial documents can be loaded and queried just like regular documents, editing them manually is not recommended: any update to the index results overwrites all manual modifications.

Artificial Documents Usage

  • You can define indexes on top of the artificial documents collection, including additional Map-Reduce indexes,
    which lets you build recursive map-reduce operations.

  • You can set up a RavenDB ETL task that sends the artificial documents collection to a dedicated database on a separate cluster for further processing.
    Other ongoing tasks, such as SQL ETL and Subscriptions, can also be defined on this collection.

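As a sketch of the recursive option, here is a plain Python simulation of a hypothetical second Map-Reduce index defined over the CompaniesOrders artificial documents. It counts how many companies placed a given number of orders and what they spent in total. The second index, its Companies field, and the sample data are illustrative assumptions, not part of the example above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def companies_per_order_count(artificial_docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical second-level map-reduce over CompaniesOrders documents."""
    # Map: one entry per artificial document (one per company).
    mapped = [
        {"OrdersCount": d["OrdersCount"], "Companies": 1, "TotalOrdersAmount": d["TotalOrdersAmount"]}
        for d in artificial_docs
    ]
    # Reduce: group by OrdersCount and sum; the output keeps the map's shape.
    grouped: Dict[int, Dict[str, Any]] = defaultdict(
        lambda: {"Companies": 0, "TotalOrdersAmount": 0.0}
    )
    for e in mapped:
        acc = grouped[e["OrdersCount"]]
        acc["Companies"] += e["Companies"]
        acc["TotalOrdersAmount"] += e["TotalOrdersAmount"]
    return [{"OrdersCount": k, **v} for k, v in sorted(grouped.items())]


# Hypothetical contents of the CompaniesOrders collection.
companies_orders = [
    {"Company": "companies/1-A", "OrdersCount": 2, "TotalOrdersAmount": 38.0},
    {"Company": "companies/2-A", "OrdersCount": 1, "TotalOrdersAmount": 15.0},
    {"Company": "companies/3-A", "OrdersCount": 2, "TotalOrdersAmount": 12.5},
]

second_level = companies_per_order_count(companies_orders)
```

The second index reads only the first index's output, so it never triggers the first index again. Keeping that one-way direction avoids the cycles described under Limitations.
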
Limitations

  • RavenDB detects a cycle of artificial documents and raises an error. You cannot define a Map-Reduce index that outputs artificial documents if those documents would trigger the same index, directly or indirectly;
    otherwise, the indexes could run in an infinite loop.

  • An empty collection must be used as the target collection for the artificial documents.
    This is mandatory since the Map-Reduce index overwrites any existing document in the collection.

  • You have no control over the IDs of artificial documents.
    RavenDB generates these IDs based on a hash of the reduce key.

  • Artificial documents are not sent over replication;
    each node in the database group has its own independent copy of the index results.
    Therefore:

    1. It is recommended to use artificial documents with Subscriptions only on a single node.
      A Subscription failover to another node may cause the subscription to send Artificial Documents
      that the subscription has already acknowledged.

    2. Artificial documents cannot use Revisions or Attachments.
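
The cycle rule can be pictured as cycle detection in a directed graph of indexes: an edge runs from index A to index B when A writes artificial documents into a collection that B maps. The Python sketch below uses a plain depth-first search to illustrate the rule. It is not how RavenDB implements the check, and the index and collection names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical index description: (source collections, optional output collection).
Index = Tuple[Set[str], Optional[str]]


def find_cycle(indexes: Dict[str, Index]) -> Optional[List[str]]:
    """Return index names forming a cycle (first name repeated at the end), or None."""
    # Edge A -> B when A writes artificial documents into a collection that B maps.
    edges: Dict[str, List[str]] = {
        a: [b for b, (sources, _) in indexes.items() if out is not None and out in sources]
        for a, (_, out) in indexes.items()
    }
    state: Dict[str, int] = {}  # 1 = on the current DFS path, 2 = finished
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = 1
        path.append(node)
        for nxt in edges[node]:
            if state.get(nxt) == 1:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = 2
        return None

    for name in indexes:
        if name not in state:
            found = visit(name)
            if found:
                return found
    return None


ok = {
    "Orders/ByCompany": ({"Orders"}, "CompaniesOrders"),
    "CompaniesOrders/ByCount": ({"CompaniesOrders"}, None),
}
bad = {
    "A": ({"Orders", "Bs"}, "As"),
    "B": ({"As"}, "Bs"),
}
print(find_cycle(ok))   # None
print(find_cycle(bad))  # ['A', 'B', 'A']
```

In the second example, index A outputs to As, which index B maps, and B outputs to Bs, which A maps. This indirect loop is the kind of setup RavenDB rejects.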