Indexes Overview



Indexes - The General Concept

  • Once defined, the index iterates over the documents,
    and for every document-field that is requested to be indexed,
    a map is built between the terms derived from these fields and the actual documents that contain them.

  • A query on these fields is then resolved by a simple lookup
    from the queried terms to the list of documents that contain them.

  • After the first indexing run, the index will keep this map current without re-processing the entire dataset -
    the index will only update the relevant indexed data when a document update happens in the database.

  • Indexes in RavenDB are split across multiple axes (see more below):

    • Auto Indexes -vs- Static Indexes
    • Map Indexes -vs- Map-Reduce Indexes
    • Single-Collection Indexes -vs- Multi-Collection Indexes
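
The term-to-document map described above can be illustrated with a minimal sketch. This is not RavenDB code: the `ToyIndex` class, the `terms_of` analyzer, and the sample documents are all hypothetical, and only show the general idea of an initial full pass followed by per-document updates.

```python
from collections import defaultdict


def terms_of(value):
    # Toy analyzer (an assumption): lowercase and split on whitespace.
    return value.lower().split()


class ToyIndex:
    def __init__(self, field):
        self.field = field
        self.term_to_ids = defaultdict(set)  # term -> ids of documents containing it
        self.id_to_terms = {}                # doc id -> terms currently indexed for it

    def put(self, doc_id, doc):
        # Index one new or modified document; other documents are not touched.
        self.delete(doc_id)
        terms = set(terms_of(doc.get(self.field, "")))
        self.id_to_terms[doc_id] = terms
        for term in terms:
            self.term_to_ids[term].add(doc_id)

    def delete(self, doc_id):
        # Remove whatever was previously indexed for this document.
        for term in self.id_to_terms.pop(doc_id, set()):
            self.term_to_ids[term].discard(doc_id)

    def query(self, term):
        # A query is a lookup from a term to the documents that contain it.
        return sorted(self.term_to_ids.get(term.lower(), set()))


docs = {"users/1": {"name": "John Doe"}, "users/2": {"name": "Jane Doe"}}
index = ToyIndex("name")
for doc_id, doc in docs.items():  # first run: process the entire dataset
    index.put(doc_id, doc)
```

After the first run, a change to `users/2` only requires calling `index.put("users/2", ...)` again; the entry for `users/1` is left as is.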

Indexes - The Moving Parts

1. Index Definition


  • The index definition tells RavenDB how to index the data.

  • It specifies which fields to index and defines how they should be indexed,
    e.g., configuring a field for full-text search, selecting the analyzer to use, etc.
    These fields can be specified explicitly or defined dynamically supporting any document structure.

  • The index definition is created by the client (Static-Index), or by the server (Auto-Index).

  • Note: Data from related documents can also be indexed using 'LoadDocument'.
    Learn more in Indexing Related Documents.

2. Indexing Process


  • Indexing is the process of iterating over the raw documents, indexing their data as defined by the index definition, and building a map between the indexed terms and the raw documents that contain them.

  • Indexing is a background operation that is scheduled asynchronously whenever a document changes. Once defined and deployed, an index will initially process the entire dataset.
    After that, the index will only process documents that were modified, added or deleted.

  • A document write operation doesn't wait for the index to complete processing -
    the write operation is completed as soon as the transaction is written to disk.
    However, a write operation can wait for the indexing process to finish before acknowledging the write by using the WaitForIndexesAfterSaveChanges method.

  • An index is considered Stale if it has not yet processed all the data.
    A query can request that results are returned only when the index is up-to-date by using the WaitForNonStaleResults method. Learn more in: Understanding Eventual Consistency.

  • The async indexing process is resilient to hard resets, shutdowns, and the like.
    If the database was restarted after a document was modified but before it was indexed,
    the indexing process will just pick up from where it left off and complete the work.

  • Each index is assigned a dedicated thread, thus no indexing process can interfere with any other.
    By default, indexing threads start with a lower priority than request-processing threads. The indexing-thread priority can be raised, and RavenDB applies the change at the operating system level.

  • Indexing can be throttled to delay indexing tasks by a pre-set time period.
    Throttling is helpful when sufficient server resources need to remain available for users while heavy-duty indexing tasks are due. See: Index Throttling
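
The relationship between writes, background indexing, and staleness can be sketched as follows. This is a simplified single-threaded model, not RavenDB's implementation: `Database`, `BackgroundIndex`, and `wait_for_non_stale` are hypothetical names, and the change log stands in for whatever mechanism the server uses to track unprocessed changes.

```python
class Database:
    def __init__(self):
        self.docs = {}
        self.changes = []  # ordered log of changed document ids

    def write(self, doc_id, doc):
        # The write completes here; it never waits for any index.
        self.docs[doc_id] = doc
        self.changes.append(doc_id)


class BackgroundIndex:
    def __init__(self, db):
        self.db = db
        self.position = 0  # how far into the change log we have indexed
        self.entries = {}

    def is_stale(self):
        # Stale means some changes have not been processed yet.
        return self.position < len(self.db.changes)

    def run_batch(self, size=1):
        # Process only changes not seen yet. After a restart, work would
        # resume from the persisted position instead of starting over.
        pending = self.db.changes[self.position:self.position + size]
        for doc_id in pending:
            self.entries[doc_id] = self.db.docs[doc_id]
        self.position += len(pending)

    def wait_for_non_stale(self):
        # Hypothetical stand-in for a query that waits for an up-to-date index.
        while self.is_stale():
            self.run_batch()
        return self.entries


db = Database()
index = BackgroundIndex(db)
db.write("orders/1", {"total": 10})
db.write("orders/2", {"total": 25})
stale_before = index.is_stale()   # writes returned before indexing ran
index.run_batch()
stale_midway = index.is_stale()   # one change is still unprocessed
entries = index.wait_for_non_stale()
```

A query issued while `is_stale()` is true would see only the entries indexed so far, which is the trade-off that waiting for non-stale results removes.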

3. Indexed Data


  • The resulting output of the indexing process from 'step 2' above is also referred to as an 'Index'.

  • Index-Entries
    During the indexing process, an index-entry is created for each raw document that is processed.
    Usually a single index-entry is created per raw document, unless working with a fanout index.

  • Index-Fields and Terms
    Each index-entry contains the index-fields that were defined in the index definition.
    Each index-field contains terms that are generated from the data in the raw documents.

    The terms generated depend on the analyzer used, and they are the actual indexed values that are stored in the index.
    When querying the index, you can filter the results based on these terms and retrieve the matching original documents.

  • Stored Data
    In addition to the terms, some document fields can be stored directly in the index data.
    This allows for query results to be fetched from the index itself instead of loading the original document.

    Note: The full document is not stored in the index - only the document ID.
    Upon a query match, we load the document itself from the document storage.
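
As an illustration of the pieces named above, the following sketch builds a single index-entry holding analyzed terms and stored values. The `toy_analyzer` and `build_entry` functions and the entry layout are assumptions made for this example and do not reflect RavenDB's internal format.

```python
def toy_analyzer(text):
    # Toy analyzer (an assumption): strip basic punctuation and lowercase.
    words = (word.strip(".,!?") for word in text.split())
    return [word.lower() for word in words if word]


def build_entry(doc_id, doc, indexed_fields, stored_fields):
    return {
        # Only the document ID is kept, not the full document.
        "id": doc_id,
        # Index-fields hold the terms generated by the analyzer.
        "terms": {f: toy_analyzer(str(doc[f])) for f in indexed_fields},
        # Stored fields keep the original value inside the index itself.
        "stored": {f: doc[f] for f in stored_fields},
    }


product = {"name": "Green Tea, Organic", "price": 12.5}
entry = build_entry("products/1", product, ["name"], ["price"])
```

A query matching the term `tea` could return `entry["stored"]["price"]` directly from the index, or use `entry["id"]` to load the full document from document storage.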

Index Types

Indexes in RavenDB are split across the following axes:

Auto Indexes -vs- Static Indexes


  • Auto Indexes:

    • Auto-indexes are created by the server.
    • When a query that has some filtering condition doesn't specify a specific index to be used,
      the server Query Optimizer will first analyze the query and search for an already existing Auto-index that can answer the query.
    • If there is no such index, the Query Optimizer creates an Auto-index on the fly that can answer this query and all previous queries on that collection.
    • When the new Auto-index has caught up, RavenDB cleans up all the old Auto-indexes that are now superseded by the new one.
  • Static Indexes:

    • Static-indexes are created by the user (database admin only) from the Studio or from the Client API.
    • A Static-index can be used to perform any computation on the document fields.
      These computations run as a background process during indexing time, and not at query time.
      This way the indexed data is ready for queries, providing fast query results when querying the index.
    • The index shape (as defined in the index definition) and the shape of the source document don't have to be the same, as the indexed data can be a computed value.
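
The Auto-index behavior described above (reuse an existing index when possible, otherwise create one that also covers earlier queries) can be modeled roughly as below. This is a conceptual sketch only: `choose_auto_index` is a hypothetical function, an Auto-index is reduced to a set of field names, and the real Query Optimizer takes more into account. In particular, the sketch replaces old indexes immediately, while the server waits until the new index has caught up.

```python
def choose_auto_index(auto_indexes, collection, query_fields):
    # auto_indexes maps a collection name to a list of field sets,
    # one set per existing auto-index (a simplification).
    existing = auto_indexes.setdefault(collection, [])
    for fields in existing:
        if query_fields <= fields:
            return fields  # an existing auto-index can answer the query
    # No match: build one index covering this query and all previous ones.
    merged = frozenset(query_fields).union(*existing)
    auto_indexes[collection] = [merged]  # older auto-indexes are superseded
    return merged


auto_indexes = {}
first = choose_auto_index(auto_indexes, "Users", {"Name"})
second = choose_auto_index(auto_indexes, "Users", {"Age"})
third = choose_auto_index(auto_indexes, "Users", {"Name"})
```

Here the second query triggers a merged index on both fields, and the third query reuses it without creating anything new.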

Map Indexes -vs- Map-Reduce Indexes


  • Map Indexes:
    Map indexes are simple indexes.
    They contain one or more LINQ-based or JavaScript mapping functions that indicate what should be indexed from the document and how it should be indexed. These functions also allow you to compute the indexed value.

  • Map-Reduce Indexes:
    Map-Reduce indexes allow performing complex data aggregation.
    The Map stage is similar to a regular Map-Index, defining what data should be indexed.
    The Reduce stage operates on the Map results, specifying how the data should be grouped and aggregated.
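
The two stages can be sketched in plain Python as follows. The order documents, `map_orders`, and `reduce_by_product` are made up for illustration; an actual RavenDB index would express the same idea as LINQ or JavaScript functions in the index definition.

```python
from collections import defaultdict


def map_orders(order):
    # Map stage: emit one entry per order line.
    for line in order["lines"]:
        yield {"product": line["product"], "count": line["quantity"]}


def reduce_by_product(entries):
    # Reduce stage: group by product and sum the counts.
    totals = defaultdict(int)
    for entry in entries:
        totals[entry["product"]] += entry["count"]
    return [{"product": p, "count": c} for p, c in sorted(totals.items())]


orders = [
    {"lines": [{"product": "tea", "quantity": 2},
               {"product": "milk", "quantity": 1}]},
    {"lines": [{"product": "tea", "quantity": 3}]},
]
mapped = [entry for order in orders for entry in map_orders(order)]
result = reduce_by_product(mapped)
```

Note that the Reduce output has the same shape as the Map output in this sketch, so reducing already-reduced results gives the same totals. That property is what lets an aggregation be updated in parts instead of being recomputed from scratch.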

Single-Collection Indexes -vs- Multi-Collection Indexes


  • Single-Collection Indexes:
    The index definition contains only one Map function defined on a specific collection.

  • Multi-Collection Indexes:
    Data from several collections can be indexed (each in a different Map) and the results are united in a single index.
    The only requirement is that all the Map definitions have the same output shape.
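
A minimal sketch of the same-output-shape requirement, using two hypothetical collections (`Cats` and `Dogs`) whose documents have different structures. The function names and the shape check are assumptions for illustration, not RavenDB's validation logic.

```python
def map_cat(doc):
    return {"name": doc["name"], "kind": "cat"}


def map_dog(doc):
    # The source field differs, but the output shape is the same.
    return {"name": doc["nickname"], "kind": "dog"}


def build_multi_map(collections, maps):
    entries = []
    for collection, map_fn in maps.items():
        for doc in collections.get(collection, []):
            entries.append(map_fn(doc))
    # All Map functions must produce entries with the same set of fields.
    shapes = {frozenset(entry) for entry in entries}
    if len(shapes) > 1:
        raise ValueError("all Map functions must emit the same output shape")
    return entries


collections = {
    "Cats": [{"name": "Tom"}],
    "Dogs": [{"nickname": "Rex"}],
}
entries = build_multi_map(collections, {"Cats": map_cat, "Dogs": map_dog})
```

Both collections end up in one list of entries that can be queried by `name` regardless of which collection a result came from.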

Field Configuration Options

Additional settings can be specified per field in the index-entry definition, configuring how the terms are indexed inside RavenDB. See Create Map Index to learn how to set these options in the Studio.

  • Full-Text Search
    The original field data is split and tokenized according to the selected analyzer. Learn more about analyzers here.

    • Suggestions - Allows finding results similar to the string in your query, e.g., Martin -> Martine.
    • Term Vector - Allows finding similar documents based on shared indexed terms.
  • Spatial
    Allows geographical querying on longitude and latitude values or WKT values provided from the document.
    The spatial indexing strategy can be customized.
    Learn more in Indexing Spatial Data

  • Store Field
    A field can be stored within the indexed data.
    This allows retrieving the value from the indexed data at query time, instead of loading the original document.
    Learn more in Storing Data in Index
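
To give a feel for the Suggestions option, the sketch below looks up indexed terms that are close to a query string. It uses Python's standard `difflib` similarity as a stand-in; the similarity measure RavenDB uses is not described here, and the term list and `suggest` helper are hypothetical.

```python
import difflib

# Terms as they might appear in an index-field after analysis (assumed).
indexed_terms = ["martine", "marta", "john"]


def suggest(term, terms, limit=3):
    # Return indexed terms similar to the queried string, best match first.
    return difflib.get_close_matches(term.lower(), terms, n=limit, cutoff=0.6)


suggestions = suggest("Martin", indexed_terms)
```

With this data the closest suggestion for `Martin` is `martine`, while unrelated terms such as `john` fall below the cutoff.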

Modifying Index Definition

  • Only an index that is not set as 'Locked' can actually be modified.

  • When the index definition has changed in a way that invalidates the previous indexing results,
    the modification is handled in a side-by-side manner.
    For example, a change to a mapping function invalidates the previous results, while a change in priority does not.

  • The original index is retained and is fully operable while the new index (with the new definition) is being built.
    Once the new index is up-to-date the original index is removed in favor of the new one.

  • See the example in Index List View - Side by Side.

  • RavenDB keeps a history of index revisions, allowing you to revert an index to any of its past revisions.
    The number of index revisions kept can be configured in the Server Configuration Options,
    or using the Database Settings view in the Studio.
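
The side-by-side flow can be modeled with a short sketch: the original index keeps answering queries while a replacement is built, and the swap happens only once the replacement has caught up. The `SideBySide` class and its methods are hypothetical and process one document per step purely for demonstration.

```python
class SideBySide:
    def __init__(self, map_fn, docs):
        self.docs = docs
        self.active = {doc_id: map_fn(doc) for doc_id, doc in docs.items()}
        self.pending = None     # replacement index being built, if any
        self.pending_fn = None
        self.todo = []

    def modify(self, new_map_fn):
        # A changed mapping function invalidates previous results,
        # so a replacement index is built from scratch.
        self.pending, self.pending_fn = {}, new_map_fn
        self.todo = list(self.docs)

    def step(self):
        # Index one document into the replacement; swap once it is up-to-date.
        if self.pending is None:
            return
        if self.todo:
            doc_id = self.todo.pop(0)
            self.pending[doc_id] = self.pending_fn(self.docs[doc_id])
        if not self.todo:
            self.active, self.pending, self.pending_fn = self.pending, None, None

    def query(self, doc_id):
        # Queries are always served from the active index.
        return self.active[doc_id]


docs = {"a": {"n": 1}, "b": {"n": 2}}
index = SideBySide(lambda d: d["n"], docs)
index.modify(lambda d: d["n"] * 10)
index.step()
during_build = index.query("a")   # still served by the original index
index.step()
after_swap = index.query("a")     # replacement has taken over
```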

Indexes in the Cluster

  • Index & Auto-Index creation is a cluster operation. It goes through the Raft protocol.
    Index creation will fail if the majority of the nodes in the cluster are not reachable.

  • Once an index is created against any node in the Database Group, RavenDB will make sure that its definition is replicated to all the database's nodes. The indexing process will occur separately on each node.

  • Note: The External Replication ongoing task does NOT replicate indexes.

  • In a multi-node cluster, indexing can be configured to occur either in Rolling deployment mode (one node at a time, if machine resources are limited) or in Parallel mode (simultaneously on all nodes).
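
The majority requirement for index creation can be illustrated with a small sketch. It models only the quorum arithmetic, not the Raft protocol itself; `has_majority` and `create_index` are hypothetical helpers.

```python
def has_majority(total_nodes, reachable_nodes):
    # A strict majority: more than half of the cluster nodes.
    return reachable_nodes > total_nodes // 2


def create_index(name, total_nodes, reachable_nodes):
    if not has_majority(total_nodes, reachable_nodes):
        raise RuntimeError(f"cannot create index '{name}': no majority of nodes reachable")
    # Once accepted, the definition is sent to every node,
    # and each node runs its own indexing process.
    return {"index": name, "accepted": True}


ok = create_index("Orders/ByProduct", total_nodes=3, reachable_nodes=2)
```

In a 3-node cluster the creation succeeds with 2 reachable nodes and fails with 1. In a 4-node cluster, 2 reachable nodes are not enough.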

Indexing Errors

  • An error in indexing a document means that this particular document is not indexed and you will not see it in the query results.

  • An index is only allowed a certain failure rate, above which it is marked in an error state.
    An index in an error state cannot be queried and will return an immediate error.

  • See more in Index List View - Errors.
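
The two rules above can be sketched as follows. The threshold values (`max_error_rate`, `min_attempts`) are illustrative assumptions and not RavenDB's actual limits, and `run_index` and `index_state` are hypothetical helpers.

```python
def run_index(docs, map_fn):
    entries, errors = {}, []
    for doc_id, doc in docs.items():
        try:
            entries[doc_id] = map_fn(doc)
        except Exception as exc:
            # The failing document is skipped and will be missing from results.
            errors.append((doc_id, str(exc)))
    return entries, errors


def index_state(attempts, errors, max_error_rate=0.15, min_attempts=100):
    # Thresholds are assumptions for this sketch, not RavenDB's values.
    if attempts >= min_attempts and errors / attempts > max_error_rate:
        return "Error"
    return "Normal"


docs = {
    "items/1": {"price": 10, "qty": 2},
    "items/2": {"price": 10, "qty": 0},  # division by zero during indexing
}
entries, errors = run_index(docs, lambda d: d["price"] / d["qty"])
state = index_state(attempts=len(docs), errors=len(errors))
```

Here `items/2` fails and is left out of `entries`, while the index as a whole stays queryable because too few documents were attempted to cross the assumed threshold.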