You are currently browsing the legacy 4.2 version of the documentation.

Indexes: Overview

RavenDB uses Indexes in order to be able to answer queries about your documents.
Indexes allow for fast query results as the entire dataset is not re-scanned each and every time.
Indexes can be created:
- From Studio
- From code - see Creating Indexes and Deploying Indexes
- Automatically by the server (Auto-Indexes) - see Auto-Map Index Generation

Indexes - The General Concept

In order to be able to return query results about your documents without scanning the entire dataset each and every time,
RavenDB uses Indexes.
Once defined, the index iterates over the documents, and for every field (document property) that is requested to be indexed,
a map is built between the terms derived from these fields and the actual documents that contain them.
A query on these fields then becomes a simple lookup from the queried terms to the list of documents that contain them.
After the first indexing run, the index keeps that map current without re-processing all of the data -
it only updates the relevant entries when a document changes in the database.
Indexes are not stored in the 'document store' but have their own separate storage.
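The term-to-document map described above can be pictured with a small Python sketch. This is only an illustration of the general idea, not RavenDB's implementation or its Client API: the document contents and the `build_index` and `query` helpers are hypothetical, and the only analysis done here is lower-casing the value.

```python
from collections import defaultdict

# Hypothetical documents, keyed by document ID (this plays the "document store").
documents = {
    "employees/1": {"FirstName": "Anne", "City": "London"},
    "employees/2": {"FirstName": "Robert", "City": "London"},
    "employees/3": {"FirstName": "Anne", "City": "Seattle"},
}

def build_index(docs, fields):
    """Map (field, term) -> set of IDs of the documents that contain the term."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            if field in doc:
                index[(field, doc[field].lower())].add(doc_id)
    return index

def query(index, field, value):
    """Answer a query with a lookup in the index, not a scan of the documents."""
    return sorted(index.get((field, value.lower()), set()))

# The index is its own structure, kept separately from `documents`.
index = build_index(documents, fields=["FirstName", "City"])
london = query(index, "City", "London")
```

Note that the index holds only document IDs, so a match still has to be resolved against `documents` to get the document itself.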
Indexes in RavenDB are split across multiple axes (see Index Types below):
- Auto Indexes -vs- Static Indexes
- Map Indexes -vs- Map-Reduce Indexes
- Single-Collection Indexes -vs- Multi-Collection Indexes

Indexes - The Moving Parts

1. Index Definition

The index definition tells RavenDB how to index the data.
It specifies which fields to index and how each field should be indexed (e.g. whether to enable full-text search).
These fields can be specified explicitly or defined dynamically, supporting any document structure.
The index definition is created either by the client (Static-Index) or by the server's Query Optimizer (Auto-Index).
Note: Data from related documents can also be indexed using 'LoadDocument'. Learn more in Indexing Related Documents.

2. Indexing Process

Indexing is the process of iterating over the documents and creating a map
between the indexed terms and the actual documents that contain them.
Indexing is a background operation. It is scheduled to run asynchronously whenever a document changes.
e.g. A document write operation doesn't wait for the index to complete processing -
the write operation is completed as soon as the transaction is written to disk.
An index is considered Stale if it has not yet processed all of the data.
A query can request that results are returned only when the index is up-to-date.
A write operation can wait for the indexing process to complete before acknowledging the write.
See: Understanding Eventual Consistency
The async indexing process is resilient to hard resets, shutdowns and the like.
If the database was restarted after a document was modified but before it was indexed,
the indexing process will just pick up from where it left off and complete the work.
Each index is assigned a dedicated thread, thus no indexing process can interfere with any other.
By default, indexing-threads start with a lower priority than request-processing threads.
The indexing-thread priority can be set higher and RavenDB will update this at the operating system level.
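The relationship between writes, staleness and resuming after a restart can be modeled roughly as follows. This is a single-threaded toy model under stated assumptions, not how RavenDB is implemented: the `Database` and `Index` classes, the change counter (`etag`) and the `wait_for_non_stale` flag are all names invented for this sketch.

```python
class Database:
    """Toy model: every write gets an increasing change number (etag)."""

    def __init__(self):
        self.documents = {}
        self.changes = []   # ordered log of (etag, doc_id)
        self.last_etag = 0

    def put(self, doc_id, doc):
        # The write completes immediately; it does not wait for any index.
        self.last_etag += 1
        self.documents[doc_id] = doc
        self.changes.append((self.last_etag, doc_id))


class Index:
    def __init__(self, db, field):
        self.db = db
        self.field = field
        self.entries = {}            # doc_id -> indexed value
        self.last_indexed_etag = 0   # progress marker to resume from

    def is_stale(self):
        return self.last_indexed_etag < self.db.last_etag

    def run_batch(self, max_items):
        """Process up to max_items pending changes, resuming from the marker."""
        pending = [c for c in self.db.changes if c[0] > self.last_indexed_etag]
        for etag, doc_id in pending[:max_items]:
            self.entries[doc_id] = self.db.documents[doc_id][self.field]
            self.last_indexed_etag = etag

    def query(self, value, wait_for_non_stale=False):
        # A real server would wait for the background indexing thread;
        # this single-threaded sketch drives the indexing itself.
        if wait_for_non_stale:
            while self.is_stale():
                self.run_batch(max_items=1)
        return sorted(d for d, v in self.entries.items() if v == value)


db = Database()
index = Index(db, field="City")
db.put("employees/1", {"City": "London"})
db.put("employees/2", {"City": "London"})

stale_before = index.is_stale()       # nothing has been indexed yet
index.run_batch(max_items=1)          # indexing is "interrupted" after one document
partial = index.query("London")       # a stale index may miss recent writes
complete = index.query("London", wait_for_non_stale=True)
```

Because progress is tracked by a marker rather than by re-reading everything, the second run continues from the first unprocessed change instead of starting over.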

3. Indexed Data

The resulting output of 'step 2' (the indexing process) is also referred to as an Index.
It is the indexed data that queries operate on to find the matching documents.
Note: The full document is not stored in the index - only the document ID.
Upon a query match, we load the document itself from the document storage.
Index Entry
An Index-Entry holds all of the document fields that are requested to be indexed, as defined in the index-definition.
Term
The index-entry values are broken into Terms by the analyzer specified in the index-definition.
A Term is the actual indexed value that is stored in the index.
Stored Data
In addition to the Terms, some document fields can be stored directly in the index data.
This allows for query results to be fetched from the index itself instead of loading the original document.
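To make the three notions above concrete, here is a minimal Python sketch. It assumes a deliberately naive analyzer (lower-case, split on non-word characters); the analyzers RavenDB actually ships behave differently, and `simple_analyzer`, `index_document` and the sample document are hypothetical names used only for illustration.

```python
import re

def simple_analyzer(text):
    """Illustrative analyzer: lower-case the text and split it on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def index_document(doc_id, doc, indexed_fields, stored_fields):
    # Index entry: only the fields that the definition asks to index.
    entry = {f: doc[f] for f in indexed_fields}
    # Terms: the searchable values derived from each indexed field.
    terms = {f: simple_analyzer(str(v)) for f, v in entry.items()}
    # Stored data: optional copies of field values kept alongside the terms.
    stored = {f: doc[f] for f in stored_fields}
    return {"id": doc_id, "terms": terms, "stored": stored}

doc = {"Title": "The Quick Brown Fox", "Author": "J. Smith", "Pages": 120}
result = index_document("books/1", doc, indexed_fields=["Title"], stored_fields=["Author"])
```

In this sketch a query on `Title` could return `Author` straight from `result["stored"]`, while `Pages` would require loading the original document by its ID.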

Index Types

Indexes in RavenDB are split across the following axes:

Auto Indexes -vs- Static Indexes

Auto Indexes:
When a query doesn't specify an index to use, the server's Query Optimizer first analyzes the query and searches for an existing Auto-Index that can answer it. If there is no such index, the Query Optimizer creates, on the fly, an Auto-Index that can answer this query and all previous queries on that collection.
When the new Auto-Index has caught up, RavenDB cleans up all the old Auto-Indexes that are now superseded by the new one.
Static Indexes:
Created by a user with database administrator permissions, from the Studio or from the Client API.
The index-shape (as defined in the index-definition) and the shape of the source document don't have to be the same,
as the indexed-data can be a computed value. These computations are run during the indexing-process and not at query time.

Map Indexes -vs- Map-Reduce Indexes

Map Indexes:
Map indexes are simple indexes.
They contain one or more LINQ-based mapping functions indicating what should be indexed from the document
and how it should be indexed. These functions also allow you to compute the indexed value.
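As a rough analogy only (RavenDB's mapping functions are LINQ-based; this is plain Python, not the Client API), a map function takes a document and emits an index entry, possibly with computed values. The `map_employee` function, the `FullName` field and the sample documents are hypothetical.

```python
def map_employee(doc):
    """Hypothetical mapping function: yields one index entry per document."""
    yield {
        # Computed once, at indexing time, rather than on every query.
        "FullName": f"{doc['FirstName']} {doc['LastName']}",
        "City": doc["City"],
    }

employees = {
    "employees/1": {"FirstName": "Anne", "LastName": "Dodsworth", "City": "London"},
    "employees/2": {"FirstName": "Robert", "LastName": "King", "City": "London"},
}

# Run the map once per document; queries later read the precomputed entries.
entries = {doc_id: list(map_employee(doc)) for doc_id, doc in employees.items()}
```

The entry shape (`FullName`, `City`) differs from the document shape (`FirstName`, `LastName`, `City`), which is the point: the index stores what queries need, not a copy of the document.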
Map-Reduce Indexes:
Map-Reduce indexes allow performing complex aggregations of data.
The Map stage is similar to a regular Map-Index, defining what data should be indexed.
The Reduce stage operates on the Map results, specifying how the data should be grouped and aggregated.
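The two stages can be sketched in plain Python as below. This shows the general map-reduce pattern, not RavenDB's engine or index syntax; the order documents and the `map_order` and `reduce_results` names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical order documents.
orders = [
    {"Company": "companies/1", "Total": 100.0},
    {"Company": "companies/2", "Total": 40.0},
    {"Company": "companies/1", "Total": 60.0},
]

def map_order(order):
    """Map stage: pick what to index from a single document."""
    return {"Company": order["Company"], "Count": 1, "Total": order["Total"]}

def reduce_results(results):
    """Reduce stage: group the map results by company and aggregate them."""
    groups = defaultdict(lambda: {"Count": 0, "Total": 0.0})
    for r in results:
        group = groups[r["Company"]]
        group["Count"] += r["Count"]
        group["Total"] += r["Total"]
    return [{"Company": company, **agg} for company, agg in sorted(groups.items())]

mapped = [map_order(o) for o in orders]
reduced = reduce_results(mapped)
```

In this sketch the reduce output has the same fields as the map output, so `reduce_results` can be applied again to already-reduced results without changing them.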

Single-Collection Indexes -vs- Multi-Collection Indexes

Single-Collection Indexes:
The index definition contains only one Map function, defined on a specific collection.
Multi-Collection Indexes:
Data from several collections can be indexed (each in a different Map) and the results are united in a single index.
The only requirement is that all the Map definitions have the same output shape.
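A small Python sketch of the "same output shape" requirement, under the assumption of two hypothetical collections whose documents name their fields differently. The map functions and the shape check are illustrative only and do not reflect how RavenDB validates index definitions.

```python
def map_company(doc_id, doc):
    return {"Id": doc_id, "Name": doc["Name"], "Kind": "Company"}

def map_supplier(doc_id, doc):
    # A different source field, but the same output fields as map_company.
    return {"Id": doc_id, "Name": doc["ContactName"], "Kind": "Supplier"}

# Hypothetical collections, each paired with its own map function.
collections = {
    "Companies": ({"companies/1": {"Name": "Acme"}}, map_company),
    "Suppliers": ({"suppliers/1": {"ContactName": "Jane Doe"}}, map_supplier),
}

# All map results end up together in one index.
entries = [
    map_fn(doc_id, doc)
    for docs, map_fn in collections.values()
    for doc_id, doc in docs.items()
]

# Every map must produce the same set of output fields.
shapes = {frozenset(entry) for entry in entries}
```

If `shapes` contained more than one element, the maps would disagree on their output fields and could not be united in a single index.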

Field Configuration Options

Additional settings can be specified per field in the index-entry definition, configuring how the terms are indexed inside RavenDB.

Full Text Search
The original field data is split and tokenized according to the selected analyzer.
- Suggestions - allows finding results similar to the string in your query, e.g. Martin -> Martine.
- Term Vector - allows finding similar documents based on shared indexed terms.
- Indexing - allows options such as searching for individual words inside the indexed terms or exact case-sensitive matches.
- Learn more in: Analyzers
Spatial
Allows geographical querying on longitude and latitude values or WKT values provided from the document.
The spatial indexing strategy can be customized.
Learn more in: Indexing Spatial Data
Store Field
A field can be stored within the indexed-data.
This allows retrieving the value from the indexed-data at query time, instead of loading the original document.
Learn more in: Storing Data in Index

Modifying Index Definition

Only an index that is not set as 'Locked' can actually be modified.
When the index-definition has changed in a way that invalidates the previous indexing results,
the modification is handled in a side-by-side manner.
e.g. A mapping function change will invalidate previous results, while a change in priority will not.
The original index is retained and is fully operable while the new index (with the new definition) is being built.
Once the new index is up-to-date, the original index is removed and the new one takes its place.
See example in Index List View - Side by Side.
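The side-by-side flow can be sketched as a small state machine. This is a conceptual model only: which definition fields count as "result-affecting" is an assumption of the sketch (the text above names only the mapping function and priority), and `IndexStore` and its methods are hypothetical names.

```python
# Assumption for this sketch: changes to these fields invalidate previous results.
RESULT_AFFECTING_FIELDS = {"maps", "reduce", "fields"}

def needs_rebuild(old_def, new_def):
    return any(old_def.get(k) != new_def.get(k) for k in RESULT_AFFECTING_FIELDS)

class IndexStore:
    def __init__(self):
        self.active = {}        # name -> definition currently serving queries
        self.replacements = {}  # name -> definition being built side by side

    def put(self, name, new_def):
        old_def = self.active.get(name)
        if old_def is None:
            self.active[name] = new_def
        elif old_def.get("locked"):
            raise PermissionError(f"index {name!r} is locked")
        elif needs_rebuild(old_def, new_def):
            self.replacements[name] = new_def  # the old index keeps answering queries
        else:
            self.active[name] = new_def        # e.g. a priority change, applied in place

    def on_replacement_caught_up(self, name):
        # The new index is up-to-date: it takes the place of the original.
        self.active[name] = self.replacements.pop(name)


store = IndexStore()
store.put("Orders/ByCompany", {"maps": ["v1"], "priority": "Normal"})
store.put("Orders/ByCompany", {"maps": ["v1"], "priority": "High"})  # applied in place
store.put("Orders/ByCompany", {"maps": ["v2"], "priority": "High"})  # built side by side
serving_during_build = store.active["Orders/ByCompany"]["maps"]
store.on_replacement_caught_up("Orders/ByCompany")
serving_after_swap = store.active["Orders/ByCompany"]["maps"]
```

The key property is that queries are never left without an index: the original definition serves them until the replacement has caught up.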

Indexes in the Cluster

Index and Auto-Index creation is a cluster operation that goes through the Raft protocol.
Index creation will fail if a majority of the nodes in the cluster cannot be reached.
Once an index is created against any node in the Database Group, RavenDB will make sure that its definition is replicated to all the database's nodes. The indexing-process will occur separately on each node.
Note: The External Replication ongoing-task does NOT replicate indexes.
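The majority requirement can be expressed as a simple quorum check. The sketch below is a simplification that treats the cluster and the Database Group as the same set of nodes and does not model Raft itself; `has_majority`, `create_index` and the node tags are hypothetical.

```python
def has_majority(total_nodes, reachable_nodes):
    """A majority is more than half of the nodes."""
    return reachable_nodes >= total_nodes // 2 + 1

def create_index(definition, node_reachable):
    """node_reachable maps a hypothetical node tag to whether it can be reached."""
    total = len(node_reachable)
    reachable = sum(node_reachable.values())
    if not has_majority(total, reachable):
        raise RuntimeError("cannot reach a majority of cluster nodes")
    # Once accepted, every node ends up with the same definition
    # and runs its own indexing process over its own copy of the data.
    return {node: definition for node in node_reachable}

deployed = create_index({"name": "Orders/ByCompany"}, {"A": True, "B": True, "C": False})
```

With three nodes, two reachable nodes are enough; with four nodes, two are not, because two is not more than half of four.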

Indexing Errors

An error in indexing a document means that this particular document is not indexed and you will not see it in the query results.
An index tolerates only a certain failure rate. Above that rate, it is marked as being in an error state.
An index in an error state cannot be queried; querying it returns an error immediately.
See more in Index List View - Errors.
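The behavior described above can be modeled with a failure-rate counter. The threshold value used here is a hypothetical number chosen for the sketch, not RavenDB's actual limit, and `IndexStats` and `map_price_per_unit` are illustrative names.

```python
ERROR_RATE_LIMIT = 0.15  # hypothetical threshold chosen for this sketch

class IndexStats:
    def __init__(self):
        self.attempts = 0
        self.errors = 0
        self.indexed_ids = []

    def index(self, doc_id, doc, map_fn):
        self.attempts += 1
        try:
            map_fn(doc)
            self.indexed_ids.append(doc_id)
        except Exception:
            self.errors += 1  # this document will be missing from query results

    @property
    def state(self):
        if self.attempts and self.errors / self.attempts > ERROR_RATE_LIMIT:
            return "Error"
        return "Normal"

    def query(self):
        if self.state == "Error":
            raise RuntimeError("index is in an error state")
        return list(self.indexed_ids)


def map_price_per_unit(doc):
    return doc["Total"] / doc["Quantity"]  # raises when Quantity is 0

stats = IndexStats()
docs = {f"orders/{i}": {"Total": 10.0, "Quantity": 1} for i in range(1, 10)}
docs["orders/10"] = {"Total": 10.0, "Quantity": 0}  # this one fails to index
for doc_id, doc in docs.items():
    stats.index(doc_id, doc, map_price_per_unit)
```

A single failing document is skipped and the index stays queryable; only when failures exceed the limit does the whole index stop answering queries.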
