Auto (dynamic) indexes are identified by the naming prefix Auto/,
followed by the collection name and the filtration terms requested in the query.
When RavenDB processes a query, it scans for an existing index that will properly answer the query.
If none exists, it automatically creates an index based on the query.
If an existing auto-index partially answers the query, RavenDB's Query Optimizer may extend that index to cover the new query's requirements,
then remove any indexes this makes obsolete.
The query optimizer analyzes the set of queries you make to your database and generates the optimal set of indexes to answer those queries.
Changes in your queries also trigger changes in the indexes on your database as they adjust to the new requirements.
Dynamic indexing automatically adapts to changes and optimizations in your application, thus increasing agility.
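For example, a filtered query like the following sketch (against a hypothetical `Orders` collection) would, if no suitable index exists, cause RavenDB to create an auto-index named after the collection and the filtered field, along the lines of `Auto/Orders/ByCompany`:

```sql
from Orders
where Company = 'companies/1-A'
```

The collection and field names here are illustrative; the naming pattern follows the prefix convention described above.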
Map-reduce indexes: Shows how to create complex aggregations of data that is
stored and updated inside the index, improving querying performance.
Fanout indexes: Shows how to define indexes that output multiple entries per document.
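As a sketch of a fanout map (hypothetical collection and field names; RavenDB index maps are commonly written as C#-LINQ expressions), an index that outputs one entry per order line might look like:

```csharp
// Hypothetical fanout map: one index entry per line in each order,
// so a single document can produce multiple index entries.
from order in docs.Orders
from line in order.Lines
select new
{
    Product = line.Product,
    Quantity = line.Quantity
}
```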
A temporary prefix marks an index that is being rebuilt after changes to its definition are saved.
When an existing index definition is changed, RavenDB continues to use the old version until the new version is completely built, then saves the old definition in the index history.
Until the new version fully replaces the old, the two run in a state called side-by-side indexing.
Editing an Index in the Studio
Click the following links to learn more about defining indexes via the Studio:
When expanding an index track, zoom in on its batches to see how long each stage took to process.
Some stages are very short; zoom in further with the mouse scroll wheel to see stages that happen quickly.
When an index track is expanded, we see 4-5 rows of colored stripes.
The top row represents the entire indexing process.
The rows below it break down what happened during each stage of the row above.
Hover over these colored stripes to see detailed statistics about each stage.
Details of Indexing Stages
Indexing Stages Statistics
Indexing (total batch process)
The amount of time this batch process took to complete.
In this example, RavenDB is building a new auto-index, so it took ~295 ms to process 830 documents.
The second time this index ran, it took ~1 ms because the index was already built.
To get a more accurate rate, we would need a larger sample size of documents.
The number of index entries that were scanned.
The number of index entries that the index returned from the data store.
The number of index entries that the index failed to process.
The number of index entries that the index processed successfully.
Total size of the documents returned from the data store.
Average Document Size
The average size of each document that was returned.
Managed Allocation Size
Processed Data Speed
The speed at which the data was processed.
Document Processing Speed
The number of documents processed per second.
As the auto-index was built, it processed at a speed of ~2,814 documents per second.
The second time this index was run, it processed ~830,000 documents per second.
Again, to get a more accurate rate, we would need a larger sample size of documents.
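The processing-speed figures above are simply the number of documents divided by the elapsed time. A quick sketch of the arithmetic, using the example numbers from this section:

```python
# Reproduce the throughput figure from the example above:
# 830 documents processed in ~295 ms while building the auto-index.
documents = 830
elapsed_seconds = 0.295  # ~295 ms

docs_per_second = documents / elapsed_seconds
print(round(docs_per_second))  # ~2814 documents per second
```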
Map stage/s (applies index definition/s)
The amount of time this stage took to complete.
There are a number of possible batch status messages. They fall into two main categories.
No more documents to index
The batch managed to cover all of the documents needed.
(Name of the method used to create a batch stop)
There are a number of configurations that break up large batch processes into smaller batches to prevent exhausting system resources.
While these batch stops prevent system exhaustion, they also point to potential opportunities to optimize your indexes.
These situations are discussed in the section on common indexing issues below.
The amount of time it took to read or write the data to disk.
If this stage takes a long time after the index is already built, it may reveal a hardware problem.
Lucene stages show how long it took to store the information in the Lucene search engine.
Common Indexing Issues
Indexing can be a taxing operation on CPU resources.
A number of configuration options use
batch stops to break up huge batch processes into smaller batches, preventing resource exhaustion.
If a configuration is specific to an index, it can be set in the Studio.
If it is a server-wide only configuration, it must be set in the server's settings.json.
While they prevent system exhaustion, batch stops also point to potential opportunities to optimize your indexes.
Batch stops break up processing into smaller batches when an index is responsible for a huge dataset and/or has a very complex, demanding definition.
To prevent resource exhaustion, RavenDB breaks such large batches into smaller ones.
You can address batch stops in the following ways:
You can upgrade your hardware, divide the work among more machines in a cluster, and/or optimize your indexes.
Until then, there are a number of indexing configuration options
that break up processing into smaller batches.
On basic-level cloud instances, indexing may also be limited by CPU credits: the process is broken into smaller batches and resumes when enough credits accumulate, continuing until it finishes.
Limit concurrent index processes -
RavenDB can run multiple index processes at the same time, but too many will exhaust system resources and cause a
noticeable slow-down. The Indexing.MaxNumberOfConcurrentlyRunningIndexes configuration option lets you set
the number of concurrently running index processes, so many indexes can coexist without exhausting resources.
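As a sketch, this server-wide cap could be set in the server's settings.json (the value 4 is an arbitrary example; choose one that fits your hardware):

```json
{
    "Indexing.MaxNumberOfConcurrentlyRunningIndexes": 4
}
```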
Indexing referenced/related data
can be useful (even in a NoSQL database) when developers need to pull information from different documents into an indexing process. The LoadDocument method
creates a relationship between the two documents and ensures that whenever the referenced document is updated, the referencing documents are re-indexed to
stay current with the new details.
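A sketch of a map using LoadDocument (hypothetical `Invoices` and `Customers` collections; the map is written in the C#-LINQ style commonly used for RavenDB index definitions):

```csharp
// Hypothetical map: each invoice entry also indexes the name of the
// customer it references. When a Customers document changes, every
// invoice that loaded it is re-indexed.
from invoice in docs.Invoices
let customer = LoadDocument(invoice.Customer, "Customers")
select new
{
    Number = invoice.Number,
    CustomerName = customer.Name
}
```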
LoadDocument is a useful feature, but problems arise if a large number of documents reference a single document (or a small set of them) that is frequently changed.
If frequent changes are made to this document, all the documents referencing it will also need to be reindexed.
In other words, the amount of work that an index has to do because
of a single document frequently changing can be extremely large and may cause delays in indexing.
The high I/O demands in this situation can then cause further problems, such as longer request durations and cluster instability.
Sometimes, LoadDocument misuse is caused by trying to apply relational modeling approaches to document-based databases.
If you're accustomed to relational data modeling, you can learn about effective document modeling in the "Inside RavenDB" book.