Indexes: Term Vectors


  • A Term Vector is a representation of a text document as a vector of identifiers.
    Lucene indexes can contain term vectors for documents they index.
  • Term vectors can be used for various purposes, including similarity searches, information filtering and retrieval, and indexing.
    An index of books, for example, may enable term vectors on the book's subject field so that this field can be used to search for books with similar subjects.
  • RavenDB features like MoreLikeThis leverage stored term vectors to accomplish their goals.


Creating an index and enabling Term Vectors on a field

Indexes that include term vectors can be created and configured using the API or Studio.

Using the API

To create an index and enable Term Vectors on a specific field, we can either:

A. Create an index using AbstractIndexCreationTask and specify the term vectors in its constructor.
B. Define the term vectors in an IndexDefinition (directly or using the IndexDefinitionBuilder).

// A. Class-based index definition
public class BlogPosts_ByTagsAndContent : AbstractIndexCreationTask<BlogPost>
{
    public BlogPosts_ByTagsAndContent()
    {
        Map = posts => from doc in posts
                       select new
                       {
                           doc.Tags,
                           doc.Content
                       };

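        // Index Content for full-text search
        Indexes.Add(x => x.Content, FieldIndexing.Search);
        // Store term vectors, with token positions and offsets, for Content
        TermVectors.Add(x => x.Content, FieldTermVector.WithPositionsAndOffsets);
    }
}

// B. The same index defined with IndexDefinitionBuilder
IndexDefinitionBuilder<BlogPost> indexDefinitionBuilder =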
    new IndexDefinitionBuilder<BlogPost>("BlogPosts/ByTagsAndContent")
    {
        Map = posts => from doc in posts
                       select new
                       {
                           doc.Tags,
                           doc.Content
                       },
        Indexes =
        {
            { x => x.Content, FieldIndexing.Search }
        },
        TermVectors =
        {
            { x => x.Content, FieldTermVector.WithPositionsAndOffsets }
        }
    };

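// Convert the builder into an IndexDefinition using the store's conventions
IndexDefinition indexDefinition = indexDefinitionBuilder
    .ToIndexDefinition(store.Conventions);

// Deploy the index to the server
store.Maintenance.Send(new PutIndexesOperation(indexDefinition));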
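The IndexDefinitionBuilder sample above ends by sending the definition to the server, while the class-based option is only defined. As a minimal sketch, a class-based index could be deployed by calling Execute on an instance of it. The server URL, the database name, and the shape of the BlogPost class are assumptions made for illustration, and the code needs a reachable RavenDB server to run.

```csharp
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;

// Assumed values: replace the URL and database name with your own.
using IDocumentStore store = new DocumentStore
{
    Urls = new[] { "http://localhost:8080" },
    Database = "Blog"
}.Initialize();

// Execute() sends the definition built by the index class to the server.
new BlogPosts_ByTagsAndContent().Execute(store);

// Hypothetical document class; only the fields used by the index matter here.
public class BlogPost
{
    public string Id { get; set; }
    public string[] Tags { get; set; }
    public string Content { get; set; }
}

public class BlogPosts_ByTagsAndContent : AbstractIndexCreationTask<BlogPost>
{
    public BlogPosts_ByTagsAndContent()
    {
        Map = posts => from doc in posts
                       select new
                       {
                           doc.Tags,
                           doc.Content
                       };

        // Full-text search on Content, with term vectors stored for it
        Indexes.Add(x => x.Content, FieldIndexing.Search);
        TermVectors.Add(x => x.Content, FieldTermVector.WithPositionsAndOffsets);
    }
}
```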

Available Term Vector options include:

public enum FieldTermVector
{
    /// <summary>
    /// Do not store term vectors
    /// </summary>
    No,

    /// <summary>
    /// Store the term vectors of each document. A term vector is a list of the document's
    /// terms and their number of occurrences in that document.
    /// </summary>
    Yes,

    /// <summary>
    /// Store the term vector + token position information
    /// </summary>
    WithPositions,

    /// <summary>
    /// Store the term vector + token offset information
    /// </summary>
    WithOffsets,

    /// <summary>
    /// Store the term vector + token position and offset information
    /// </summary>
    WithPositionsAndOffsets
}
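To make the difference between these options concrete, here is a small standalone sketch of the kind of information each one adds. It is not RavenDB or Lucene code: the Occurrence record and the TermVectorSketch class are hypothetical, and the tokenizer simply lowercases and splits on spaces, whereas a real analyzer does considerably more.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one occurrence of a term in the text.
public record Occurrence(int Position, int StartOffset, int EndOffset);

public static class TermVectorSketch
{
    // Maps each term to its occurrences in the text.
    public static Dictionary<string, List<Occurrence>> Build(string text)
    {
        var vector = new Dictionary<string, List<Occurrence>>();
        int position = 0; // token index: the extra data kept by WithPositions
        int start = 0;    // character index: the extra data kept by WithOffsets

        foreach (string token in text.Split(' '))
        {
            if (token.Length > 0)
            {
                string term = token.ToLowerInvariant();
                if (!vector.TryGetValue(term, out var occurrences))
                {
                    occurrences = new List<Occurrence>();
                    vector[term] = occurrences;
                }
                // The end offset is exclusive: one past the token's last character
                occurrences.Add(new Occurrence(position, start, start + token.Length));
                position++;
            }
            start += token.Length + 1; // skip the token and the separating space
        }
        return vector;
    }

    public static void Main()
    {
        foreach (var (term, occurrences) in Build("red wine and red cheese"))
        {
            // occurrences.Count is the per-term frequency, as kept by Yes
            Console.WriteLine($"{term}: freq={occurrences.Count}, " +
                              string.Join(", ", occurrences));
        }
    }
}
```

In this sketch the term "red" has a frequency of 2, which corresponds to what Yes stores. Its token positions (0 and 3) illustrate WithPositions, and its character ranges (0 to 3 and 13 to 16) illustrate WithOffsets.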

The Lucene API documentation lists the methods and constants that are available.

Using Studio

As an example, let's use one of Studio's sample indexes, Product/Search, which has term vectors enabled on its Name field so that a feature like MoreLikeThis can use this field to select a product and find products similar to it.

Term vector enabled on index field

We can now use a query like:

from index 'Product/Search' 
where morelikethis(id() = 'products/7-A')
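The same query can also be issued from client code. The following is a minimal sketch using the C# client's MoreLikeThis query extension. The server URL, the database name, and the minimal Product class are assumptions made for illustration, and the code needs a reachable server with the sample data loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Session;

// Assumed values: replace the URL and database name with your own.
using IDocumentStore store = new DocumentStore
{
    Urls = new[] { "http://localhost:8080" },
    Database = "Northwind"
}.Initialize();

using IDocumentSession session = store.OpenSession();

// Query the Product/Search index for products similar to products/7-A
List<Product> similar = session
    .Query<Product>("Product/Search")
    .MoreLikeThis(builder => builder
        .UsingDocument(x => x.Id == "products/7-A"))
    .ToList();

foreach (Product product in similar)
{
    Console.WriteLine($"{product.Id}: {product.Name}");
}

// Hypothetical, minimal shape of a sample product document.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
}
```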