Indexes: Term Vectors

A term vector represents a text document as a vector of identifiers. It can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features such as MoreLikeThis rely on term vectors to do their work.

To enable term vectors on a specific field, either create the index with AbstractIndexCreationTask and specify the term vectors there, or define them in the IndexDefinition (directly or using the IndexDefinitionBuilder).

public class BlogPosts_ByTagsAndContent : AbstractIndexCreationTask<BlogPost>
{
    public BlogPosts_ByTagsAndContent()
    {
        Map = posts => from doc in posts
                       select new
                       {
                           doc.Tags,
                           doc.Content
                       };

        Indexes.Add(x => x.Content, FieldIndexing.Search);
        TermVectors.Add(x => x.Content, FieldTermVector.WithPositionsAndOffsets);
    }
}

The same index defined using the IndexDefinitionBuilder:

IndexDefinitionBuilder<BlogPost> indexDefinitionBuilder =
    new IndexDefinitionBuilder<BlogPost>("BlogPosts/ByTagsAndContent")
    {
        Map = posts => from doc in posts
                       select new
                       {
                           doc.Tags,
                           doc.Content
                       },
        Indexes =
        {
            { x => x.Content, FieldIndexing.Search }
        },
        TermVectors =
        {
            { x => x.Content, FieldTermVector.WithPositionsAndOffsets }
        }
    };

IndexDefinition indexDefinition = indexDefinitionBuilder
    .ToIndexDefinition(store.Conventions);

store.Maintenance.Send(new PutIndexesOperation(indexDefinition));
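
Once the index is deployed, the stored term vectors can back a MoreLikeThis query. The following is a minimal sketch rather than a definitive implementation. It assumes the Raven.Client package and a reachable server. The BlogPost class, the SimilarPosts helper, and the postId parameter are hypothetical names introduced for illustration, since the original text does not show the document class.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Queries.MoreLikeThis;
using Raven.Client.Documents.Session;

// Hypothetical document class (assumption: not shown in the original text).
public class BlogPost
{
    public string Id { get; set; }
    public string[] Tags { get; set; }
    public string Content { get; set; }
}

// Hypothetical helper, for illustration only.
public static class SimilarPosts
{
    public static List<BlogPost> Find(IDocumentStore store, string postId)
    {
        // Deploy the AbstractIndexCreationTask-based index defined above.
        new BlogPosts_ByTagsAndContent().Execute(store);

        using (IDocumentSession session = store.OpenSession())
        {
            // Ask for documents whose Content field resembles that of the given post.
            return session
                .Query<BlogPost, BlogPosts_ByTagsAndContent>()
                .MoreLikeThis(builder => builder
                    .UsingDocument(x => x.Id == postId)
                    .WithOptions(new MoreLikeThisOptions
                    {
                        Fields = new[] { nameof(BlogPost.Content) }
                    }))
                .ToList();
        }
    }
}
```

Restricting Fields to Content keeps the comparison on the field for which term vectors were enabled.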

The available Term Vector options are:

public enum FieldTermVector
{
    /// <summary>
    /// Do not store term vectors
    /// </summary>
    No,

    /// <summary>
    /// Store the term vectors of each document. A term vector is a list of the document's
    /// terms and their number of occurrences in that document.
    /// </summary>
    Yes,

    /// <summary>
    /// Store the term vector + token position information
    /// </summary>
    WithPositions,

    /// <summary>
    /// Store the term vector + Token offset information
    /// </summary>
    WithOffsets,

    /// <summary>
    /// Store the term vector + Token position and offset information
    /// </summary>
    WithPositionsAndOffsets
}
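
To make the difference between these options concrete, the sketch below builds a term vector by hand for a short text. It is a simplified illustration, not the code RavenDB or its analyzers use: it assumes tokens are runs of letters and digits, lowercases them, and counts positions per token. The names TermEntry and TermVectorSketch are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// What a term vector could hold for one term (hypothetical type).
public sealed class TermEntry
{
    // Number of occurrences: the data described for FieldTermVector.Yes.
    public int Frequency;

    // Token positions (0 = first token): added by WithPositions.
    public List<int> Positions = new List<int>();

    // Character offsets, start inclusive and end exclusive: added by WithOffsets.
    public List<(int Start, int End)> Offsets = new List<(int Start, int End)>();
}

public static class TermVectorSketch
{
    public static SortedDictionary<string, TermEntry> Build(string text)
    {
        var vector = new SortedDictionary<string, TermEntry>(StringComparer.Ordinal);
        int position = 0;
        int i = 0;

        while (i < text.Length)
        {
            // Skip separators such as spaces and punctuation.
            if (!char.IsLetterOrDigit(text[i]))
            {
                i++;
                continue;
            }

            // Consume one token.
            int start = i;
            while (i < text.Length && char.IsLetterOrDigit(text[i]))
            {
                i++;
            }

            string term = text.Substring(start, i - start).ToLowerInvariant();

            if (!vector.TryGetValue(term, out TermEntry entry))
            {
                entry = new TermEntry();
                vector[term] = entry;
            }

            entry.Frequency++;
            entry.Positions.Add(position);
            entry.Offsets.Add((start, i));
            position++;
        }

        return vector;
    }
}
```

For the text "Raven stores data. Raven indexes data.", Build returns four terms. The entry for "raven" has a frequency of 2, positions 0 and 3, and offsets (0, 5) and (19, 24). WithPositionsAndOffsets corresponds to keeping all three pieces of information.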