Indexes: Term Vectors

A term vector is a representation of a text document as a vector of identifiers. It can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features such as MoreLikeThis rely on term vectors to do their work.
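To make the idea concrete, here is a minimal sketch of a term vector as a map from each term to its number of occurrences, plus a cosine similarity between two such vectors. It is only an illustration of the concept: the class name `TermVectorSketch`, the naive tokenizer, and the choice of cosine similarity are assumptions made for this example, not how RavenDB or MoreLikeThis is implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of the RavenDB client.
final class TermVectorSketch {

    // Builds a term -> occurrence count map.
    // Tokenization is a naive lowercase split on anything that is not a letter or digit.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                vector.merge(token, 1, Integer::sum);
            }
        }
        return vector;
    }

    // Cosine similarity of two term vectors; returns 0 when either vector is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    // Euclidean length of a term vector.
    private static double norm(Map<String, Integer> vector) {
        double sum = 0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> first = termVector("RavenDB indexes store term vectors");
        Map<String, Integer> second = termVector("Term vectors help find similar documents");
        System.out.println(first);
        System.out.println(cosineSimilarity(first, second));
    }
}
```

Documents that share many terms produce vectors pointing in a similar direction, which is why this kind of representation is useful for "find similar documents" features.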

There are two ways to create an index with term vectors enabled on a specific field: derive from AbstractIndexCreationTask and specify the term vectors in the constructor, or set them in an IndexDefinition (directly or through the IndexDefinitionBuilder).

public static class BlogPosts_ByTagsAndContent extends AbstractIndexCreationTask {
    public BlogPosts_ByTagsAndContent() {
        map = "docs.Posts.Select(post => new { " +
            "    Tags = post.Tags, " +
            "    Content = post.Content " +
            "})";

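        // Analyze the Content field for full-text search
        index("Content", FieldIndexing.SEARCH);
        // Store term vectors for Content, including token positions and offsets
        termVector("Content", FieldTermVector.WITH_POSITIONS_AND_OFFSETS);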
    }
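}

The same index can be defined with the IndexDefinitionBuilder and deployed using PutIndexesOperation:

IndexDefinitionBuilder builder = new IndexDefinitionBuilder("BlogPosts/ByTagsAndContent");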
builder.setMap("docs.Posts.Select(post => new { " +
    "    Tags = post.Tags, " +
    "    Content = post.Content " +
    "})");

builder.getIndexesStrings().put("Content", FieldIndexing.SEARCH);
builder.getTermVectorsStrings().put("Content", FieldTermVector.WITH_POSITIONS_AND_OFFSETS);

IndexDefinition indexDefinition = builder.toIndexDefinition(store.getConventions());

store.maintenance().send(new PutIndexesOperation(indexDefinition));

The available Term Vector options are:

public enum FieldTermVector {
    /**
     * Do not store term vectors
     */
    NO,

    /**
     * Store the term vectors of each document. A term vector is a list of the document's
     * terms and their number of occurrences in that document.
     */
    YES,

    /**
     * Store the term vector + token position information
     */
    WITH_POSITIONS,

    /**
     * Store the term vector + token offset information
     */
    WITH_OFFSETS,

    /**
     * Store the term vector + Token position and offset information
     */
    WITH_POSITIONS_AND_OFFSETS
}
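The difference between the options is easier to see with sample data. The sketch below lists, for each token in a text, the three pieces of information these options refer to: the term itself, its position (the token's ordinal in the token stream), and its offsets (start and end character indexes in the original text). The class `TermOccurrenceSketch` and its simple regex tokenizer are hypothetical and only for illustration; in a real index the exact terms, positions, and offsets depend on the analyzer configured for the field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the RavenDB client.
final class TermOccurrenceSketch {

    // One occurrence of a term in the analyzed text.
    static final class Occurrence {
        final String term;      // what YES records (together with a per-term count)
        final int position;     // added by WITH_POSITIONS: ordinal of the token
        final int startOffset;  // added by WITH_OFFSETS: first character index
        final int endOffset;    // added by WITH_OFFSETS: index just past the last character

        Occurrence(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " position=" + position + " offsets=[" + startOffset + "," + endOffset + ")";
        }
    }

    // Tokenizes on runs of letters and digits, lowercasing each token.
    static List<Occurrence> analyze(String text) {
        List<Occurrence> occurrences = new ArrayList<>();
        Matcher matcher = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        int position = 0;
        while (matcher.find()) {
            occurrences.add(new Occurrence(
                matcher.group().toLowerCase(Locale.ROOT),
                position++,
                matcher.start(),
                matcher.end()));
        }
        return occurrences;
    }

    public static void main(String[] args) {
        for (Occurrence occurrence : analyze("Fast search, fast index")) {
            System.out.println(occurrence);
        }
    }
}
```

Storing positions and offsets increases index size, so WITH_POSITIONS_AND_OFFSETS is worth choosing only when a feature actually needs that extra detail.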