Term Vectors

A term vector is a representation of a text document as a vector of identifiers that can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features such as MoreLikeThis and text highlighting rely on term vectors.

To enable term vectors on a specific field, either create an index class that derives from AbstractIndexCreationTask and specify the term vectors in its constructor, or define them in an IndexDefinition (directly or using the IndexDefinitionBuilder).

public class BlogPosts_ByTagsAndContent : AbstractIndexCreationTask<BlogPost>
{
	public BlogPosts_ByTagsAndContent()
	{
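		Map = posts => from doc in posts
				select new
				{
					doc.Tags,
					doc.Content
				};

		// Run Content through an analyzer so it is indexed as individual terms.
		Indexes.Add(x => x.Content, FieldIndexing.Analyzed);
		// Store each term of Content together with its token positions and offsets.
		TermVectors.Add(x => x.Content, FieldTermVector.WithPositionsAndOffsets);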
	}
}

The same index can be defined with the IndexDefinitionBuilder and stored using PutIndex:

store
	.DatabaseCommands
	.PutIndex(
		"BlogPosts/ByTagsAndContent",
		new IndexDefinitionBuilder<BlogPost>
		{
			Map = posts => from doc in posts
					select new
					{
						doc.Tags,
						doc.Content
					},
			Indexes =
			{
				{ x => x.Content, FieldIndexing.Analyzed }
			},
			TermVectors =
			{
				{ x => x.Content, FieldTermVector.WithPositionsAndOffsets }
			}
		});

The available Term Vector options are:

public enum FieldTermVector
{
	/// <summary>
	/// Do not store term vectors
	/// </summary>
	No,

	/// <summary>
	/// Store the term vectors of each document. A term vector is a list of the document's
	/// terms and their number of occurrences in that document.
	/// </summary>
	Yes,

	/// <summary>
	/// Store the term vector and token position information
	/// </summary>
	WithPositions,

	/// <summary>
	/// Store the term vector and token offset information
	/// </summary>
	WithOffsets,

	/// <summary>
	/// Store the term vector and both token position and offset information
	/// </summary>
	WithPositionsAndOffsets
}
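
To make the difference between these options more concrete, here is a minimal, standalone sketch of the kind of information each level adds. It does not use RavenDB or Lucene: TermEntry and TermVectorSketch are hypothetical names introduced only for illustration, and a plain lowercase word splitter stands in for the analyzer (a real analyzer may tokenize differently, for example by dropping stop words). Offsets are assumed to be character indexes with an exclusive end.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical illustration type, not part of RavenDB or Lucene.
public class TermEntry
{
	// Number of occurrences of the term: the information kept by FieldTermVector.Yes.
	public int Frequency;

	// Ordinal index of each occurrence in the token stream: added by WithPositions.
	public List<int> Positions = new List<int>();

	// Character range of each occurrence in the original text: added by WithOffsets.
	public List<int> StartOffsets = new List<int>();
	public List<int> EndOffsets = new List<int>();
}

public static class TermVectorSketch
{
	// Builds a term vector with positions and offsets for a single text value.
	public static SortedDictionary<string, TermEntry> Build(string text)
	{
		var vector = new SortedDictionary<string, TermEntry>(StringComparer.Ordinal);
		int position = 0;

		// Stand-in for an analyzer: split on word characters and lowercase each token.
		foreach (Match match in Regex.Matches(text, @"\w+"))
		{
			string term = match.Value.ToLowerInvariant();

			TermEntry entry;
			if (!vector.TryGetValue(term, out entry))
			{
				entry = new TermEntry();
				vector[term] = entry;
			}

			entry.Frequency++;
			entry.Positions.Add(position++);
			entry.StartOffsets.Add(match.Index);
			entry.EndOffsets.Add(match.Index + match.Length);
		}

		return vector;
	}
}
```

For the text "Raven is fast and Raven is safe", the entry for the term "raven" has a frequency of 2, positions 0 and 4, and character ranges 0 to 5 and 18 to 23. Positions are the kind of data needed to reason about how close terms are to each other, while offsets point back into the original text, which is what a highlighter needs in order to mark the matching fragment.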