Indexes: Storing Data in Index


Once tokenization and analysis are complete, the tokens produced by the analyzer are stored in the index.
By default, these tokens are available for searching, but the original field values are not stored.

Lucene allows you to store the original field text (before it is analyzed) as well.
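
To make the distinction concrete, here is a toy model in plain Java. It is not Lucene or RavenDB code: the class StoredFieldSketch, its lower-casing whitespace analyzer, and the sample names are all made up for illustration. It only shows that searching uses the analyzed tokens, while the original text can be read back only if it was explicitly stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (hypothetical class, not part of Lucene or RavenDB).
final class StoredFieldSketch {
    // token -> ids of the index entries that contain it (used for searching)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // entry id -> original text, kept only when storage was requested
    private final Map<Integer, String> storedValues = new HashMap<>();

    // Assumed analyzer for this sketch: lower-case, then split on whitespace.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Tokens are always indexed; the original text is kept only on request.
    void index(int entryId, String text, boolean storeOriginal) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(entryId);
        }
        if (storeOriginal) {
            storedValues.put(entryId, text);
        }
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Returns null when the original text was not stored for this entry.
    String storedValue(int entryId) {
        return storedValues.get(entryId);
    }

    public static void main(String[] args) {
        StoredFieldSketch idx = new StoredFieldSketch();
        idx.index(1, "Nancy Davolio", true);   // original text stored
        idx.index(2, "Andrew Fuller", false);  // tokens only

        System.out.println(idx.search("fuller"));  // entry 2 is still searchable
        System.out.println(idx.storedValue(1));    // original text is available
        System.out.println(idx.storedValue(2));    // null: nothing was stored
    }
}
```

Both entries can be found by their tokens, but only the first one can return its original value without going back to the source document.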

Storing Data in Index

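Lucene's ability to store the original field text is exposed in the index definition through the Storage property of IndexFieldOptions (setStorage in the Java client), or through the store method when the index is defined as an AbstractIndexCreationTask.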

When the original values are stored in the index, they become available for retrieval via projections.

public static class Employees_ByFirstAndLastName extends AbstractIndexCreationTask {
    public Employees_ByFirstAndLastName() {
        map =  "docs.Employees.Select(employee => new {" +
            "    FirstName = employee.FirstName," +
            "    LastName = employee.LastName" +
            "})";

        // Keep the original (pre-analysis) values in the index
        // so they can be returned by projections.
        store("FirstName", FieldStorage.YES);
        store("LastName", FieldStorage.YES);
    }
}

The same index can also be defined with an IndexDefinition and deployed with PutIndexesOperation:

IndexDefinition indexDefinition = new IndexDefinition();
indexDefinition.setName("Employees_ByFirstAndLastName");
indexDefinition.setMaps(Collections.singleton("docs.Employees.Select(employee => new {" +
    "    FirstName = employee.FirstName," +
    "    LastName = employee.LastName" +
    "})"));

java.util.Map<String, IndexFieldOptions> fields = new HashMap<>();
indexDefinition.setFields(fields);

IndexFieldOptions firstNameOptions = new IndexFieldOptions();
firstNameOptions.setStorage(FieldStorage.YES);
fields.put("FirstName", firstNameOptions);

IndexFieldOptions lastNameOptions = new IndexFieldOptions();
lastNameOptions.setStorage(FieldStorage.YES);
fields.put("LastName", lastNameOptions);

store
    .maintenance()
    .send(new PutIndexesOperation(indexDefinition));

The default Storage value for each field is FieldStorage.NO.
Keep in mind that storing fields will increase disk space usage.

If a projection requires only fields that are stored in the index, the document is not loaded from document storage; the query results are retrieved directly from the index.
This can improve query performance at the cost of additional disk space.
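
The rule above can be sketched as a simple set check. This is only an illustration in plain Java under stated assumptions: CoveredProjectionSketch and canServeFromIndex are hypothetical names, the "Title" field is an invented example, and the server's actual decision logic is not shown in this article and may differ. The sketch also assumes that an empty projection means the whole document is requested.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the RavenDB client API.
final class CoveredProjectionSketch {
    // True when every projected field is stored in the index,
    // so the result can be built without loading the document.
    static boolean canServeFromIndex(Set<String> storedFields, Set<String> projectedFields) {
        // Assumption of this sketch: no projected fields means the full document is wanted.
        return !projectedFields.isEmpty() && storedFields.containsAll(projectedFields);
    }

    public static void main(String[] args) {
        // Fields stored by the Employees_ByFirstAndLastName index.
        Set<String> stored = new HashSet<>(Arrays.asList("FirstName", "LastName"));

        // Only stored fields are projected: the index alone is enough.
        System.out.println(canServeFromIndex(stored, Collections.singleton("FirstName")));

        // "Title" is not stored, so the document would have to be loaded.
        System.out.println(canServeFromIndex(stored,
                new HashSet<>(Arrays.asList("FirstName", "Title"))));
    }
}
```

The first call prints true and the second prints false, which mirrors the trade-off: each additional stored field widens the set of projections that avoid a document load, and each one also adds to the size of the index.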