Indexes: Storing Data in Index
Once tokenization and analysis are complete,
the tokens produced by the analyzer are stored in the index.
By default, tokens saved in the index are available for searching, but their original
field values are not stored.
Lucene allows you to store the original field text (before it is analyzed) as well.
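The difference between indexed tokens and stored original values can be sketched with a small toy model. This is only an illustration under simplifying assumptions, not Lucene's actual data layout: the class `ToyFieldIndex`, its methods, and the lowercase-and-split analyzer are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: illustrates "indexed tokens" versus "stored original value".
// It does not reflect how Lucene organizes its data internally.
final class ToyFieldIndex {
    // token -> ids of entries containing that token (this is what makes a field searchable)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // entry id -> original text, kept only when storage was requested
    private final Map<Integer, String> stored = new HashMap<>();

    // Hypothetical analyzer: lowercases the text and splits it on whitespace.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Tokens are always indexed; the original text is kept only if 'store' is true.
    void add(int id, String text, boolean store) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new ArrayList<>()).add(id);
        }
        if (store) {
            stored.put(id, text);
        }
    }

    // Looks up entry ids by a single token.
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    // Returns null when the original value was not stored.
    String storedValue(int id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(1, "Nancy Davolio", false); // searchable only
        index.add(2, "Andrew Fuller", true);  // searchable and stored
        System.out.println(index.search("nancy"));  // [1]
        System.out.println(index.storedValue(1));   // null
        System.out.println(index.storedValue(2));   // Andrew Fuller
    }
}
```

Both entries can be found by their tokens, but only the second one can return its original text from the index itself.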
Storing Data in Index
Lucene's original field text storage feature is exposed in the index definition object as
the Storage property of the IndexFieldOptions.
When the original values are stored in the index, they become available for retrieval via projections.
public static class Employees_ByFirstAndLastName extends AbstractIndexCreationTask {
    public Employees_ByFirstAndLastName() {
        map = "docs.Employees.Select(employee => new {" +
            "    FirstName = employee.FirstName," +
            "    LastName = employee.LastName" +
            "})";

        // Store the original values of both fields in the index
        store("FirstName", FieldStorage.YES);
        store("LastName", FieldStorage.YES);
    }
}

IndexDefinition indexDefinition = new IndexDefinition();
indexDefinition.setName("Employees_ByFirstAndLastName");
indexDefinition.setMaps(Collections.singleton("docs.Employees.Select(employee => new {" +
    "    FirstName = employee.FirstName," +
    "    LastName = employee.LastName" +
    "})"));

java.util.Map<String, IndexFieldOptions> fields = new HashMap<>();
indexDefinition.setFields(fields);

// Store the original values of both fields in the index
IndexFieldOptions firstNameOptions = new IndexFieldOptions();
firstNameOptions.setStorage(FieldStorage.YES);
fields.put("FirstName", firstNameOptions);

IndexFieldOptions lastNameOptions = new IndexFieldOptions();
lastNameOptions.setStorage(FieldStorage.YES);
fields.put("LastName", lastNameOptions);

// Here 'store' is the document store instance, not the store(...) method used above
store
    .maintenance()
    .send(new PutIndexesOperation(indexDefinition));
The default Storage value for each field is FieldStorage.NO.
Keep in mind that storing fields will increase disk space usage.
If a projection requires only fields that are stored, the document is
not loaded from storage and the query results are retrieved directly from the index.
This can improve query performance at the cost of additional disk space.
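The trade-off described here can be illustrated with a toy model of the decision. It is a sketch under simplifying assumptions and does not show RavenDB's internals: the class `ToyProjection`, its fields, and the `documentLoads` counter are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: if every projected field is stored in the index entry,
// answer from the index; otherwise fall back to loading the document.
final class ToyProjection {
    // document id -> field values stored in the index (fields marked FieldStorage.YES)
    final Map<String, Map<String, String>> storedInIndex = new HashMap<>();
    // document id -> full document, standing in for document storage
    final Map<String, Map<String, String>> documents = new HashMap<>();
    // Counts how many times a full document had to be loaded.
    int documentLoads = 0;

    Map<String, String> project(String id, String... fields) {
        Map<String, String> entry = storedInIndex.getOrDefault(id, Collections.emptyMap());
        Map<String, String> result = new LinkedHashMap<>();

        if (entry.keySet().containsAll(Arrays.asList(fields))) {
            // All requested fields are stored: no document load is needed.
            for (String field : fields) {
                result.put(field, entry.get(field));
            }
            return result;
        }

        // At least one field is not stored: load the document instead.
        documentLoads++;
        Map<String, String> document = documents.getOrDefault(id, Collections.emptyMap());
        for (String field : fields) {
            result.put(field, document.get(field));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyProjection projection = new ToyProjection();

        Map<String, String> document = new HashMap<>();
        document.put("FirstName", "Nancy");
        document.put("LastName", "Davolio");
        document.put("Title", "Sales Representative");
        projection.documents.put("employees/1", document);

        Map<String, String> entry = new HashMap<>();
        entry.put("FirstName", "Nancy");
        entry.put("LastName", "Davolio");
        projection.storedInIndex.put("employees/1", entry);

        // Both fields are stored, so the result comes from the index alone.
        System.out.println(projection.project("employees/1", "FirstName", "LastName"));
        System.out.println(projection.documentLoads); // 0

        // Title is not stored, so the document has to be loaded.
        System.out.println(projection.project("employees/1", "FirstName", "Title"));
        System.out.println(projection.documentLoads); // 1
    }
}
```

In this model, the stored copy in `storedInIndex` duplicates data already present in `documents`, which mirrors the extra disk space that stored fields consume.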