Search Engine: Corax
- Corax is RavenDB's native search engine, introduced in RavenDB version 6.0 as an in-house alternative to Lucene.
  Lucene remains available as well; you can use either search engine as you prefer.
- The main role of the database's search engine is to satisfy incoming queries.
  In RavenDB, the search engine achieves this by handling each query via an index.
  If no relevant index exists, the search engine will create one automatically.
  The search engine is the main "moving part" of the indexing mechanism, which processes and indexes documents by index definitions.
- The search engine supports both Auto and Static indexing and can be selected separately for each.
- The search engine can be selected per server, per database, and per index (for static indexes only).
Selecting the search engine
You can select your preferred search engine in several scopes:
- Server-wide, selecting which search engine will be used by all the databases hosted by this server.
- Per database, overriding server-wide settings for a specific database.
- Per index, overriding server-wide and per-database settings.
  Per-index settings are available only for static indexes.
Note that the search engine is selected for new indexes only.
These settings do not apply to existing indexes.
These configuration options are available:
- Indexing.Auto.SearchEngineType
  Use this option to select the search engine (either Lucene or Corax) for auto indexes.
  The search engine can be selected server-wide or per database.
- Indexing.Static.SearchEngineType
  Use this option to select the search engine (either Lucene or Corax) for static indexes.
  The search engine can be selected server-wide, per database, or per index.
- Read about additional Corax configuration options in the Configuration options section below.
Select search engine: Server wide
Select the search engine for all the databases hosted by a server
by modifying the server's settings.json file.
E.g.:
{
    "Indexing.Auto.SearchEngineType": "Corax",
    "Indexing.Static.SearchEngineType": "Corax"
}
You must restart the server for the new settings to be read and applied.
Selecting a new search engine will change the search engine only for indexes created from now on.
E.g., if the configuration has been "Indexing.Static.SearchEngineType": "Corax" until now and is then changed to "Indexing.Static.SearchEngineType": "Lucene", static indexes created from now on will use Lucene, but static indexes created while Corax was selected will continue using Corax.
After selecting a new search engine using the above options, you can change the search engine used by an existing index by resetting the index.
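As a minimal sketch of selecting the engine separately for each index type, a settings.json file could assign Corax to auto indexes while keeping Lucene for static indexes. The combination is only an illustration; both keys and both values are the ones listed above.

```json
{
    "Indexing.Auto.SearchEngineType": "Corax",
    "Indexing.Static.SearchEngineType": "Lucene"
}
```

As with any change to settings.json, the server must be restarted, and only indexes created afterwards are affected.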
Select search engine: Per database
To select the search engine that the database would use, modify the relevant Database Record settings. You can easily do this via Studio:
- Open Studio's Database Settings page, and enter SearchEngine in the search bar to find the search engine settings.
  Click Edit to modify the default search engine.
- Select your preferred search engine for Auto and Static indexes.
- To apply the new settings, either disable and re-enable the database or restart the server.
Select search engine: Per index
You can also select the search engine that would be used by a specific index, overriding any per-database and per-server settings.
Select index search engine via Studio:
- Indexes-List-View > Edit Index Definition
  Open Studio's Index List view and select the index whose search engine you want to set.
- Open the index Configuration tab.
- Select the search engine you prefer for this index.
- The indexes list view will show the changed configuration.
Select index search engine using code
While defining an index using the API, use the SearchEngineType property to select the search engine that would run the index.
Available values: SearchEngineType.Lucene, SearchEngineType.Corax.
- You can pass the search engine type you prefer:
  // Set search engine type while creating the index
  new Product_ByAvailability(SearchEngineType.Corax).Execute(store);
- And set it in the index definition:
  private class Product_ByAvailability : AbstractIndexCreationTask<Product>
  {
      public Product_ByAvailability(SearchEngineType type)
      {
          // Any Map/Reduce segments here
          Map = products => from p in products
                            select new
                            {
                                p.Name,
                                p.Brand
                            };

          // The preferred search engine type
          SearchEngineType = type;
      }
  }
Unsupported features
The features below are currently not supported by Corax.
Unsupported during indexing:
- Setting a boost factor on an index-field is not supported.
  Note that boosting the whole index-entry IS supported.
- Indexing WKT shapes is not supported.
  Note that indexing spatial points IS supported.
- Custom analyzers
- Custom Sorters
Unsupported while querying:
Complex JSON properties:
Complex JSON properties cannot currently be indexed and searched by Corax.
Read more about this below.
Unsupported WHERE methods/terms:
Unimplemented methods
Trying to use Corax with an unimplemented method (see Unsupported features above) will generate a NotSupportedInCoraxException exception and end the search.
E.g., the following query uses the intersect method, which is currently not supported by Corax.
from index 'Orders/ByCompany'
where intersect(Count > 10, Total > 3)
If you set Corax as the search engine for the Orders/ByCompany index used by the above query, running the query will generate a NotSupportedInCoraxException exception and the search will stop.
Handling of complex JSON objects
To avoid unnecessary resource usage, the content of complex JSON properties is not indexed by RavenDB.
See below how auto and static indexes handle such fields.
Lucene's approach of indexing complex fields as JSON strings usually makes no sense, and is not supported by Corax.
Consider, for example, the following orders document:
{
    "Company": "companies/27-A",
    "Employee": "employees/2-A",
    "ShipTo": {
        "City": "Torino",
        "Country": "Italy",
        "Location": {
            "Latitude": 45.0907661,
            "Longitude": 7.687425699999999
        }
    }
}
As Location contains a list of key/value pairs rather than a simple numeric value or a string, Corax will not index its contents (see below what will be indexed).
There are several ways to handle the indexing of complex JSON objects:
1. Index a simple property contained in the complex field
Index one of the simple key/value properties stored within the nested object.
In the Location field, for example, Latitude and Longitude can serve us this way:
from order in docs.Orders
select new
{
    Latitude = order.ShipTo.Location.Latitude,
    Longitude = order.ShipTo.Location.Longitude
}
2. Index the document using Lucene
As long as Corax doesn't index complex JSON objects, you can always select Lucene as your search engine when you need to index nested properties.
3. Revise index definition and fields usage
As shown above, indexing a whole complex field is rarely needed, and users would typically index and search only the simple properties such a field contains.
Queries may sometimes need, however, to project the content of an entire complex field.
When this is the case, you can revise the index definition (see below) to disable the indexing of the complex field but store its content, so that projection queries can still project it.
Content that is retrieved from the database and stored in an index becomes available for projection and is from then on retrieved directly from the index, accelerating its retrieval at the expense of index storage space.
To store a field's content and disable its indexing via Studio:
- Open the index definition's Fields tab.
- Click Add Field to specify what field Corax shouldn't index.
- Enter the name of the field Corax should not index.
- Select Yes to Store the field's content.
- Select No to disable the field's indexing.
To store a field's content and disable its indexing using code:
private class Order_ByLocation : AbstractIndexCreationTask<Order>
{
    public Order_ByLocation(SearchEngineType type)
    {
        Map = orders => from o in orders
                        select new
                        {
                            o.ShipTo.Location
                        };

        SearchEngineType = type;

        // Disable indexing for this field
        Index("Location", FieldIndexing.No);

        // Store the field's content
        // (this is mandatory if the field's indexing is disabled)
        Store("Location", FieldStorage.Yes);
    }
}
4. Turn the complex property into a string
You can handle the complex property as a string.
Indexing the property as is:
from order in docs.Orders
select new
{
    // This will fail for the above document when using Corax
    Location = order.ShipTo.Location
}
Converting it to a string:
from order in docs.Orders
select new
{
    // .ToString() will convert the data to a string in JSON format (same as using JsonConvert.SerializeObject())
    Location = order.ShipTo.Location.ToString()
}
Serializing all the properties of a complex property into a single string, including names, values, brackets, and so on, should be used only as a last resort: the resulting string doesn't make a good feed for analyzers and is not commonly used for searches.
It does, however, make sense in some cases to project such a string.
If Corax encounters a complex property while indexing:
Auto and Static indexes handle complex fields differently.
New and Old static indexes also handle complex fields differently.
- Auto Index
  An auto index will replace a complex field with a JSON_VALUE string.
  This will allow basic queries over the field, like checking if it exists using Field == null or exists(Field).
  - Corax will also raise a complex-field alert:
We have detected a complex field in an auto index. To avoid higher resources usage when processing JSON objects, the values of these fields will be replaced with JSON_VALUE. Please consider querying on individual fields of that object or using a static index.
- New static index (created or reset on RavenDB 6.2.x and on)
  The index will behave as determined by the Indexing.Corax.Static.ComplexFieldIndexingBehavior configuration option.
  - If ComplexFieldIndexingBehavior is set to Throw -
    Corax will throw a NotSupportedInCoraxException exception with this message:
The value of `{fieldName}` field is a complex object. Typically a complex field is not intended to be indexed as a whole hence indexing it as a text isn't supported in Corax. The field is supposed to have 'Indexing' option set to 'No' (note that you can still store it and use it in projections). Alternatively you can switch 'Indexing.Corax.Static.ComplexFieldIndexingBehavior' configuration option from 'Throw' to 'Skip' to disable the indexing of all complex fields in the index or globally for all indexes (index reset is required). If you really need to use this field for searching purposes, you have to call ToString() on the field value in the index definition. Although it's recommended to index individual fields of this complex object. Read more at: https://ravendb.net/l/OB9XW4/6.2
  - If ComplexFieldIndexingBehavior is set to Skip -
    Corax will skip indexing the complex field without throwing an exception.
- Old static index (created using RavenDB 6.0.x or older)
  If the index doesn't explicitly relate to the complex field, Corax will automatically disable indexing for this field by defining Indexing: No for it as shown above.
  - If the Indexing flag is set to anything but "no" -
    Corax will throw a NotSupportedInCoraxException exception.
    As disabling indexing for this field will prevent additional attempts to index its values, the exception will be thrown just once.
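For illustration, the behavior described above for new static indexes could be set server-wide in settings.json as sketched below. Placing the key in settings.json is an assumption based on how the other Indexing options on this page are configured; the key name and the Throw and Skip values are the ones quoted in the exception message.

```json
{
    "Indexing.Corax.Static.ComplexFieldIndexingBehavior": "Skip"
}
```

According to the exception message, an index reset is required for a changed value to take effect.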
Compound fields
This feature should be applied to very large datasets and specific queries.
It is meant for experts only.
A compound field is a Corax index field composed of simple data elements.
A compound field can currently be composed of exactly 2 elements.
Expert users can define compound fields to optimize data retrieval: data stored in a compound field is sorted as requested by the user, and is later retrieved in this order with extreme efficiency.
Compound fields can also be used to unify simple data elements in cohesive units to make the index more readable.
- Adding a Compound Field
  In an index definition, add a compound field using the CompoundField method.
  Pass the method simple data elements in the order by which you want them to be sorted.
- Example
  An example of an index definition with a compound field can be:
  private class Product_Location : AbstractIndexCreationTask<Product>
  {
      public Product_Location()
      {
          Map = products => from p in products
                            select new
                            {
                                p.Brand,
                                p.Location
                            };

          // Add a compound field
          CompoundField(x => x.Brand, x => x.Location);
      }
  }
A query that uses the indexed data will look no different than if the index included no compound field, but it will produce the results much faster.
using (var s = store.OpenSession())
{
    // Use the internal optimization previously created by the added compound field
    var products = s.Query<Product, Product_Location>()
        .Where(x => x.Brand == "RunningShoes")
        .OrderBy(x => x.Location)
        .ToList();
}
from Products where Brand = "RunningShoes" order by Location
Limits
- Corax can create and use indexes of more than int.MaxValue (2,147,483,647) documents.
  To match this capacity, queries over Corax indexes can skip a number of results that exceeds int.MaxValue and take documents from this location.
- The maximum number of documents that can be projected by a query (using either Corax or Lucene) is int.MaxValue (2,147,483,647).
Configuration options
Corax configuration options include:
- Indexing.Auto.SearchEngineType
  Select the search engine for Auto indexes.
- Indexing.Static.SearchEngineType
  Select the search engine for Static indexes.
- Indexing.Corax.IncludeDocumentScore
  Choose whether to include the score value in document metadata when sorting by score.
  Disabling this option can improve query performance.
- Indexing.Corax.IncludeSpatialDistance
  Choose whether to include spatial information in document metadata when sorting by distance.
  Disabling this option can improve query performance.
- Indexing.Corax.MaxMemoizationSizeInMb
  The maximum amount of memory that Corax can use for a memoization clause during query processing.
  Expert Level Configuration: please configure this option only if you are an expert.
- Indexing.Corax.DocumentsLimitForCompressionDictionaryCreation
  Set the maximum number of documents that will be used for the training of a Corax index during dictionary creation.
  Training will stop when it reaches this limit.
- Indexing.Corax.MaxAllocationsAtDictionaryTrainingInMb
  Set the maximum amount of memory (in MB) that will be allocated for the training of a Corax index during dictionary creation.
  Training will stop when it reaches this limit.
- Indexing.Corax.Static.ComplexFieldIndexingBehavior
  Choose how to react when a static Corax index is requested to index a complex JSON object.
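As a hedged sketch of how the two performance-related options above might be applied, the settings.json fragment below disables both the score and the spatial distance metadata. The boolean value format and the choice of settings.json as the place to set them are assumptions; only the key names come from the list above.

```json
{
    "Indexing.Corax.IncludeDocumentScore": false,
    "Indexing.Corax.IncludeSpatialDistance": false
}
```

Disabling these options only makes sense when queries do not read the score or the distance from document metadata.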
Index training: Compression dictionaries
When creating Corax indexes, RavenDB analyzes index contents and trains compression dictionaries for much higher storage and execution efficiency.
- The larger the collection, the longer the training process will take.
  The index, however, will become more efficient in terms of resource usage.
- The training process can take from a few seconds to up to a minute in multiterabyte collections.
- The IO speed of the storage system also affects the training time.
Here are some additional things to keep in mind about Corax indexes compression dictionaries:
- Compression dictionaries are used to store index terms more efficiently.
  This can significantly reduce the size of the index, which can improve performance.
- The training process is only performed once, when the index is created.
- The compression dictionaries are stored with the index and are used for all subsequent operations (indexing and querying).
- The benefits of compression dictionaries are most pronounced for large collections.
Training stops when it reaches either the number of documents threshold (100,000 docs by default) or the amount of memory threshold (up to 2GB). Both thresholds are configurable.
- If upon creation there are fewer than 10,000 documents in the involved collections, it may make sense to manually reset the index after reaching 100,000 documents to force retraining.
Indexes are replaced in a side-by-side manner: existing indexes would continue running until the new ones are created, to avoid any interruption to existing queries.
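The two training thresholds mentioned above map to the configuration options Indexing.Corax.DocumentsLimitForCompressionDictionaryCreation and Indexing.Corax.MaxAllocationsAtDictionaryTrainingInMb. The fragment below is a minimal sketch that spells out the values stated on this page (100,000 documents and 2 GB, written as 2048 MB); writing them as JSON numbers in settings.json is an assumption.

```json
{
    "Indexing.Corax.DocumentsLimitForCompressionDictionaryCreation": 100000,
    "Indexing.Corax.MaxAllocationsAtDictionaryTrainingInMb": 2048
}
```

Since training runs only when an index is created, changed thresholds would apply to an existing index only after it is reset.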
Corax and the Test Index Interface
Corax indexes will not train compression dictionaries if they are created in the Test Index interface, because the testing interface is designed for index prototyping and the training process would add unnecessary overhead.