Exploration Queries
- Exploration queries form an additional layer of filtering that can be applied to a dataset after it has been retrieved by a raw RQL query (raw_query), while the dataset is still held by the server.
- The retrieved dataset is scanned and filtered without requiring or creating an index. This allows one-time explorations without adding an index that the cluster would then have to maintain.
- You can filter the datasets retrieved by both index queries and collection queries.
- Exploration queries need to be used with caution, since scanning and filtering all the data retrieved by a query costs substantial server resources and user waiting time when large datasets are handled.
  We recommend that you:
  - Limit the number of records that an exploration query filters.
  - Use where in recurring queries, so that the query uses an index.
filter
In Python, exploration queries can be applied via RQL using the filter keyword.
The added filtering is parsed and executed by RavenDB's JavaScript engine.
The provided filtering operations resemble those implemented by where, and can be further enhanced by JavaScript functions of your own.
See the section on user-defined JavaScript functions (declare) on this page for how to create and use your own JavaScript function in your filters.
When should exploration queries be used
filter can be applied to a collection query, as in:
from Employees as e
filter e.Address.Country = 'USA'
It can also be applied to queries handled by an index, e.g.:
// in a dynamic query via an auto-index
from Employees as e
where e.Title = 'Sales Representative'
filter e.Address.Country = 'USA'
// in a query that uses an index explicitly
from index 'Orders/ByCompany'
filter Count > 10
In both a collection query and a query handled by an index, the entire retrieved dataset is scanned and filtered.
This helps explain when exploration queries should be used, why a limit should be set on the number of filtered records, and when where should be preferred:
When to use
Use filter for an ad-hoc exploration of the retrieved dataset that matches no existing index and is not expected to be repeated often.
- You gain the ability to filter post-query results on the server side, both for collection queries and for queries that use an index.
- The dataset is filtered without creating an unneeded index that the cluster would have to keep updating from then on.
Limit the query, and prefer where for recurring queries
Be aware that when a large dataset is retrieved, such as a whole collection in the case of a collection query, exploring all of it with filter taxes the server's memory and CPU while it checks the filter condition against each query result, and costs the user substantial waiting time. Therefore:
- Limit the number of records that an exploration query filters, e.g.:
  from Employees as e
  filter e.Address.Country = 'USA'
  filter_limit 500 // limit the number of filtered records
- Use where rather than filter for recurring filtering.
  where will use an index, creating it if necessary, to accelerate the filtering in subsequent queries.
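To make the cost argument concrete, here is a toy, pure-Python model of scanning and filtering with a cap. It only illustrates the behavior described above and is not RavenDB's implementation: `scan_filter`, its parameters, and the sample records are hypothetical, and the model assumes that `filter_limit` caps the number of records examined rather than the number of matches returned.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def scan_filter(
    records: Iterable[dict],
    predicate: Callable[[dict], bool],
    filter_limit: Optional[int] = None,
) -> Tuple[List[dict], int]:
    """Linear scan over `records`; returns (matches, number of records examined).

    Assumption: `filter_limit` caps how many records are examined.
    """
    matches: List[dict] = []
    scanned = 0
    for record in records:
        if filter_limit is not None and scanned >= filter_limit:
            break  # stop scanning once the cap is reached
        scanned += 1
        if predicate(record):
            matches.append(record)
    return matches, scanned


def is_usa(employee: dict) -> bool:
    return employee["Country"] == "USA"


# Hypothetical sample data standing in for a retrieved dataset.
employees = [
    {"Id": i, "Country": "USA" if i % 3 == 0 else "UK"} for i in range(10_000)
]

all_matches, scanned_all = scan_filter(employees, is_usa)
capped_matches, scanned_capped = scan_filter(employees, is_usa, filter_limit=500)
```

Without a cap, the predicate runs once for each of the 10,000 records. With `filter_limit=500` it runs only for the first 500, so the work stays bounded however large the retrieved dataset is, at the price of possibly missing matches further along.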
Syntax
- In C#, for example, filter can be applied in code through the Query or DocumentQuery API.
  There is no such API implementation in the Python client, leaving RQL as the only way to perform exploration queries.
- RQL
  In an RQL query, use:
  - The filter keyword, followed by the filtering condition.
  - The filter_limit option, followed by the maximum number of records to filter.
  E.g.:
  from Employees as e
  where e.Title = 'Sales Representative'
  filter e.Address.Country = 'USA' // filter the retrieved dataset
  filter_limit 500 // limit the number of filtered records
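Since RQL is the only route in Python, it can help to assemble the clauses in a fixed order. The following is a minimal sketch of such a helper; `build_exploration_rql` and its parameter names are hypothetical and are not part of the RavenDB client. It only joins clause text and does not validate the RQL itself.

```python
from typing import Optional


def build_exploration_rql(
    source: str,
    alias: Optional[str] = None,
    where: Optional[str] = None,
    filter_by: Optional[str] = None,
    filter_limit: Optional[int] = None,
) -> str:
    """Compose an RQL string in the order: from, where, filter, filter_limit."""
    if filter_limit is not None and filter_by is None:
        # Design choice of this helper: a limit is only emitted with a filter.
        raise ValueError("filter_limit requires a filter clause")
    clauses = [f"from {source}" + (f" as {alias}" if alias else "")]
    if where:
        clauses.append(f"where {where}")
    if filter_by:
        clauses.append(f"filter {filter_by}")
    if filter_limit is not None:
        clauses.append(f"filter_limit {int(filter_limit)}")
    # One clause per line, so a trailing // comment cannot swallow the next clause.
    return "\n".join(clauses)


rql = build_exploration_rql(
    "Employees",
    "e",
    where="e.Title = $title",
    filter_by="e.Address.Country = $country",
    filter_limit=500,
)
```

The resulting string can be passed to `session.advanced.raw_query`, with the `$title` and `$country` values supplied through `add_parameter` rather than concatenated into the text.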
Usage examples
With collection queries
Use filter with a collection query to scan and filter the entire collection.
result = session.advanced.raw_query(
    "from Employees as e "
    "filter e.Address.Country = 'USA' "
    "filter_limit 500",
    Employee,
).single()  # single() expects the query to return exactly one document
Filtering a sizable collection will burden the server and prolong user waiting time.
Set a filter_limit to restrict the number of filtered records.
With queries that use an index
Use filter after a where clause to filter the results retrieved by an index query.
emp = (
    session.advanced.raw_query(
        "from Employees as e "
        "where e.Title = $title "
        "filter e.Address.Country = $country "
        "filter_limit $limit",
        Employee,
    )
    .add_parameter("title", "Sales Representative")
    .add_parameter("country", "USA")
    .add_parameter("limit", 500)
    .single()
)
With projections
The filtered results can be projected using select, like those of any other query.
emp3 = session.advanced.raw_query(
    "from Employees as e "
    "filter startsWith(e.FirstName, 'A') "
    "select { FullName: e.FirstName + ' ' + e.LastName }",
    Employee,
)
With user-defined JavaScript functions (declare)
You can define a JavaScript function as part of your query using the declare keyword, and use it as part of your filter condition to freely adapt the filtering to your needs.
Here is a simple example:
// declare a JavaScript function
declare function titlePrefix(r, prefix)
{
    // Add whatever filtering capabilities you like
    return r.Title.startsWith(prefix)
}
from Employees as e
// Filter using the function you've declared
filter titlePrefix(e, $prefix)
filter_limit 100
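The query above uses a `$prefix` parameter, so it has to be given a value when it is run from Python. The following is a sketch of one way to do that with the `raw_query` and `add_parameter` calls used in the earlier examples. The names `DECLARE_FILTER_RQL` and `employees_with_title_prefix` are hypothetical. Passing the class as `object_type=` and iterating the query to execute it are assumptions about the Python client, so check them against your client version.

```python
# RQL text of the declared-function filter, kept in one constant.
DECLARE_FILTER_RQL = """\
declare function titlePrefix(r, prefix)
{
    return r.Title.startsWith(prefix)
}
from Employees as e
filter titlePrefix(e, $prefix)
filter_limit 100
"""


def employees_with_title_prefix(session, prefix, object_type=None):
    """Run the declared-function filter and return the matching documents.

    `session` is expected to be an open RavenDB session; `object_type` is
    the class to map results to (for example, the Employee class).
    """
    # Assumption: the client accepts the result class as `object_type=`.
    query = session.advanced.raw_query(DECLARE_FILTER_RQL, object_type=object_type)
    query = query.add_parameter("prefix", prefix)  # binds $prefix in the RQL
    # Assumption: iterating the query executes it and yields the results.
    return list(query)
```

A call such as `employees_with_title_prefix(session, "Sales", Employee)` would then return the employees whose title starts with "Sales", scanning at most the 100 records allowed by `filter_limit`.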