Exploration Queries


filter

In Python, exploration queries can be applied via RQL using the filter keyword.

The added filtering is parsed and executed by RavenDB's Javascript engine.

The provided filtering operations resemble those implemented by where and can be further enhanced by Javascript functions of your own.
Read here about creating and using your own Javascript function in your filters.

When should exploration queries be used

filter can be applied to a Collection query, like in:

from Employees as e 
filter e.Address.Country = 'USA'

it can also be applied to queries handled by an index, e.g. -

// in a dynamic query via an auto-index  
from Employees as e 
where e.Title = 'Sales Representative'  
filter e.Address.Country = 'USA'

// in a query that uses an index explicitly  
from index 'Orders/ByCompany' 
filter Count > 10

Both in a collection query and in a query handled by an index, the entire retrieved dataset is scanned and filtered.
This helps understand when exploration queries should be used, why a Limit should be set for the number of filtered records, and when where should be preferred:

When to use

Use filter for an ad-hoc exploration of the retrieved dataset, that matches no existing index and is not expected to be repeated much.

  • You gain the ability to filter post-query results on the server side, for both collection queries and when an index was used.
  • The dataset will be filtered without creating an unrequired index that the cluster would continue updating from now on.

Limit the query, and prefer where for recurring queries

Be aware that when a large dataset is retrieved, like the whole collection in the case of a collection query, exploring it all using filter would tax the server in memory and CPU usage while it checks the filter condition for each query result, and cost the user a substantial waiting time. Therefore -

  • Limit the number of records that an exploration query filters, e.g.:
    from Employees as e 
    filter e.Address.Country = 'USA'
    filter_limit 500 // limit the number of filtered records
  • Use where rather than filter for recurring filtering.
    where will use an index, creating it if necessary, to accelerate the filtering in subsequent queries.

Syntax

  • In C#, for example, filter can be applied using code from the Query or DocumentQueryAPI.
    There is no such API implementation under python, leaving RQL as the only way to perform exploration queries.

  • RQL

    • In an RQL query, use:
      The filter keyword, followed by the filtering condition.
      The filter_limit option, followed by the max number of records to filter.
    • E.g. -
      from Employees as e 
      where e.Title = 'Sales Representative'  
      filter e.Address.Country = 'USA' // filter the retrieved dataset
      filter_limit 500 // limit the number of filter records

Usage examples

With collection queries

Use filter with a collection query to scan and filter the entire collection.

result = session.advanced.raw_query(
    "from Employees as e " "filter e.Address.Country = 'USA' " "filter_limit 500", Employee
).single()

Filtering a sizable collection will burden the server and prolong user waiting time.
Set a filter_limit to restrict the number of filtered records.


With queries that use an index

Use filter after a where clause to filter the results retrieved by an index query.

emp = (
    session.advanced.raw_query(
        "from Employees as e "
        "where e.Title = $title "
        "filter e.Address.Country = $country "
        "filter_limit $limit",
        Employee,
    )
    .add_parameter("title", "Sales Representative")
    .add_parameter("country", "USA")
    .add_parameter("limit", 500)
    .single()
)

With projections

The filtered results can be projected using select, like those of any other query.

emp3 = session.advanced.raw_query(
    "from Employees as e "
    "filter startsWith(e.FirstName, 'A') "
    "select { FullName: e.FirstName + ' ' + e.LastName }",
    Employee,
)

With user-defined JavaScript functions (declare)

You can define a Javascript function as part of your query using the declare keyword, and use it as part of your filter condition to freely adapt the filtering to your needs.

Here is a simple example:

// declare a Javascript function
declare function titlePrefix(r, prefix) 
{ 
    // Add whatever filtering capabilities you like
    return r.Title.startsWith(prefix)
} 

from Employees as e 

// Filter using the function you've declared
filter titlePrefix(e, $prefix)
filter_limit 100