Querying Time Series: Overview & Syntax



Time series query capabilities

Time series query can -

  • Choose a range of time series entries to query from.
  • Filter time series entries by their tags, values and timestamps.
  • Aggregate time series entries into groups by a chosen time resolution,
    e.g. gather the prices of a stock that's been collected over the past two months to week-long groups.
    Entries can also be aggregated by their tags.
  • Select entries by various criteria, e.g. by the min and max values of each aggregated group,
    and project them to the client.
  • Calculate statistical measures: the percentile, slope, or standard deviation of a time series.

Server and client queries

Time series queries are executed by the server and their results are projected to the client,
so they require very little client computation resources.

  • The server runs time series queries using RQL.
  • Clients can phrase time series queries in raw RQL or using high level LINQ expressions.
    High level queries are translated to RQL by the client before sending them to the server for execution.

Dynamic and index queries

  • Dynamic queries:

    • Time series indexes are Not created automatically by the server when making a dynamic query.
    • Use dynamic queries when time series you query are not indexed,
      or when you prefer that RavenDB would choose an index automatically. See queries always use an index.
      E.g. -

    // Query for time series named "HeartRates" in employees hired after 1994
    from Employees as e
    where HiredAt > "1994-01-01"
    select timeseries(
        from HeartRates
    )
  • Index queries:

Scaling query results

  • Time series query results can be scaled, multiplied by some number. This doesn't change the values themselves, only the output of the query. Scaling can serve as a stage in a data processing pipeline, or just for the purposes of displaying the data in a more understandable format.

  • There are several use cases for scaling. For example, suppose your time series records the changing speeds of different vehicles as they travel through a city, with some data measured in miles per hour and others in kilometers per hour. Here, scaling can facilitate unit conversion.

  • Another use case involves the compression of time series data. Numbers with very high precision (i.e., many digits after the decimal point) are less compressible than numbers with low precision. Therefore, for efficient storage, you might want to change a value like 0.000018 to 18 when storing the data.
    Then, when querying the data, you can scale by 0.000001 to restore the original value.

  • Scaling is a part of both RQL and LINQ syntax:

    • In LINQ, use .Scale(<double>).
    • In RQL, use scale <double> in a time series query, and input your scaling factor as a double.

Example:

var query = session.Query<User>()
    .Select(p => RavenQuery.TimeSeries(p, "HeartRates")
        .Scale(10)
        .ToList())
    .ToList();

// The value in the query results is 10 times the value stored on the server
var scaledValue = query[0].Results[0].Values[0];
from Users
select timeseries(
    from HeartRates
    scale 10
)

RQL syntax

A typical time series query can start by locating the documents whose time series we want to query.
For example, we can query for employees above 30:

from Employees as e
where Birthday < '1994-01-01'

Then, you can query their time series entries using either of the following two equivalent syntaxes:


select timeseries

This syntax allows you to encapsulate your query's time series functionality in a select timeseries section.

// Query for entries from time series "HeartRates" for employees above 30
// ======================================================================

// This clause locates the documents whose time series we want to query:
from Employees as e 
where Birthday < '1994-01-01'
 
// Query the time series that belong to the matching documents:
select timeseries (   // The `select` clause defines the time series query.  
    from HeartRates   // The `from` keyword is used to specify the time series name to query.  
)

declare timeseries

This syntax allows you to declare a time series function (using declare timeseries) and call it from your query.
It introduces greater flexibility to your queries as you can, for example, pass arguments to the time series function.

Here is a query written in both syntaxes.
It first queries for users above 30. If they possess a time series named "HeartRates", it retrieves a range of its entries.


With Time Series Function Without Time Series Function
// declare the time series function:
declare timeseries ts(jogger) {
    from jogger.HeartRates 
    between 
       "2020-05-27T00:00:00.0000000Z"
      and 
       "2020-06-23T00:00:00.0000000Z"
}

from Users as jogger
where Age > 30
// call the time series function
select ts(jogger)
from Users as jogger
where Age > 30
select timeseries(
    from HeartRates 
    between 
       "2020-05-27T00:00:00.0000000Z"
      and 
       "2020-06-23T00:00:00.0000000Z")

Combine time series and custom functions

  • You can declare and use both time series functions and custom functions in a query.
    The custom functions can call the time series functions, pass them arguments, and use their results.

  • In the example below, a custom function (customFunc) is called by the query select clause to fetch and format a set of time series entries, which are then projected by the query.
    The time series function (tsQuery) is called to retrieve the matching time series entries.

  • The custom function returns a flat set of values rather than a nested array, to ease the projection of retrieved values.

  • Note the generated RQL in the second tab, where the custom function is translated to a custom JavaScript function.

var query = from user in session.Query<User>()

    // The custom function 
    let customFunc = new Func<IEnumerable<TimeSeriesEntry>, IEnumerable<ModifiedTimeSeriesEntry>>(
        entries =>
            entries.Select(e => new ModifiedTimeSeriesEntry
            {
                Timestamp = e.Timestamp,
                Value = e.Values.Max(),
                Tag = e.Tag ?? "none"
            }))

    // The time series query
    let tsQuery = RavenQuery.TimeSeries(user, "HeartRates")
        .Where(entry => entry.Values[0] > 100)
        .ToList()

    // Project query results
    select new
    {
        Name = user.Name,
        // Call the custom function
        TimeSeriesEntries = customFunc(tsQuery.Results)
    };

var queryResults = query.ToList();
// The time series function:
// =========================
declare timeseries tsQuery(user) {
    from user.HeartRates
    where (Values[0] > 100)
}

// The custom JavaScript function:
// ===============================
declare function customFunc(user) {
    var results = [];

    // Call the time series function to retrieve heart rate values for the user
    var r = tsQuery(user);

    // Prepare the results
    for(var i = 0 ; i < r.Results.length; i ++) {
        results.push({
            Timestamp: r.Results[i].Timestamp, 
            Value: r.Results[i].Values.reduce((a, b) => Raven_Max(a, b)),
            Tag: r.Results[i].Tag  ?? "none"})
    }
    return results;
}

// Query & project results:
// ========================
from "Users" as user
select
    user.Name,
    customFunc(user) as timeSeriesEntries // Call the custom JavaScript function

This is the custom ModifiedTimeSeriesEntry class that is used in the above LINQ sample:

private class ModifiedTimeSeriesEntry
{
    public DateTime Timestamp { get; set; }
    public double Value { get; set; }
    public string Tag { get; set; }
}

Use Studio to experiment

You can use the Studio to try the RQL samples provided in this article and test your own queries.

"Time Series Query in Studio"

Time Series Query in Studio