Indexing Time Series



Time series indexes vs Document indexes

Auto-Indexes:

  • Time series index:
    Dynamic time series indexes are not created in response to queries.

  • Document index:
    Auto-indexes are created in response to dynamic queries.


Data source:

  • Time series index:

    • Time series indexes process segments that contain time series entries.
      The entries are indexed through the segment they are stored in, for example, using a LINQ syntax that resembles this one:

    from segment in timeseries
    from entry in segment
    ...
    • The following items can be indexed per index-entry in a time series index:
      • Values & timestamp of a time series entry
      • The entry tag
      • Content from a document referenced by the tag
      • Properties of the containing segment
  • Document index:

    • The index processes fields from your JSON documents.
      Documents are indexed through the collection they belong to, for example, using this LINQ syntax:

    from employee in employees
    ...
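
To make the segment-to-entry flattening concrete, here is a minimal pure-Python sketch of what a time series map conceptually does: it walks a segment and emits one index-entry per time series entry. It does not use the RavenDB client, and the `Entry`, `Segment`, and `map_segment` names are hypothetical stand-ins chosen for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Entry:
    # Hypothetical stand-in for a single time series entry
    timestamp: datetime
    values: List[float]
    tag: str = ""


@dataclass
class Segment:
    # Hypothetical stand-in for a time series segment
    document_id: str
    name: str
    entries: List[Entry] = field(default_factory=list)


def map_segment(segment: Segment) -> List[Dict]:
    """Emit one index-entry per entry in the segment, in the spirit of
    'from segment in timeseries / from entry in segment'."""
    return [
        {
            "value": entry.values[0],
            "timestamp": entry.timestamp,
            "tag": entry.tag,
            # Segment-level property, shared by all entries of the segment
            "document_id": segment.document_id,
        }
        for entry in segment.entries
    ]


segment = Segment(
    document_id="companies/1-A",
    name="StockPrices",
    entries=[
        Entry(datetime(2024, 1, 1, 9, 30), [10.0], "employees/1-A"),
        Entry(datetime(2024, 1, 1, 9, 31), [10.5], "employees/2-A"),
    ],
)
index_entries = map_segment(segment)
```

Each resulting item combines entry-level data (values, timestamp, tag) with a property of the containing segment (the document ID).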

Query results:

  • Time series index:
    When querying a time series index, each result item is of the type defined by the index-entry in the index definition (unless results are projected). The documents themselves are not returned.

  • Document index:
    The resulting objects are the document entities (unless results are projected).

Ways to create a time series index

There are two main ways to create a time series index:

  1. Create a class that inherits from one of the following abstract index creation task classes:

    • AbstractTimeSeriesIndexCreationTask
      for map and map-reduce time series indexes.
    • AbstractMultiMapTimeSeriesIndexCreationTask
      for multi-map time series indexes.
    • AbstractJavaScriptTimeSeriesIndexCreationTask
      for static javascript indexes.
  2. Deploy a time series index definition via PutIndexesOperation:

    • Create a TimeSeriesIndexDefinition directly.
    • Create a strongly typed index definition using TimeSeriesIndexDefinitionBuilder.

Examples of time series indexes

Map index - index single time series from single collection:

  • In this index, we index data from the "StockPrices" time series entries in the "Companies" collection (trade_volume, date).

  • In addition, we index the ID of the containing document (company_id), which is obtained from the segment,
    and some content from the document referenced by the entry's Tag (employee_name).

  • Each code block below presents one of the different ways the index can be defined.

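    from datetime import datetime  # used by the IndexEntry type hints

    # Defined as a class that inherits from AbstractTimeSeriesIndexCreationTask:
    class StockPriceTimeSeriesFromCompanyCollection(AbstractTimeSeriesIndexCreationTask):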
        # The index-entry:
        # ================
        class IndexEntry:
            def __init__(
                self, trade_volume: float = None, date: datetime = None, company_id: str = None, employee_name: str = None
            ):
                # The index-fields:
                # =================
                self.trade_volume = trade_volume
                self.date = date
                self.company_id = company_id
                self.employee_name = employee_name
    
        def __init__(self):
            super().__init__()
            self.map = """
            from segment in timeSeries.Companies.StockPrices
            from entry in segment.Entries
            
            let employee = LoadDocument(entry.Tag, "Employees")
            
            select new
            {
                trade_volume = entry.Values[4],
                date = entry.Timestamp.Date,
                company_id = segment.DocumentId,
                employee_name = employee.FirstName + " " + employee.LastName
            }
            """

    # Defined as a JavaScript index:
    class StockPriceTimeSeriesFromCompanyCollection_JS(AbstractJavaScriptTimeSeriesIndexCreationTask):
        def __init__(self):
            super().__init__()
            self.maps = {
                """
                timeSeries.map('Companies', 'StockPrices', function (segment) {
    
                    return segment.Entries.map(entry => {
                        let employee = load(entry.Tag, 'Employees');
    
                        return {
                            trade_volume: entry.Values[4],
                            date: new Date(entry.Timestamp.getFullYear(),
                                           entry.Timestamp.getMonth(),
                                           entry.Timestamp.getDate()),
                            company_id: segment.DocumentId,
                            employee_name: employee.FirstName + ' ' + employee.LastName
                        };
                    });
                })
                """
            }

    # Defined with a TimeSeriesIndexDefinition:
    # Define the 'index definition'
    index_definition = TimeSeriesIndexDefinition(
        name="StockPriceTimeSeriesFromCompanyCollection",
        maps={
            """
            from segment in timeSeries.Companies.StockPrices 
            from entry in segment.Entries 
    
            let employee = LoadDocument(entry.Tag, "Employees")
    
            select new 
            { 
                trade_volume = entry.Values[4], 
                date = entry.Timestamp.Date,
                company_id = segment.DocumentId,
                employee_name = employee.FirstName + ' ' + employee.LastName 
            }
            """
        },
    )
    
    # Deploy the index to the server via 'PutIndexesOperation'
    store.maintenance.send(PutIndexesOperation(index_definition))

    # Defined with a TimeSeriesIndexDefinitionBuilder:
    # Create the index builder
    ts_index_def_builder = TimeSeriesIndexDefinitionBuilder("StockPriceTimeSeriesFromCompanyCollection")
    
    ts_index_def_builder.map = """
        from segment in timeSeries.Companies.StockPrices
        from entry in segment.Entries
        select new 
        {
            trade_volume = entry.Values[4],
            date = entry.Timestamp.Date,
            company_id = segment.DocumentId,
        }
    """
    # Build the index definition
    index_definition_from_builder = ts_index_def_builder.to_index_definition(store.conventions)
    
    # Deploy the index to the server via 'PutIndexesOperation'
    store.maintenance.send(PutIndexesOperation(index_definition_from_builder))
  • Querying this index, you can retrieve the indexed time series data while filtering by any of the index-fields.

    with store.open_session() as session:
        # Retrieve time series data for the specified company:
        # ====================================================
        results = list(
            session.query_index_type(
                StockPriceTimeSeriesFromCompanyCollection, StockPriceTimeSeriesFromCompanyCollection.IndexEntry
            ).where_equals("company_id", "Companies/91-A")
        )
    
        # Results will include data from all 'StockPrices' entries in document 'Companies/91-A'

    from index "StockPriceTimeSeriesFromCompanyCollection"
    where "company_id" == "Companies/91-A"

    with store.open_session() as session:
        # Find what companies had a very high trade volume:
        # =================================================
        results = list(
            session.query_index_type(
                StockPriceTimeSeriesFromCompanyCollection, StockPriceTimeSeriesFromCompanyCollection.IndexEntry
            )
            .where_greater_than_or_equal("trade_volume", 150_000_000)
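            # OnlyCompanyName is assumed to be a projection class, defined elsewhere, with a 'company_id' field
            .select_fields(OnlyCompanyName, "company_id")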
            .distinct()
        )
    
        # Results will contain company "Companies/65-A"
        # since it is the only company with time series entries having such high trade volume.

    from index "StockPriceTimeSeriesFromCompanyCollection"
    where "trade_volume" >= 150000000
    select distinct company_id
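
The semantics of the second query (filter by an index-field, project a single field, keep distinct values) can be sketched in plain Python. This is only an in-memory illustration under assumed data, not how the server evaluates the query; `index_entries` and `companies_with_high_volume` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical index-entries, shaped like the IndexEntry class of the index
index_entries: List[Dict] = [
    {"company_id": "Companies/65-A", "trade_volume": 160_000_000.0},
    {"company_id": "Companies/65-A", "trade_volume": 155_000_000.0},
    {"company_id": "Companies/91-A", "trade_volume": 2_000_000.0},
]


def companies_with_high_volume(entries: List[Dict], threshold: float) -> List[str]:
    """Filter by trade_volume, project company_id, and drop duplicates
    while preserving first-seen order."""
    distinct_ids: List[str] = []
    for entry in entries:
        if entry["trade_volume"] >= threshold and entry["company_id"] not in distinct_ids:
            distinct_ids.append(entry["company_id"])
    return distinct_ids


high_volume = companies_with_high_volume(index_entries, 150_000_000)
```

Without the distinct step, "Companies/65-A" would appear once per matching time series entry, since every entry produces its own index-entry.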

Multi-Map index - index time series from several collections:

class Vehicles_ByLocation(AbstractMultiMapTimeSeriesIndexCreationTask):
    class IndexEntry:
        def __init__(
            self, latitude: float = None, longitude: float = None, date: datetime = None, document_id: str = None
        ):
            self.latitude = latitude
            self.longitude = longitude
            self.date = date
            self.document_id = document_id

    def __init__(self):
        super().__init__()
        self._add_map(
            """
            from segment in timeSeries.Planes.GPS_Coordinates
            from entry in segment.Entries
            select new
            {
                latitude = entry.Values[0],
                longitude = entry.Values[1],
                date = entry.Timestamp.Date,
                document_id = segment.DocumentId
            }
            """
        )
        self._add_map(
            """
            from segment in timeSeries.Ships.GPS_Coordinates
            from entry in segment.Entries
            select new
            {
                latitude = entry.Values[0],
                longitude = entry.Values[1],
                date = entry.Timestamp.Date,
                document_id = segment.DocumentId
            }
            """
        )
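
As a rough illustration of the multi-map idea, the following pure-Python sketch applies one map per source collection and gathers all outputs into a single list of index-entries with a common shape. It does not use the RavenDB client; the sample data and the `collections`, `gps_map`, and `maps` names are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical source data: collection name -> flattened GPS entries
collections: Dict[str, List[Dict]] = {
    "Planes": [{"doc_id": "planes/1-A", "values": [32.1, 34.8], "day": date(2024, 1, 1)}],
    "Ships": [{"doc_id": "ships/1-A", "values": [36.5, 14.2], "day": date(2024, 1, 1)}],
}


def gps_map(item: Dict) -> Dict:
    # Every map must emit the same index-entry shape
    return {
        "latitude": item["values"][0],
        "longitude": item["values"][1],
        "date": item["day"],
        "document_id": item["doc_id"],
    }


# One map per collection, mirroring the two _add_map calls in the index above
maps: Dict[str, Callable[[Dict], Dict]] = {"Planes": gps_map, "Ships": gps_map}

# The index holds the union of the outputs of all maps
index_entries = [
    map_fn(item)
    for collection, map_fn in maps.items()
    for item in collections[collection]
]
```

A query against such an index can then find vehicles by location regardless of which collection the time series came from.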

Map-Reduce index:

class TradeVolume_PerDay_ByCountry(AbstractTimeSeriesIndexCreationTask):
    class Result:
        def __init__(self, total_trade_volume: float = None, date: datetime = None, country: str = None):
            self.total_trade_volume = total_trade_volume
            self.date = date
            self.country = country

    def __init__(self):
        super().__init__()
        # Define the Map part:
        self.map = """
        from segment in timeSeries.Companies.StockPrices
        from entry in segment.Entries
        
        let company = LoadDocument(segment.DocumentId, 'Companies')
        
        select new
        {
            date = entry.Timestamp.Date,
            country = company.Address.Country,
            total_trade_volume = entry.Values[4],
        }
        """

        # Define the Reduce part:
        self._reduce = """
        from r in results
        group r by new {r.date, r.country}
        into g
        select new 
        {
            date = g.Key.date,
            country = g.Key.country,
            total_trade_volume = g.Sum(x => x.total_trade_volume)
        }
        """
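
The Reduce part above groups the mapped items by date and country and sums the trade volume per group. A minimal pure-Python sketch of that aggregation, using assumed sample data and a hypothetical `reduce_by_date_and_country` helper, might look like this:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical output of the Map part: one item per time series entry
mapped: List[Dict] = [
    {"date": date(2024, 1, 1), "country": "USA", "total_trade_volume": 100.0},
    {"date": date(2024, 1, 1), "country": "USA", "total_trade_volume": 50.0},
    {"date": date(2024, 1, 1), "country": "UK", "total_trade_volume": 30.0},
    {"date": date(2024, 1, 2), "country": "USA", "total_trade_volume": 70.0},
]


def reduce_by_date_and_country(results: List[Dict]) -> List[Dict]:
    """Group by (date, country) and sum total_trade_volume per group."""
    totals: Dict[Tuple[date, str], float] = defaultdict(float)
    for r in results:
        totals[(r["date"], r["country"])] += r["total_trade_volume"]
    # The output has the same fields as the input items
    return [
        {"date": day, "country": country, "total_trade_volume": total}
        for (day, country), total in totals.items()
    ]


reduced = reduce_by_date_and_country(mapped)
```

Because the output items have the same fields as the input items, the function in this sketch can be applied again to its own results without changing them.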

Syntax


AbstractJavaScriptTimeSeriesIndexCreationTask

class AbstractJavaScriptTimeSeriesIndexCreationTask(AbstractIndexCreationTaskBase[TimeSeriesIndexDefinition]):
    def __init__(
        self,
        conventions: DocumentConventions = None,
        priority: IndexPriority = None,
        lock_mode: IndexLockMode = None,
        deployment_mode: IndexDeploymentMode = None,
        state: IndexState = None,
    ):
        super().__init__(conventions, priority, lock_mode, deployment_mode, state)
        self._definition = TimeSeriesIndexDefinition()

    @property
    def maps(self) -> Set[str]:
        return self._definition.maps

    @maps.setter
    def maps(self, maps: Set[str]):
        self._definition.maps = maps

    @property
    def reduce(self) -> str:
        return self._definition.reduce

    @reduce.setter
    def reduce(self, reduce: str):
        self._definition.reduce = reduce

Learn more about JavaScript indexes in the JavaScript Indexes article.


TimeSeriesIndexDefinition

class TimeSeriesIndexDefinition(IndexDefinition):
    @property
    def source_type(self) -> IndexSourceType:
        return IndexSourceType.TIME_SERIES

While TimeSeriesIndexDefinition is currently functionally equivalent to the regular IndexDefinition class from which it inherits, it is recommended to use TimeSeriesIndexDefinition when creating a time series index definition in case additional functionality is added in future versions of RavenDB.


TimeSeriesIndexDefinitionBuilder

class TimeSeriesIndexDefinitionBuilder(AbstractIndexDefinitionBuilder[TimeSeriesIndexDefinition]):
    def __init__(self, index_name: Optional[str] = None):
        super().__init__(index_name)
        self.map: Optional[str] = None

TimeSeriesSegment

  • Segment properties include the entries themselves and aggregated values that RavenDB automatically updates in the segment's header.

  • The following segment properties can be indexed:

    public sealed class TimeSeriesSegment
    {
        // The ID of the document this time series belongs to
        public string DocumentId { get; set; }
     
        // The name of the time series this segment belongs to
        public string Name { get; set; }
      
        // The smallest values from all entries in the segment
        // The first array item is the Min of all first values, etc.
        public double[] Min { get; set; }
    
        // The largest values from all entries in the segment
        // The first array item is the Max of all first values, etc.
        public double[] Max { get; set; }
      
        // The sum of all values from all entries in the segment 
        // The first array item is the Sum of all first values, etc.
        public double[] Sum { get; set; }
      
        // The number of entries in the segment
        public int Count { get; set; }
      
        // The timestamp of the first entry in the segment
        public DateTime Start { get; set; }
      
        // The timestamp of the last entry in the segment
        public DateTime End { get; set; }
      
        // The segment's entries themselves
        public TimeSeriesEntry[] Entries { get; set; }
    }
  • These are the properties of a TimeSeriesEntry which can be indexed:

    public class TimeSeriesEntry
    {
        public DateTime Timestamp;
        public string Tag;
        public double[] Values;
    
        // This is exactly equivalent to Values[0]
        public double Value;
    }
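
To illustrate how the aggregated segment properties relate to the entries, the sketch below computes per-position Min, Max, and Sum arrays plus the entry Count from a list of entry values. It is a plain-Python illustration with assumed sample data, not RavenDB's implementation; `segment_summary` is a hypothetical helper, and it assumes every entry holds the same number of values.

```python
from typing import Dict, List

# Hypothetical entries of one segment; each inner list is one entry's Values
entry_values: List[List[float]] = [
    [10.0, 200.0],
    [12.0, 150.0],
    [9.0, 300.0],
]


def segment_summary(values: List[List[float]]) -> Dict:
    """Aggregate per value position: item 0 covers all first values, etc."""
    # Transpose so that each tuple holds the values at one position
    columns = list(zip(*values))
    return {
        "Min": [min(column) for column in columns],
        "Max": [max(column) for column in columns],
        "Sum": [sum(column) for column in columns],
        "Count": len(values),
    }


summary = segment_summary(entry_values)
```

Here `summary["Min"][0]` is the smallest first value across all entries, matching the description of the `Min` array above.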