Indexing Nested data



Sample data

  • The examples in this article are based on the following classes and sample data:

    class OnlineShop {
        constructor(
            shopName = '',
            email = '',
            tShirts = [] // Will contain the nested data (an array of TShirt objects)
        ) {
            Object.assign(this, { shopName, email, tShirts });
        }
    }
    
    class TShirt {
        constructor(
            color = '',
            size = '',
            logo = '',
            price = 0,
            sold = 0
        ) {
            Object.assign(this, { color, size, logo, price, sold });
        }
    }
    // Creating sample data for the examples in this article:
    // ======================================================
    
    const bulkInsert = store.bulkInsert();
    
    const onlineShops = [
        new OnlineShop("Shop1", "sales@shop1.com", [
            new TShirt("Red", "S", "Bytes and Beyond", 25, 2),
            new TShirt("Red", "M", "Bytes and Beyond", 25, 4),
            new TShirt("Blue", "M", "Query Everything", 28, 5),
            new TShirt("Green", "L", "Data Driver", 30, 3)
        ]),
        new OnlineShop("Shop2", "sales@shop2.com", [
            new TShirt("Blue", "S", "Coffee, Code, Repeat", 22, 12),
            new TShirt("Blue", "M", "Coffee, Code, Repeat", 22, 7),
            new TShirt("Green", "M", "Big Data Dreamer", 25, 9),
            new TShirt("Black", "L", "Data Mining Expert", 20, 11)
        ]),
        new OnlineShop("Shop3", "sales@shop3.com", [
            new TShirt("Red", "S", "Bytes of Wisdom", 18, 2),
            new TShirt("Blue", "M", "Data Geek", 20, 6),
            new TShirt("Black", "L", "Data Revolution", 15, 8),
            new TShirt("Black", "XL", "Data Revolution", 15, 10)
        ])
    ];
    
    for (const shop of onlineShops) {
        await bulkInsert.store(shop);
    }
    
    await bulkInsert.finish();

Simple index - Single index-entry per document

  • The index:
    class Shops_ByTShirt_Simple extends AbstractJavaScriptIndexCreationTask {
        constructor () {
            super();
    
            // Creating a SINGLE index-entry per document:
            this.map("OnlineShops", shop => {
                return {
                    // Each index-field will hold a collection of nested values from the document
                    colors: shop.tShirts.map(x => x.color),
                    sizes: shop.tShirts.map(x => x.size),
                    logos: shop.tShirts.map(x => x.logo)
                };
            });
        }
    }

  • The index-entries:

    Figure: Simple - index-entries (a single index-entry per document)

    1. The index-entries content is visible from the Studio Query view.

    2. Check option: Show raw index-entries instead of Matching documents.

    3. Each row represents an index-entry.
      The index has a single index-entry per document (3 entries in this example).

    4. The index-field contains a collection of ALL nested values from the document.
      e.g. the third index-entry has the following values in the colors index-field:
      {"black", "blue", "red"}
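
To make the shape of such an index-entry concrete, the following is a minimal sketch in plain JavaScript that mimics the map function outside of RavenDB. It is only an approximation: the helper names `terms` and `toSimpleEntry`, the object-literal document, and the explicit lowercasing, de-duplication, and sorting of the values are assumptions made for illustration, not the server's actual implementation.

```javascript
// Plain-JS approximation of the Shops/ByTShirt/Simple map function.
// Hypothetical helpers; runs without a RavenDB server.
const shop3 = {
    shopName: "Shop3",
    tShirts: [
        { color: "Red", size: "S", logo: "Bytes of Wisdom" },
        { color: "Blue", size: "M", logo: "Data Geek" },
        { color: "Black", size: "L", logo: "Data Revolution" },
        { color: "Black", size: "XL", logo: "Data Revolution" }
    ]
};

// Assumption: values are shown lowercased, unique and sorted, as in the example above
const terms = values => [...new Set(values.map(v => v.toLowerCase()))].sort();

function toSimpleEntry(shop) {
    // ONE entry per document; each field collects ALL nested values
    return {
        colors: terms(shop.tShirts.map(x => x.color)),
        sizes: terms(shop.tShirts.map(x => x.size)),
        logos: terms(shop.tShirts.map(x => x.logo))
    };
}

const entry = toSimpleEntry(shop3);
console.log(entry.colors); // [ 'black', 'blue', 'red' ]
```

Note that the entry no longer records which color belongs to which size.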

  • Querying the index:

    // Query for all shop documents that have a red TShirt
    const results = await session
        .query({ indexName: "Shops/ByTShirt/Simple" })
         // Filter query results by a nested value
        .containsAny("colors", ["red"])
        .all();

    The same query in RQL:

    from index "Shops/ByTShirt/Simple"
    where colors == "red"

    // Results will include the following shop documents:
    // ==================================================
    // * Shop1
    // * Shop3

  • When to use:

    • This type of index structure is effective for retrieving documents when filtering the query by any of the inner nested values that were indexed.

    • However, due to the way the index-entries are generated, this index cannot provide results for a query searching for documents that contain specific sub-objects which satisfy some AND condition.
      For example:

      // You want to query for shops containing "Large Green TShirts",
      // aiming to get only "Shop1" as a result since it has such a combination,
      // so you attempt this query:
      const greenAndLarge = await session
          .query({ indexName: "Shops/ByTShirt/Simple" })
          .containsAny("colors", ["green"])
          .andAlso()
          .containsAny("sizes", ["L"])
          .all();
      
      // But, the results of this query will include BOTH "Shop1" & "Shop2"
      // since the index-entries do not keep the original sub-objects structure.

    • To address this, you must use a Fanout index - as described below.
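
The false positive can be reproduced without a server. The sketch below is a hypothetical in-memory model (plain arrays, not the RavenDB query engine) that evaluates the same "Green AND L" condition twice: once against flattened per-document entries, and once against each individual sub-object.

```javascript
// Hypothetical in-memory model; each tShirt is reduced to [color, size].
const shops = [
    { name: "Shop1", tShirts: [["Red", "S"], ["Red", "M"], ["Blue", "M"], ["Green", "L"]] },
    { name: "Shop2", tShirts: [["Blue", "S"], ["Blue", "M"], ["Green", "M"], ["Black", "L"]] },
    { name: "Shop3", tShirts: [["Red", "S"], ["Blue", "M"], ["Black", "L"], ["Black", "XL"]] }
];

// One flattened entry per shop: colors and sizes are stored independently
const simpleEntries = shops.map(shop => ({
    name: shop.name,
    colors: shop.tShirts.map(([color]) => color),
    sizes: shop.tShirts.map(([, size]) => size)
}));

// Condition evaluated against the flattened entries
const flattenedMatches = simpleEntries
    .filter(e => e.colors.includes("Green") && e.sizes.includes("L"))
    .map(e => e.name);

// Same condition evaluated against each individual sub-object
const subObjectMatches = shops
    .filter(shop => shop.tShirts.some(([color, size]) => color === "Green" && size === "L"))
    .map(shop => shop.name);

console.log(flattenedMatches); // [ 'Shop1', 'Shop2' ]
console.log(subObjectMatches); // [ 'Shop1' ]
```

Shop2 matches the flattened entry only because its Green shirt (size M) and its size L shirt (Black) are two different sub-objects.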

Fanout index - Multiple index-entries per document

  • What is a Fanout index:

    • A fanout index is an index that outputs multiple index-entries per document.
      A separate index-entry is created for each nested sub-object from the document.

    • The fanout index is useful when you need to retrieve documents by query criteria
      that must all be satisfied by the same sub-object (e.g. a specific color AND a specific size).

  • Fanout index - Map index example:

    // A fanout map-index:
    // ===================
    class Shops_ByTShirt_Fanout extends AbstractJavaScriptIndexCreationTask {
        constructor () {
            super();
    
            // Creating MULTIPLE index-entries per document,
            // an index-entry for each sub-object in the tShirts list
            this.map("OnlineShops", shop => {
                return shop.tShirts.map(shirt => {
                    return {
                        color: shirt.color,
                        size: shirt.size,
                        logo: shirt.logo
                    };
                });
            });
        }
    }

    // Query the fanout index:
    // =======================
    const shopsThatHaveMediumRedShirts = await session
        .query({ indexName: "Shops/ByTShirt/Fanout" })
         // Query for documents that have a "Medium Red TShirt"
        .whereEquals("color", "red")
        .andAlso()
        .whereEquals("size", "M")
        .all();

    The same query in RQL:

    from index "Shops/ByTShirt/Fanout"
    where color == "red" and size == "M"

    // Query results:
    // ==============
    
    // Only the 'Shop1' document will be returned,
    // since it is the only document that has the requested combination within its tShirts list.

  • The index-entries:

    Figure: Fanout - index-entries

    1. The index-entries content is visible from the Studio Query view.

    2. Check option: Show raw index-entries instead of Matching documents.

    3. Each row represents an index-entry.
      Each index-entry corresponds to an inner item in the tShirts list.

    4. In this example, the total number of index-entries is 12,
      which is the total number of inner items in the tShirts lists of all 3 documents in the collection.
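
The entry count can be checked with a small plain-JavaScript sketch. It models the fanout map with `flatMap` over in-memory objects; the `source` field and the compact `[color, size]` tuples are assumptions made for illustration and are not part of the index definition above.

```javascript
// Hypothetical in-memory model of the fanout map; no RavenDB server involved.
const shops = [
    { name: "Shop1", tShirts: [["Red", "S"], ["Red", "M"], ["Blue", "M"], ["Green", "L"]] },
    { name: "Shop2", tShirts: [["Blue", "S"], ["Blue", "M"], ["Green", "M"], ["Black", "L"]] },
    { name: "Shop3", tShirts: [["Red", "S"], ["Blue", "M"], ["Black", "L"], ["Black", "XL"]] }
];

// One entry per nested sub-object, each remembering the document it came from
const fanoutEntries = shops.flatMap(shop =>
    shop.tShirts.map(([color, size]) => ({ source: shop.name, color, size })));

console.log(fanoutEntries.length); // 12

// "Red AND M" is now tested against one sub-object at a time
const redMediumShops = [...new Set(
    fanoutEntries
        .filter(e => e.color === "Red" && e.size === "M")
        .map(e => e.source)
)];

console.log(redMediumShops); // [ 'Shop1' ]
```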

  • Fanout index - Map-Reduce index example:

    • The fanout index concept applies to map-reduce indexes as well:

      // A fanout map-reduce index:
      // ==========================
      class Sales_ByTShirtColor_Fanout extends AbstractJavaScriptIndexCreationTask {
          constructor () {
              super();
      
              this.map("OnlineShops", shop => {
                  return shop.tShirts.map(shirt => {
                      return {
                          // Define the index-fields:
                          color: shirt.color,
                          itemsSold: shirt.sold,
                          totalSales: shirt.price * shirt.sold
                      };
                  });
              });
      
              this.reduce(results => results
                  .groupBy(shirt => shirt.color)
                  .aggregate(g => {
                      return {
                          // Calculate sales per color
                          color: g.key,
                          itemsSold: g.values.reduce((p, c) => p + c.itemsSold, 0),
                          totalSales: g.values.reduce((p, c) => p + c.totalSales, 0),
                      }
                  }));
          }
      }

      // Query the fanout index:
      // =======================
      const queryResult = await session
          .query({ indexName: "Sales/ByTShirtColor/Fanout" })
           // Query for index-entries that contain "black"
          .whereEquals("color", "black")
          .firstOrNull();
      
      // Get total sales for black TShirts
      const blackShirtsSales = queryResult?.totalSales ?? 0;

      The same query in RQL:

      from index "Sales/ByTShirtColor/Fanout"
      where color == "black"

      // Query results:
      // ==============
      
      // With the sample data used in this article,
      // The total sales revenue from black TShirts sold (in all shops) is 490
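
The figure of 490 can be verified by hand. The following sketch replays the map and reduce steps in plain JavaScript over the sample data; it is an illustrative model (the `[color, price, sold]` tuples and the `Map`-based grouping are assumptions), not how the server executes the index.

```javascript
// Each shop is reduced to its tShirts as [color, price, sold] tuples.
const shops = [
    [["Red", 25, 2], ["Red", 25, 4], ["Blue", 28, 5], ["Green", 30, 3]],
    [["Blue", 22, 12], ["Blue", 22, 7], ["Green", 25, 9], ["Black", 20, 11]],
    [["Red", 18, 2], ["Blue", 20, 6], ["Black", 15, 8], ["Black", 15, 10]]
];

// Map phase: one entry per tShirt (fanout)
const mapped = shops.flatMap(tShirts =>
    tShirts.map(([color, price, sold]) => ({
        color,
        itemsSold: sold,
        totalSales: price * sold
    })));

// Reduce phase: group by color and sum both numeric fields
const byColor = new Map();
for (const entry of mapped) {
    const acc = byColor.get(entry.color) ?? { color: entry.color, itemsSold: 0, totalSales: 0 };
    acc.itemsSold += entry.itemsSold;
    acc.totalSales += entry.totalSales;
    byColor.set(entry.color, acc);
}

console.log(byColor.get("Black"));
// { color: 'Black', itemsSold: 29, totalSales: 490 }
```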

  • Fanout index - Performance hints:

    • Fanout indexes are typically more resource-intensive than other indexes because RavenDB has to process a large number of index-entries. The increased workload can lead to higher CPU and memory utilization and may degrade the overall performance of the index.

    • When the number of index-entries generated from a single document exceeds a configurable limit,
      RavenDB will issue a High indexing fanout ratio alert in the Studio notification center.

    • You can control when this performance hint is created by setting the PerformanceHints.Indexing.MaxIndexOutputsPerDocument configuration key (default is 1024).

    • So, for example, adding another OnlineShop document whose tShirts array contains 1025 items
      will trigger the following alert:

      Figure 1. High indexing fanout ratio notification

    • Clicking the 'Details' button will show the following info:

      Figure 2. Fanout index, performance hint details
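
If you want a rough idea of which documents would cross the limit before deploying a fanout index, a local check along the following lines may help. This is a hypothetical helper, not a RavenDB API: `findHighFanoutDocs`, the `id` values, and the simplified map function are all assumptions, and the limit is hard-coded to the default mentioned above.

```javascript
// Hypothetical pre-flight check: counts how many index-entries a fanout
// map function would emit per document and reports those above the limit.
const MAX_OUTPUTS_PER_DOCUMENT = 1024; // default value stated above

// Simplified stand-in for the fanout map function
const fanoutMap = shop => shop.tShirts.map(shirt => ({ color: shirt.color, size: shirt.size }));

function findHighFanoutDocs(docs, mapFn, limit = MAX_OUTPUTS_PER_DOCUMENT) {
    return docs
        .map(doc => ({ id: doc.id, outputs: mapFn(doc).length }))
        .filter(x => x.outputs > limit);
}

// Illustrative documents; the ids are made up
const makeShirts = count => Array.from({ length: count }, () => ({ color: "Red", size: "M" }));
const smallShop = { id: "onlineShops/1", tShirts: makeShirts(4) };
const hugeShop = { id: "onlineShops/4", tShirts: makeShirts(1025) };

const offenders = findHighFanoutDocs([smallShop, hugeShop], fanoutMap);
console.log(offenders); // [ { id: 'onlineShops/4', outputs: 1025 } ]
```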

  • Fanout index - Paging:

    • A fanout index has more index-entries than there are documents in the indexed collection.
      Multiple index-entries "point" to the same document from which they originated,
      as can be seen in the above index-entries example.

    • When making a fanout index query that should return full documents (without projecting results),
      the totalResults property (available when calling the query statistics() method)
      will contain the total number of index-entries and not the total number of resulting documents.

    • To overcome this when paging results, you must take into account the number of "duplicate"
      index-entries that are skipped internally by the server when serving the resulting documents.

    • Please refer to paging through tampered results for further explanation and examples.
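
The gap between the two counts is easy to see on the sample data. The sketch below is an in-memory model only (it does not call the RavenDB client, and the variable names are made up): it counts the fanout index-entries that match color "Blue" and the distinct documents they point to.

```javascript
// Hypothetical in-memory model; each shop lists only the colors of its tShirts.
const shops = [
    { id: "Shop1", colors: ["Red", "Red", "Blue", "Green"] },
    { id: "Shop2", colors: ["Blue", "Blue", "Green", "Black"] },
    { id: "Shop3", colors: ["Red", "Blue", "Black", "Black"] }
];

// Fanout: one index-entry per tShirt, pointing back to its source document
const fanoutEntries = shops.flatMap(shop =>
    shop.colors.map(color => ({ source: shop.id, color })));

const matchingEntries = fanoutEntries.filter(e => e.color === "Blue");
const matchingDocs = [...new Set(matchingEntries.map(e => e.source))];

console.log(matchingEntries.length); // 4 matching index-entries
console.log(matchingDocs.length);    // 3 distinct documents
console.log(matchingEntries.length - matchingDocs.length); // 1 "duplicate" entry
```

In this model, a page size computed from the entry count would overestimate the number of documents by one, which is the discrepancy the paging logic has to compensate for.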