Indexing Nested data



Sample data

  • The examples in this article are based on the following classes and sample data:

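using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;

public class OnlineShop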
{
    public string ShopName { get; set; }
    public string Email { get; set; }
    public List<TShirt> TShirts { get; set; } // Nested data
}

public class TShirt
{
    public string Color { get; set; }
    public string Size { get; set; }
    public string Logo { get; set; }
    public decimal Price { get; set; }
    public int Sold { get; set; }
}
// Creating sample data for the examples in this article:
// ======================================================

var onlineShops = new[]
{
  // Shop1
  new OnlineShop { ShopName = "Shop1", Email = "sales@shop1.com", TShirts = new List<TShirt> {
      new TShirt { Color = "Red", Size = "S", Logo = "Bytes and Beyond", Price = 25, Sold = 2 },
      new TShirt { Color = "Red", Size = "M", Logo = "Bytes and Beyond", Price = 25, Sold = 4 },
      new TShirt { Color = "Blue", Size = "M", Logo = "Query Everything", Price = 28, Sold = 5 },
      new TShirt { Color = "Green", Size = "L", Logo = "Data Driver", Price = 30, Sold = 3 }
  }},
  // Shop2
  new OnlineShop { ShopName = "Shop2", Email = "sales@shop2.com", TShirts = new List<TShirt> {
      new TShirt { Color = "Blue", Size = "S", Logo = "Coffee, Code, Repeat", Price = 22, Sold = 12 },
      new TShirt { Color = "Blue", Size = "M", Logo = "Coffee, Code, Repeat", Price = 22, Sold = 7 },
      new TShirt { Color = "Green", Size = "M", Logo = "Big Data Dreamer", Price = 25, Sold = 9 },
      new TShirt { Color = "Black", Size = "L", Logo = "Data Mining Expert", Price = 20, Sold = 11 }
  }},
  // Shop3
  new OnlineShop { ShopName = "Shop3", Email = "sales@shop3.com", TShirts = new List<TShirt> {
      new TShirt { Color = "Red", Size = "S", Logo = "Bytes of Wisdom", Price = 18, Sold = 2 },
      new TShirt { Color = "Blue", Size = "M", Logo = "Data Geek", Price = 20, Sold = 6 },
      new TShirt { Color = "Black", Size = "L", Logo = "Data Revolution", Price = 15, Sold = 8 },
      new TShirt { Color = "Black", Size = "XL", Logo = "Data Revolution", Price = 15, Sold = 10 }
  }}
};

// "store" is an initialized DocumentStore (IDocumentStore) instance
using (var session = store.OpenSession())
{
    foreach (var shop in onlineShops)
    {
        session.Store(shop);
    }

    session.SaveChanges();
}

Simple index - Single index-entry per document

The index:

public class Shops_ByTShirt_Simple : AbstractIndexCreationTask<OnlineShop>
{
    public class IndexEntry
    {
        // The index-fields:
        public IEnumerable<string> Colors { get; set; }
        public IEnumerable<string> Sizes { get; set; }
        public IEnumerable<string> Logos { get; set; }
    }
    
    public Shops_ByTShirt_Simple()
    {
        Map = shops => from shop in shops
            // Creating a SINGLE index-entry per document:
            select new IndexEntry
            {
                // Each index-field will hold a collection of nested values from the document
                Colors = shop.TShirts.Select(x => x.Color),
                Sizes = shop.TShirts.Select(x => x.Size),
                Logos = shop.TShirts.Select(x => x.Logo)
            };
    }
}

The index-entries:

Figure: Simple index-entries (a single index-entry per document)

  1. The content of the index-entries can be viewed in the Studio Query view.

  2. Check the option "Show raw index-entries instead of Matching documents".

  3. Each row represents an index-entry.
    The index has a single index-entry per document (3 entries in this example).

  4. Each index-field contains a collection of ALL the nested values from the document.
    For example, the third index-entry has the following values in the Colors index-field:
    {"black", "blue", "red"}
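
To make the shape of these index-entries concrete, here is a minimal in-memory model in plain JavaScript. It is not the RavenDB client API. It only mimics what the simple map produces: one entry per shop, with each index-field holding all the nested values. Lowercasing, de-duplicating, and sorting the values is an assumption, made so the output resembles the terms shown in Studio. Logo, Price, and Sold are omitted from the sample data for brevity.

```javascript
// In-memory model of the simple index (not RavenDB code).
// Sample data from this article; Logo, Price and Sold are omitted.
const shops = [
  { ShopName: "Shop1", TShirts: [
    { Color: "Red", Size: "S" }, { Color: "Red", Size: "M" },
    { Color: "Blue", Size: "M" }, { Color: "Green", Size: "L" } ] },
  { ShopName: "Shop2", TShirts: [
    { Color: "Blue", Size: "S" }, { Color: "Blue", Size: "M" },
    { Color: "Green", Size: "M" }, { Color: "Black", Size: "L" } ] },
  { ShopName: "Shop3", TShirts: [
    { Color: "Red", Size: "S" }, { Color: "Blue", Size: "M" },
    { Color: "Black", Size: "L" }, { Color: "Black", Size: "XL" } ] },
];

// Distinct, lowercased, sorted values of one nested field
// (assumed to resemble the terms Studio displays)
function terms(tShirts, field) {
  return [...new Set(tShirts.map((shirt) => shirt[field].toLowerCase()))].sort();
}

// Simple map: exactly ONE index-entry per document
const simpleEntries = shops.map((shop) => ({
  ShopName: shop.ShopName,
  Colors: terms(shop.TShirts, "Color"),
  Sizes: terms(shop.TShirts, "Size"),
}));

console.log(simpleEntries.length);    // 3
console.log(simpleEntries[2].Colors); // [ 'black', 'blue', 'red' ]
```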


Querying the index:

Query (synchronous session):

// Query for all shop documents that have a red TShirt
var shopsThatHaveRedShirts = session
    .Query<Shops_ByTShirt_Simple.IndexEntry, Shops_ByTShirt_Simple>()
    // Filter query results by a nested value
    .Where(x => x.Colors.Contains("red"))
    .OfType<OnlineShop>()
    .ToList();

Query (asynchronous session):

// Query for all shop documents that have a red TShirt
var shopsThatHaveRedShirts = await asyncSession
    .Query<Shops_ByTShirt_Simple.IndexEntry, Shops_ByTShirt_Simple>()
    // Filter query results by a nested value
    .Where(x => x.Colors.Contains("red"))
    .OfType<OnlineShop>()
    .ToListAsync();

DocumentQuery:

// Query for all shop documents that have a red TShirt
var shopsThatHaveRedShirts = session.Advanced
    .DocumentQuery<Shops_ByTShirt_Simple.IndexEntry, Shops_ByTShirt_Simple>()
    // Filter query results by a nested value
    .ContainsAny(x => x.Colors, new[] { "red" })
    .OfType<OnlineShop>()
    .ToList();

RQL:

from index "Shops/ByTShirt/Simple"
where Colors == "red"

// Results will include the following shop documents:
// ==================================================
// * Shop1
// * Shop3

When to use:

  • This type of index structure is effective for retrieving documents when filtering the query by any of the inner nested values that were indexed.

  • However, each index-entry flattens all nested values into independent collections. As a result, this index cannot correctly answer a query for documents that contain a specific sub-object satisfying several conditions at once (an AND condition across fields of the same sub-object).
    For example:

    // You want to query for shops containing "Large Green TShirts",
    // aiming to get only "Shop1" as a result since it has such a combination,
    // so you attempt this query:
    var greenAndLarge = session
        .Query<Shops_ByTShirt_Simple.IndexEntry, Shops_ByTShirt_Simple>()
        .Where(x => x.Colors.Contains("green") && x.Sizes.Contains("L"))
        .OfType<OnlineShop>()
        .ToList();
    
    // However, the results of this query will include BOTH "Shop1" & "Shop2",
    // since the index-entries do not preserve the original sub-object structure:
    // Shop2 has a Green shirt (size M) and a Large shirt (Black), which together match.
  • To address this, use a fanout index, as described below.
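
The false match can be reproduced with a small in-memory JavaScript model. This is not RavenDB code, and it assumes case-insensitive matching, as in the queries above. Each shop's colors and sizes are flattened into independent sets, so "green" and "L" can be satisfied by two different shirts. Checking each shirt individually, which is what a fanout index makes possible, returns only Shop1.

```javascript
// In-memory model (not RavenDB code) of an AND query on the simple index.
// Sample data from this article; Logo, Price and Sold are omitted.
const shops = [
  { ShopName: "Shop1", TShirts: [
    { Color: "Red", Size: "S" }, { Color: "Red", Size: "M" },
    { Color: "Blue", Size: "M" }, { Color: "Green", Size: "L" } ] },
  { ShopName: "Shop2", TShirts: [
    { Color: "Blue", Size: "S" }, { Color: "Blue", Size: "M" },
    { Color: "Green", Size: "M" }, { Color: "Black", Size: "L" } ] },
  { ShopName: "Shop3", TShirts: [
    { Color: "Red", Size: "S" }, { Color: "Blue", Size: "M" },
    { Color: "Black", Size: "L" }, { Color: "Black", Size: "XL" } ] },
];

// Simple index: nested values flattened into independent sets per document
const simpleEntries = shops.map((shop) => ({
  ShopName: shop.ShopName,
  Colors: new Set(shop.TShirts.map((shirt) => shirt.Color.toLowerCase())),
  Sizes: new Set(shop.TShirts.map((shirt) => shirt.Size.toLowerCase())),
}));

// Colors contains "green" AND Sizes contains "l":
// each condition may be satisfied by a DIFFERENT shirt
const simpleMatches = simpleEntries
  .filter((entry) => entry.Colors.has("green") && entry.Sizes.has("l"))
  .map((entry) => entry.ShopName);

// The intended meaning: at least ONE shirt that is both green and large
const intendedMatches = shops
  .filter((shop) => shop.TShirts.some((shirt) =>
    shirt.Color.toLowerCase() === "green" && shirt.Size.toLowerCase() === "l"))
  .map((shop) => shop.ShopName);

console.log(simpleMatches);   // [ 'Shop1', 'Shop2' ]
console.log(intendedMatches); // [ 'Shop1' ]
```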

Fanout index - Multiple index-entries per document

What is a Fanout index:

  • A fanout index is an index that outputs multiple index-entries per document.
    A separate index-entry is created for each nested sub-object from the document.

  • A fanout index is useful when a query must match documents by specific sub-objects
    that satisfy several conditions together (for example, a shirt that is both green AND large).

Fanout index - Map index example:

// A fanout map-index:
// ===================
public class Shops_ByTShirt_Fanout : AbstractIndexCreationTask<OnlineShop>
{
    public class IndexEntry
    {
        // The index-fields:
        public string Color { get; set; }
        public string Size { get; set; }
        public string Logo { get; set; }
    }
    
    public Shops_ByTShirt_Fanout()
    {
        Map = shops =>
            from shop in shops
            from shirt in shop.TShirts
            // Creating MULTIPLE index-entries per document,
            // an index-entry for each sub-object in the TShirts list
            select new IndexEntry
            {
                Color = shirt.Color,
                Size = shirt.Size,
                Logo = shirt.Logo
            };
    }
}

The same fanout map-index, defined as a JavaScript index:

public class Shops_ByTShirt_JS : AbstractJavaScriptIndexCreationTask
{
    public Shops_ByTShirt_JS()
    {
        Maps = new HashSet<string>
        {
            // Return an array: one index-entry per TShirt in the document
            @"map('OnlineShops', function (shop) {
                var res = [];
                shop.TShirts.forEach(shirt => {
                    res.push({
                        Color: shirt.Color,
                        Size: shirt.Size,
                        Logo: shirt.Logo
                    });
                });
                return res;
            })"
        };
    }
}

Query (synchronous session):

// Query the fanout index:
// =======================
var shopsThatHaveMediumRedShirts = session
    .Query<Shops_ByTShirt_Fanout.IndexEntry, Shops_ByTShirt_Fanout>()
    // Query for documents that have a "Medium Red TShirt"
    .Where(x => x.Color == "red" && x.Size == "M")
    .OfType<OnlineShop>()
    .ToList();

Query (asynchronous session):

// Query the fanout index:
// =======================
var shopsThatHaveMediumRedShirts = await asyncSession
    .Query<Shops_ByTShirt_Fanout.IndexEntry, Shops_ByTShirt_Fanout>()
    // Query for documents that have a "Medium Red TShirt"
    .Where(x => x.Color == "red" && x.Size == "M")
    .OfType<OnlineShop>()
    .ToListAsync();

DocumentQuery:

// Query the fanout index:
// =======================
var shopsThatHaveMediumRedShirts = session.Advanced
    .DocumentQuery<Shops_ByTShirt_Fanout.IndexEntry, Shops_ByTShirt_Fanout>()
    // Query for documents that have a "Medium Red TShirt"
    .WhereEquals(x => x.Color, "red")
    .AndAlso()
    .WhereEquals(x => x.Size, "M")
    .OfType<OnlineShop>()
    .ToList();

RQL:

from index "Shops/ByTShirt/Fanout"
where Color == "red" and Size == "M"

// Query results:
// ==============

// Only the 'Shop1' document will be returned,
// since it is the only document that has the requested combination within the TShirt list.
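
The in-memory JavaScript sketch below shows why the fanout query returns only Shop1. It is a model, not the RavenDB implementation. Each entry represents one shirt, so both conditions are tested against the same shirt. The `docId` field is a hypothetical stand-in for the link each index-entry keeps to its source document. Because several entries can point to the same document, the matches are de-duplicated before documents are returned.

```javascript
// In-memory model (not RavenDB code) of the fanout index and its AND query.
// Sample data from this article; Logo, Price and Sold are omitted.
const shops = [
  { ShopName: "Shop1", TShirts: [
    { Color: "Red", Size: "S" }, { Color: "Red", Size: "M" },
    { Color: "Blue", Size: "M" }, { Color: "Green", Size: "L" } ] },
  { ShopName: "Shop2", TShirts: [
    { Color: "Blue", Size: "S" }, { Color: "Blue", Size: "M" },
    { Color: "Green", Size: "M" }, { Color: "Black", Size: "L" } ] },
  { ShopName: "Shop3", TShirts: [
    { Color: "Red", Size: "S" }, { Color: "Blue", Size: "M" },
    { Color: "Black", Size: "L" }, { Color: "Black", Size: "XL" } ] },
];

// Fanout map: one entry per TShirt; docId is a hypothetical link to the source document
const fanoutEntries = shops.flatMap((shop) =>
  shop.TShirts.map((shirt) => ({
    docId: shop.ShopName,
    Color: shirt.Color.toLowerCase(),
    Size: shirt.Size.toLowerCase(),
  }))
);

// Both conditions are evaluated against the SAME entry, i.e. the same shirt
const matchingEntries = fanoutEntries.filter(
  (entry) => entry.Color === "red" && entry.Size === "m"
);

// Several entries can point to one document, so return each document only once
const matchingDocs = [...new Set(matchingEntries.map((entry) => entry.docId))];

console.log(fanoutEntries.length); // 12
console.log(matchingDocs);         // [ 'Shop1' ]
```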

The index-entries:

Figure: Fanout index-entries (multiple index-entries per document)

  1. The content of the index-entries can be viewed in the Studio Query view.

  2. Check the option "Show raw index-entries instead of Matching documents".

  3. Each row represents an index-entry.
    Each index-entry corresponds to a single item in the TShirts list.

  4. In this example, there are 12 index-entries in total,
    which is the total number of items in the TShirts lists of all 3 documents in the collection.

Fanout index - Map-Reduce index example:

  • The fanout index concept applies to map-reduce indexes as well:

// A fanout map-reduce index:
// ==========================
public class Sales_ByTShirtColor_Fanout : 
    AbstractIndexCreationTask<OnlineShop, Sales_ByTShirtColor_Fanout.IndexEntry>
{
    public class IndexEntry
    {
        // The index-fields:
        public string Color { get; set; }
        public int ItemsSold { get; set; }
        public decimal TotalSales { get; set; }
    }

    public Sales_ByTShirtColor_Fanout()
    {
        Map = shops => 
            from shop in shops
            from shirt in shop.TShirts
            // Creating MULTIPLE index-entries per document,
            // an index-entry for each sub-object in the TShirts list
            select new IndexEntry
            {
                Color = shirt.Color,
                ItemsSold = shirt.Sold,
                TotalSales = shirt.Price * shirt.Sold
            };

        Reduce = results => from result in results
            group result by result.Color
            into g
            select new
            {
                // Calculate sales per color
                Color = g.Key,
                ItemsSold = g.Sum(x => x.ItemsSold),
                TotalSales = g.Sum(x => x.TotalSales)
            };
    }
}

The same fanout map-reduce index, defined as a JavaScript index:

public class Sales_ByTShirtColor_JS : AbstractJavaScriptIndexCreationTask
{
    public class IndexEntry
    {
        // The index-fields:
        public string Color { get; set; }
        public int ItemsSold { get; set; }
        public decimal TotalSales { get; set; }
    }

    public Sales_ByTShirtColor_JS()
    {
        Maps = new HashSet<string>
        {
            // Map: return an array with one result per TShirt in the document
            @"map('OnlineShops', function (shop) {
                var res = [];
                shop.TShirts.forEach(shirt => {
                    res.push({
                        Color: shirt.Color,
                        ItemsSold: shirt.Sold,
                        TotalSales: shirt.Price * shirt.Sold
                    });
                });
                return res;
            })"
        };

        // Reduce: calculate sales per color
        Reduce = @"groupBy(x => x.Color)
            .aggregate(g => {
                return {
                    Color: g.key,
                    ItemsSold: g.values.reduce((sum, x) => x.ItemsSold + sum, 0),
                    TotalSales: g.values.reduce((sum, x) => x.TotalSales + sum, 0)
                };
            })";
    }
}

Query (synchronous session):

// Query the fanout index:
// =======================
var queryResult = session
    .Query<Sales_ByTShirtColor_Fanout.IndexEntry, Sales_ByTShirtColor_Fanout>()
    // Query for the reduce result of the "black" color
    .Where(x => x.Color == "black")
    .FirstOrDefault();

// Get total sales for black TShirts
var blackShirtsSales = queryResult?.TotalSales ?? 0;

Query (asynchronous session):

// Query the fanout index:
// =======================
var queryResult = await asyncSession
    .Query<Sales_ByTShirtColor_Fanout.IndexEntry, Sales_ByTShirtColor_Fanout>()
    // Query for the reduce result of the "black" color
    .Where(x => x.Color == "black")
    .FirstOrDefaultAsync();

// Get total sales for black TShirts
var blackShirtsSales = queryResult?.TotalSales ?? 0;

DocumentQuery:

// Query the fanout index:
// =======================
var queryResult = session.Advanced
    .DocumentQuery<Sales_ByTShirtColor_Fanout.IndexEntry, Sales_ByTShirtColor_Fanout>()
    // Query for the reduce result of the "black" color
    .WhereEquals(x => x.Color, "black")
    .FirstOrDefault();

// Get total sales for black TShirts
var blackShirtsSales = queryResult?.TotalSales ?? 0;

RQL:

from index "Sales/ByTShirtColor/Fanout"
where Color == "black"

// Query results:
// ==============

// With the sample data used in this article,
// the total sales revenue from black TShirts sold (in all shops) is 490.0
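
The 490 figure can be checked by hand. Only Shop2 and Shop3 sell black TShirts, giving 20 × 11 + 15 × 8 + 15 × 10 = 220 + 120 + 150 = 490, from 11 + 8 + 10 = 29 items sold. The sketch below reproduces the map step (one result per TShirt) and the reduce step (group by Color and sum) in plain in-memory JavaScript. It models the index logic and is not a RavenDB API call.

```javascript
// In-memory model (not RavenDB code) of Sales_ByTShirtColor_Fanout.
// Sample data from this article: Color, Price and Sold per TShirt (Size and Logo omitted).
const shops = [
  { ShopName: "Shop1", TShirts: [
    { Color: "Red", Price: 25, Sold: 2 }, { Color: "Red", Price: 25, Sold: 4 },
    { Color: "Blue", Price: 28, Sold: 5 }, { Color: "Green", Price: 30, Sold: 3 } ] },
  { ShopName: "Shop2", TShirts: [
    { Color: "Blue", Price: 22, Sold: 12 }, { Color: "Blue", Price: 22, Sold: 7 },
    { Color: "Green", Price: 25, Sold: 9 }, { Color: "Black", Price: 20, Sold: 11 } ] },
  { ShopName: "Shop3", TShirts: [
    { Color: "Red", Price: 18, Sold: 2 }, { Color: "Blue", Price: 20, Sold: 6 },
    { Color: "Black", Price: 15, Sold: 8 }, { Color: "Black", Price: 15, Sold: 10 } ] },
];

// Map (fanout): one result per TShirt
const mapped = shops.flatMap((shop) =>
  shop.TShirts.map((shirt) => ({
    Color: shirt.Color,
    ItemsSold: shirt.Sold,
    TotalSales: shirt.Price * shirt.Sold,
  }))
);

// Reduce: group by Color and sum both numeric fields
const byColor = new Map();
for (const result of mapped) {
  const acc = byColor.get(result.Color) ?? { Color: result.Color, ItemsSold: 0, TotalSales: 0 };
  acc.ItemsSold += result.ItemsSold;
  acc.TotalSales += result.TotalSales;
  byColor.set(result.Color, acc);
}

console.log(byColor.get("Black")); // { Color: 'Black', ItemsSold: 29, TotalSales: 490 }
```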

Fanout index - Performance hints:

  • Fanout indexes are typically more resource-intensive than other indexes, because RavenDB has to generate and index many index-entries per document. This increased workload can lead to higher CPU and memory usage, potentially degrading the overall performance of the index.

  • When the number of index-entries generated from a single document exceeds a configurable limit,
    RavenDB issues a "High indexing fanout ratio" alert in the Studio notification center.

  • You can control when this performance hint is raised by setting the PerformanceHints.Indexing.MaxIndexOutputsPerDocument configuration key (default: 1024).

  • For example, adding another OnlineShop document whose TShirts list contains 1,025 items
    will trigger the following alert:

    Figure 1. High indexing fanout ratio notification


  • Clicking the 'Details' button will show the following info:

    Figure 2. Fanout index, performance hint details

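
The rule described above can be modeled as a per-document comparison. This JavaScript sketch is hypothetical and greatly simplified; it does not reproduce RavenDB's own bookkeeping. A fanout map emits one entry per TShirt, so a document's output count is the length of its TShirts list, and documents above the limit (the documented default of 1024) are reported. `Shop4` and its 1,025 shirts are made-up data that match the example above.

```javascript
// Hypothetical, simplified model (not RavenDB's implementation) of the
// "High indexing fanout ratio" check: compare outputs per document to a limit.
const MAX_INDEX_OUTPUTS_PER_DOCUMENT = 1024; // documented default of the configuration key

// A fanout map over TShirts emits one index-entry per shirt
function countOutputs(doc) {
  return doc.TShirts.length;
}

// Return the documents whose output count EXCEEDS the limit
function findHighFanoutDocs(docs, limit = MAX_INDEX_OUTPUTS_PER_DOCUMENT) {
  return docs
    .filter((doc) => countOutputs(doc) > limit)
    .map((doc) => ({ ShopName: doc.ShopName, outputs: countOutputs(doc) }));
}

// Hypothetical extra shop with 1,025 TShirts, plus a small one for contrast
const bigShop = {
  ShopName: "Shop4",
  TShirts: Array.from({ length: 1025 }, (_, i) => ({ Color: "Red", Size: "M", Logo: `Logo ${i}` })),
};
const smallShop = { ShopName: "Shop1", TShirts: [{ Color: "Red", Size: "S" }] };

console.log(findHighFanoutDocs([smallShop, bigShop])); // [ { ShopName: 'Shop4', outputs: 1025 } ]
```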

Fanout index - Paging:

  • A fanout index has more index-entries than the number of documents in the collection indexed.
    Multiple index-entries "point" to the same document from which they originated,
    as can be seen in the above index-entries example.

  • When a fanout index query returns full documents (without projecting results),
    the TotalResults property (available via the QueryStatistics object) contains
    the total number of matching index-entries, not the number of resulting documents.

  • Therefore, when paging through the results, you must take into account the number of "duplicate"
    index-entries that the server skips internally when serving the resulting documents.

  • Refer to "Paging through tampered results" for further explanation and examples.
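
Here is a quick in-memory JavaScript illustration of the mismatch. It is a model, not RavenDB client code. With the sample data, a fanout query for blue TShirts matches 4 index-entries but only 3 documents, because Shop2 has two blue shirts. Page counts computed from TotalResults would therefore overestimate the number of document pages. The page size of 3 is an arbitrary choice for the example.

```javascript
// In-memory model (not RavenDB code) of index-entries vs. documents in a fanout query.
// Sample data from this article (Color only).
const shops = [
  { ShopName: "Shop1", TShirts: [{ Color: "Red" }, { Color: "Red" }, { Color: "Blue" }, { Color: "Green" }] },
  { ShopName: "Shop2", TShirts: [{ Color: "Blue" }, { Color: "Blue" }, { Color: "Green" }, { Color: "Black" }] },
  { ShopName: "Shop3", TShirts: [{ Color: "Red" }, { Color: "Blue" }, { Color: "Black" }, { Color: "Black" }] },
];

// Fanout entries matching Color == "blue"; docId is a hypothetical link to the source document
const matchingEntries = shops.flatMap((shop) =>
  shop.TShirts
    .filter((shirt) => shirt.Color.toLowerCase() === "blue")
    .map(() => ({ docId: shop.ShopName }))
);

const totalResults = matchingEntries.length;                            // counts index-entries
const distinctDocs = new Set(matchingEntries.map((e) => e.docId)).size; // distinct matching documents
const duplicateEntries = totalResults - distinctDocs;

const pageSize = 3; // arbitrary page size for this example
console.log(totalResults, distinctDocs, duplicateEntries); // 4 3 1
console.log(Math.ceil(totalResults / pageSize));           // 2 (pages if TotalResults is trusted)
console.log(Math.ceil(distinctDocs / pageSize));           // 1 (pages actually needed)
```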