Indexing Nested data
-
JSON documents can have nested structures, where one document contains other objects or arrays of objects.
-
Use a static-index to facilitate querying for documents based on the nested data.
-
In this page:
Sample data
-
The examples in this article are based on the following Classes and Sample Data:
class OnlineShop { private ?string $shopName = null; private ?string $email = null; public ?TShirtArray $tShirts = null; // Nested data public function __construct( ?string $shopName = null, ?string $email = null, ?TShirtArray $tShirts = null ) { $this->shopName = $shopName; $this->email = $email; $this->tShirts = $tShirts; } public function getShopName(): ?string { return $this->shopName; } public function setShopName(?string $shopName): void { $this->shopName = $shopName; } public function getEmail(): ?string { return $this->email; } public function setEmail(?string $email): void { $this->email = $email; } public function getTShirts(): ?TShirtArray { return $this->tShirts; } public function setTShirts(?TShirtArray $tShirts): void { $this->tShirts = $tShirts; } } class TShirt { private ?string $color = null; private ?string $size = null; private ?string $logo = null; private ?float $price = null; private ?int $sold = null; public function __construct( ?string $color = null, ?string $size = null, ?string $logo = null, ?float $price = null, ?int $sold = null ) { $this->color = $color; $this->size = $size; $this->logo = $logo; $this->price = $price; $this->sold = $sold; } public function getColor(): ?string { return $this->color; } public function setColor(?string $color): void { $this->color = $color; } public function getSize(): ?string { return $this->size; } public function setSize(?string $size): void { $this->size = $size; } public function getLogo(): ?string { return $this->logo; } public function setLogo(?string $logo): void { $this->logo = $logo; } public function getPrice(): ?float { return $this->price; } public function setPrice(?float $price): void { $this->price = $price; } public function getSold(): ?int { return $this->sold; } public function setSold(?int $sold): void { $this->sold = $sold; } } class TShirtArray extends TypedArray { public function __construct() { parent::__construct(TShirt::class); } }
// Creating sample data for the examples in this article: // ====================================================== $onlineShops = []; // Shop1 $onlineShops[] = new OnlineShop( shopName: "Shop1", email: "sales@shop1.com", tShirts: TShirtArray::fromArray([ new TShirt(color: "Red", size: "S", logo: "Bytes and Beyond", price: 25, sold: 2), new TShirt(color: "Red", size: "M", logo: "Bytes and Beyond", price: 25, sold: 4), new TShirt(color: "Blue", size: "M", logo: "Query Everything", price: 28, sold: 5), new TShirt(color: "Green", size: "L", logo: "Data Driver", price: 30, sold:3) ]) ); // Shop2 $onlineShops[] = new OnlineShop( shopName: "Shop2", email: "sales@shop2.com", tShirts: TShirtArray::fromArray([ new TShirt(color: "Blue", size: "S", logo: "Coffee, Code, Repeat", price: 22, sold: 12 ), new TShirt(color: "Blue", size: "M", logo: "Coffee, Code, Repeat", price: 22, sold: 7 ), new TShirt(color: "Green", size: "M", logo: "Big Data Dreamer", price: 25, sold: 9 ), new TShirt(color: "Black", size: "L", logo: "Data Mining Expert", price: 20, sold: 11) ]) ); // Shop3 $onlineShops[] = new OnlineShop( shopName: "Shop3", email: "sales@shop3.com", tShirts: TShirtArray::fromArray([ new TShirt(color: "Red", size: "S", logo: "Bytes of Wisdom", price: 18, sold: 2 ), new TShirt(color: "Blue", size: "M", logo: "Data Geek", price: 20, sold: 6 ), new TShirt(color: "Black", size: "L", logo: "Data Revolution", price: 15, sold: 8 ), new TShirt(color: "Black", size: "XL", logo: "Data Revolution", price: 15, sold: 10 ) ]) ); $session = $store->openSession(); try { /** @var OnlineShop $shop */ foreach ($onlineShops as $shop) { $session->store($shop); } $session->SaveChanges(); } finally { $session->close(); }
Simple index - Single index-entry per document
-
_query_1 // Query for all shop documents that have a red TShirt $shopsThatHaveRedShirts = $session ->query(Shops_ByTShirt_Simple_IndexEntry::class, Shops_ByTShirt_Simple::class) // Filter query results by a nested value ->containsAny("colors", [ "red" ]) ->ofType(OnlineShop::class) ->toList();
-
A single index-entry per document
-
The index-entries content is visible from the Studio Query view.
-
Check option: Show raw index-entries instead of Matching documents.
-
Each row represents an index-entry.
The index has a single index-entry per document (3 entries in this example). -
The index-field contains a collection of ALL nested values from the document.
e.g. The third index-entry has the following values in the Colors index-field:
{"black", "blue", "red"}
-
-
// Query for all shop documents that have a red TShirt $shopsThatHaveRedShirts = $session ->query(Shops_ByTShirt_Simple_IndexEntry::class, Shops_ByTShirt_Simple::class) // Filter query results by a nested value ->containsAny("colors", [ "red" ]) ->ofType(OnlineShop::class) ->toList();
from index "Shops/ByTShirt/Simple" where Colors == "red"
// Results will include the following shop documents: // ================================================== // * Shop1 // * Shop3
-
-
This type of index structure is effective for retrieving documents when filtering the query by any of the inner nested values that were indexed.
-
However, due to the way the index-entries are generated, this index cannot provide results for a query searching for documents that contain specific sub-objects which satisfy some
and_also
condition.
For example:
// You want to query for shops containing "Large Green TShirts", // aiming to get only "Shop1" as a result since it has such a combination, // so you attempt this query: $greenAndLarge = $session ->query(Shops_ByTShirt_Simple_IndexEntry::class, Shops_ByTShirt_Simple::class) ->whereEquals("color", "green") ->andAlso() ->whereEquals("size", "L") ->ofType(OnlineShop::class) ->toList(); // But, the results of this query will include BOTH "Shop1" & "Shop2" // since the index-entries do not keep the original sub-objects structure.
-
To address this, you must use a Fanout index - as described below.
-
Fanout index - Multiple index-entries per document
-
-
A fanout index is an index that outputs multiple index-entries per document.
A separate index-entry is created for each nested sub-object from the document. -
The fanout index is useful when you need to retrieve documents matching query criteria
that search for specific sub-objects that comply with some logical conditions.
-
-
Fanout index - Map index example:
// A fanout map-index: // =================== class Shops_ByTShirt_Fanout_IndexEntry { // The index-fields: private ?string $color = null; private ?string $size = null; private ?string $logo = null; public function getColor(): ?string { return $this->color; } public function setColor(?string $color): void { $this->color = $color; } public function getSize(): ?string { return $this->size; } public function setSize(?string $size): void { $this->size = $size; } public function getLogo(): ?string { return $this->logo; } public function setLogo(?string $logo): void { $this->logo = $logo; } } class Shops_ByTShirt_Fanout extends AbstractIndexCreationTask { public function __construct() { parent::__construct(); $this->map = "from shop in docs.OnlineShops " . "from shirt in shop.t_shirts " . // Creating MULTIPLE index-entries per document, // an index-entry for each sub-object in the TShirts list "select new {" . " color = shirt.color," . " size = shirt.size," . " logo = shirt.logo" . "}"; } }
class Shops_ByTShirt_JS extends AbstractJavaScriptIndexCreationTask { public function __construct() { parent::__construct(); $this->setMaps([ "map('OnlineShops', function (shop){ var res = []; shop.t_shirts.forEach(shirt => { res.push({ color: shirt.color, size: shirt.size, logo: shirt.logo }) }); return res; }) " ]); } }
// Query the fanout index: // ======================= $shopsThatHaveMediumRedShirts = $session ->query(Shops_ByTShirt_Fanout_IndexEntry::class, Shops_ByTShirt_Fanout::class) // Query for documents that have a "Medium Red TShirt" ->whereEquals("color", "red") ->andAlso() ->whereEquals("size", "M") ->ofType(OnlineShop::class) ->toList();
from index "Shops/ByTShirt/Fanout" where Color == "red" and Size == "M"
// Query results: // ============== // Only the 'Shop1' document will be returned, // since it is the only document that has the requested combination within the TShirt list.
-
-
The index-entries content is visible from the Studio Query view.
-
Check option: Show raw index-entries instead of Matching documents.
-
Each row represents an index-entry.
Each index-entry corresponds to an inner item in the TShirt list. -
In this example, the total number of index-entries is 12,
which is the total number of inner items in the TShirt list in all 3 documents in the collection.
-
-
Fanout index - Map-Reduce index example:
-
The fanout index concept applies to map-reduce indexes as well:
// A fanout map-reduce index: // ========================== class Sales_ByTShirtColor_Fanout_IndexEntry { // The index-fields: private ?string $color = null; private ?int $itemsSold = null; private ?float $totalSales = null; public function getColor(): ?string { return $this->color; } public function setColor(?string $color): void { $this->color = $color; } public function getItemsSold(): ?int { return $this->itemsSold; } public function setItemsSold(?int $itemsSold): void { $this->itemsSold = $itemsSold; } public function getTotalSales(): ?float { return $this->totalSales; } public function setTotalSales(?float $totalSales): void { $this->totalSales = $totalSales; } } class Sales_ByTShirtColor_Fanout extends AbstractIndexCreationTask { public function __construct() { parent::__construct(); # Creating MULTIPLE index-entries per document, # an index-entry for each sub-object in the TShirts list $this->map = "from shop in docs.OnlineShops " . "from shirt in shop.t_shirts " . // Creating MULTIPLE index-entries per document, // an index-entry for each sub-object in the TShirts list "select new {" . " color = shirt.color, " . " items_sold = shirt.sold, " . " total_sales = shirt.price * shirt.sold" . "}"; $this->reduce = "from result in results " . "group result by result.color " . "into g select new {" . " color = g.Key," . // Calculate sales per color " items_sold = g.Sum(x => x.items_sold)," . " total_sales = g.Sum(x => x.total_sales)" . "}"; } }
// Query the fanout index: // ======================= /** @var Sales_ByTShirtColor_Fanout_IndexEntry $queryResult */ $queryResult = $session ->query(Sales_ByTShirtColor_Fanout_IndexEntry::class, Sales_ByTShirtColor_Fanout::class) // Query for index-entries that contain "black" ->whereEquals("color", "black") ->firstOrDefault(); // Get total sales for black TShirts $blackShirtsSales = $queryResult?->getTotalSales() ?? 0;
from index "Sales/ByTShirtColor/Fanout" where Color == "black"
-
-
Fanout index - Performance hints:
-
Fanout indexes are typically more resource-intensive than other indexes as RavenDB has to index a large number of index-entries. This increased workload can lead to higher CPU and memory utilization, potentially causing a decline in the overall performance of the index.
-
When the number of index-entries generated from a single document exceeds a configurable limit,
RavenDB will issue a High indexing fanout ratio alert in the Studio notification center. -
You can control when this performance hint is created by setting the PerformanceHints.Indexing.MaxIndexOutputsPerDocument configuration key (default is 1024).
-
So, for example, adding another OnlineShop document with a
tShirt
object containing 1025 items
will trigger the following alert:High indexing fanout ratio notification
-
Clicking the 'Details' button will show the following info:
Fanout index, performance hint details
-
-
-
A fanout index has more index-entries than the number of documents in the collection indexed.
Multiple index-entries "point" to the same document from which they originated,
as can be seen in the above index-entries example. -
When making a fanout index query that should return full documents (without projecting results),
theTotalResults
property (available via theQueryStatistics
object) will contain
the total number of index-entries and Not the total number of resulting documents. -
To overcome this when paging results, you must take into account the number of "duplicate"
index-entries that are skipped internally by the server when serving the resulting documents. -
Please refer to paging through tampered results for further explanation and examples.
-