Indexes: Indexing Related Documents
- As described in the modeling considerations for RavenDB,
  it is recommended that documents be independent, isolated, and coherent.
  However, to accommodate varied models, documents can reference other documents.
- The related data from a referenced (related) document can be indexed,
  which allows querying the collection by the indexed related data.
- The related documents that are loaded in the index definition can be either Tracked or Not-Tracked.
What are related documents
- Whenever a document references another document, the referenced document is called a Related Document.
- In the image below, document products/34-A references documents categories/1-A & suppliers/16-A,
  which are considered Related Documents.
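As a rough illustration of the relationship described above, the sketch below models the three documents as plain Python dictionaries. The `Category` and `Supplier` field names follow the Northwind sample data; the document contents and the `related_ids` helper are hypothetical and only show that a reference is simply another document's ID stored in a field.

```python
# A toy in-memory "database": document ID -> document body.
# The contents are illustrative, not actual Northwind data.
documents = {
    "products/34-A": {
        "Name": "Sample product",
        "Category": "categories/1-A",   # reference to a related document
        "Supplier": "suppliers/16-A",   # reference to a related document
    },
    "categories/1-A": {"Name": "Beverages"},
    "suppliers/16-A": {"Name": "Sample supplier"},
}


def related_ids(doc_id, reference_fields=("Category", "Supplier")):
    """Return the IDs of the documents that `doc_id` references."""
    doc = documents[doc_id]
    return [doc[field] for field in reference_fields if field in doc]


print(related_ids("products/34-A"))
```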
Index related documents - With tracking
Example I - basic
- What is tracked:
  Both the documents from the indexed collection and the indexed related documents are tracked for changes.
  Re-indexing will be triggered by any change in either collection.
  (See "Document changes that cause re-indexing" below.)
- The index:
  Following the above Product - Category relationship from the Northwind sample database,
  an index defined on the Products collection can index data from the related Category document.

    class Products_ByCategoryName(AbstractIndexCreationTask):
        class IndexEntry:
            def __init__(self, category_name: str = None):
                self.category_name = category_name

        def __init__(self):
            super().__init__()
            self.map = (
                "from product in docs.Products "
                'let category = this.LoadDocument(product.Category, "Categories") '
                "select new { category_name = category.Name }"
            )

    class Products_ByCategoryName_JS(AbstractJavaScriptIndexCreationTask):
        def __init__(self):
            super().__init__()
            self.maps = {
                # Call method 'load' to load the related Category document
                # The document ID to load is specified by 'product.Category'
                # The Name field from the related Category document will be indexed
                """
                map('products', function(product) {
                    let category = load(product.Category, 'Categories')
                    return {
                        category_name: category.Name
                    };
                })
                """
                # Since no_tracking was not specified,
                # then any change to either Products or Categories will trigger reindexing
            }
- The query:
  We can now query the index for Product documents by category_name,
  i.e. get all matching Products that reference a Category that has the specified name term.

    matching_products = list(
        session.query_index_type(Products_ByCategoryName, Products_ByCategoryName.IndexEntry)
        .where_equals("category_name", "Beverages")
        .of_type(Product)
    )

    from index "Products/ByCategoryName" where category_name == "Beverages"
Example II - list
- The documents:

    from typing import List

    # The referencing document
    class Author:
        def __init__(self, Id: str = None, name: str = None, book_ids: List[str] = None):
            self.Id = Id
            self.name = name

            # Referencing a list of related document IDs
            self.book_ids = book_ids

    # The related document
    class Book:
        def __init__(self, Id: str = None, name: str = None):
            self.Id = Id
            self.name = name
- The index:
  This index will index all names of the related Book documents.

    class Authors_ByBooks(AbstractIndexCreationTask):
        class IndexEntry:
            def __init__(self, book_names: List[str] = None):
                self.book_names = book_names

        def __init__(self):
            super().__init__()
            self.map = (
                "from author in docs.Authors "
                "select new "
                "{"
                # For each Book ID, call LoadDocument and index the book's name
                # (the Book class above stores it in the 'name' field)
                '    book_names = author.book_ids.Select(x => LoadDocument(x, "Books").name)'
                "}"
            )
            # Since no_tracking was not specified,
            # then any change to either Authors or Books will trigger reindexing

    class Authors_ByBooks_JS(AbstractJavaScriptIndexCreationTask):
        def __init__(self):
            super().__init__()
            self.maps = {
                # For each Book ID, call 'load' and index the book's name
                # (field names follow the Author and Book classes above)
                """
                map('Authors', function(author) {
                    return {
                        book_names: author.book_ids.map(x => load(x, 'Books').name)
                    }
                })
                """
                # Since no_tracking was not specified,
                # then any change to either Authors or Books will trigger reindexing
            }
- The query:
  We can now query the index for Author documents by a book's name,
  i.e. get all Authors that have the specified book's name in their list.

    # Get all authors that have books with title: "The Witcher"
    matching_authors = list(
        session.query_index_type(Authors_ByBooks, Authors_ByBooks.IndexEntry)
        .where_in("book_names", ["The Witcher"])
        .of_type(Author)
    )

    // Get all authors that have books with title: "The Witcher"
    from index "Authors/ByBooks" where book_names = "The Witcher"
Tracking implications
- Indexing related data with tracking can be a useful way to query documents by their related data.
  However, that may come with performance costs.
- Re-indexing will be triggered whenever any document in the collection that is referenced by LoadDocument is changed.
  Even when indexing just a single field from the related document, any change to any other field will cause re-indexing.
  (See "Document changes that cause re-indexing" below.)
- Frequent re-indexing will increase CPU usage and reduce performance,
  and index results may be stale for prolonged periods.
- Tracking indexed related data is more useful when the indexed related collection is known not to change much.
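The cost described above can be pictured with a small in-memory model. This is only a sketch of the idea, not how RavenDB implements tracking: the `TrackingIndex` class, its methods, and the sample documents are all hypothetical. It records which products loaded which category and re-indexes every referencing product when that category changes, even if the changed field is not indexed.

```python
from collections import defaultdict


class TrackingIndex:
    """Toy model of an index that tracks the related documents it loads."""

    def __init__(self, docs):
        self.docs = docs                        # document ID -> body
        self.entries = {}                       # product ID -> index entry
        self.referenced_by = defaultdict(set)   # related ID -> product IDs
        self.reindex_count = 0

    def index_product(self, product_id):
        product = self.docs[product_id]
        category_id = product["Category"]
        # Loading with tracking: remember which product depends on this category
        self.referenced_by[category_id].add(product_id)
        category = self.docs.get(category_id) or {}
        self.entries[product_id] = {"category_name": category.get("Name")}
        self.reindex_count += 1

    def on_document_changed(self, doc_id):
        # A change to an indexed document re-indexes that document
        if doc_id in self.entries:
            self.index_product(doc_id)
        # A change to a tracked related document re-indexes every referencing product
        for product_id in sorted(self.referenced_by.get(doc_id, ())):
            self.index_product(product_id)


docs = {
    "products/1-A": {"Name": "Chai", "Category": "categories/1-A"},
    "products/2-A": {"Name": "Chang", "Category": "categories/1-A"},
    "categories/1-A": {"Name": "Beverages", "Description": "Soft drinks"},
}
index = TrackingIndex(docs)
for pid in ("products/1-A", "products/2-A"):
    index.index_product(pid)

# Change a field that is NOT indexed; both products are re-indexed anyway
docs["categories/1-A"]["Description"] = "Soft drinks, coffees, teas"
index.on_document_changed("categories/1-A")
print(index.reindex_count)
```

In this toy model, one edit to a single category costs one re-indexing pass per referencing product, which is why a frequently changing related collection can become expensive.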
Index related documents - No tracking
Example III - no tracking
- What is tracked:
  - Only the documents from the indexed collection are tracked for changes and can trigger re-indexing.
    Any change done to any document in the indexed related documents will Not trigger re-indexing.
    (See "Document changes that cause re-indexing" below.)
- The index:

    class Products_ByCategoryName_NoTracking(AbstractIndexCreationTask):
        class IndexEntry:
            def __init__(self, category_name: str = None):
                self.category_name = category_name

        def __init__(self):
            super().__init__()
            self.map = (
                "from product in docs.Products "
                # Call NoTracking.LoadDocument to load the related Category document w/o tracking
                'let category = NoTracking.LoadDocument(product.Category, "Categories") '
                "select new {"
                # Index the name field from the related Category document
                " category_name = category.Name "
                "}"
            )
            # Since NoTracking is used -
            # then only the changes to Products will trigger reindexing

    class Products_ByCategoryName_NoTracking_JS(AbstractJavaScriptIndexCreationTask):
        def __init__(self):
            super().__init__()
            self.maps = {
                # Call 'noTracking.load' to load the related Category document w/o tracking
                """
                map('products', function(product) {
                    let category = noTracking.load(product.Category, 'Categories')
                    return {
                        category_name: category.Name
                    };
                })
                """
            }
            # Since noTracking is used -
            # then only the changes to Products will trigger reindexing
- The query:
  When querying the index for Product documents by category_name,
  results will be based on the related data that was first indexed when the index was deployed.

    matching_products = list(
        session.query_index_type(
            Products_ByCategoryName_NoTracking, Products_ByCategoryName_NoTracking.IndexEntry
        )
        .where_equals("category_name", "Beverages")
        .of_type(Product)
    )

    from index "Products/ByCategoryName/NoTracking" where category_name == "Beverages"
No-tracking implications
- Indexing related data with no-tracking can be a useful way to query documents by their related data.
  However, that may come with some data accuracy costs.
- Re-indexing will Not be triggered when documents in the collection that is referenced by LoadDocument are changed.
  Although this may save system resources, the index entries and the indexed terms may not be updated with the current state of data.
- Indexing related data without tracking is useful when the indexed related data is fixed and not supposed to change.
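The accuracy trade-off above can be sketched with plain dictionaries. This is a hypothetical model rather than RavenDB code: `index_product_no_tracking` and `find_products` are made-up helpers, and the sketch assumes the map function reads the related document again each time the Product itself is re-indexed.

```python
docs = {
    "products/1-A": {"Name": "Chai", "Category": "categories/1-A"},
    "categories/1-A": {"Name": "Beverages"},
}
index_entries = {}  # product ID -> index entry


def index_product_no_tracking(product_id):
    """Index a product; the related category is read but not tracked."""
    product = docs[product_id]
    category = docs.get(product["Category"]) or {}
    index_entries[product_id] = {"category_name": category.get("Name")}


def find_products(category_name):
    """Return the IDs of products whose index entry has the given category name."""
    return sorted(
        pid for pid, entry in index_entries.items()
        if entry["category_name"] == category_name
    )


index_product_no_tracking("products/1-A")

# The related document changes, but nothing triggers re-indexing
docs["categories/1-A"]["Name"] = "Drinks"
stale_hits = find_products("Beverages")   # the index still holds the old name
missed_hits = find_products("Drinks")     # the new name is not in the index yet

# A later change to the Product itself re-indexes it and picks up the new name
index_product_no_tracking("products/1-A")
fresh_hits = find_products("Drinks")
```

Until the Product is indexed again, queries by the old category name keep matching and queries by the new name miss, which is why this mode suits related data that is not expected to change.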
Document changes that cause re-indexing
- The following changes done to a document will trigger re-indexing:
  - Any modification to any document field (not just to the indexed fields)
  - Adding/Deleting an attachment
  - Creating a new Time series (modifying an existing one will not trigger re-indexing)
  - Creating a new Counter (modifying an existing one will not trigger re-indexing)
- Any such change done on any document in the indexed collection will trigger re-indexing.
- Any such change done on any document in the indexed related documents will trigger re-indexing
  only if NoTracking was Not used in the index definition.
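The rules listed above can be summarized as a small decision function. This is a hypothetical helper written only to restate the list; the `Change` enum and `triggers_reindexing` are not part of the RavenDB client API.

```python
from enum import Enum


class Change(Enum):
    FIELD_MODIFIED = "field modified"
    ATTACHMENT_ADDED_OR_DELETED = "attachment added or deleted"
    TIME_SERIES_CREATED = "time series created"
    TIME_SERIES_MODIFIED = "existing time series modified"
    COUNTER_CREATED = "counter created"
    COUNTER_MODIFIED = "existing counter modified"


# Changes that trigger re-indexing according to the list above
TRIGGERING_CHANGES = {
    Change.FIELD_MODIFIED,
    Change.ATTACHMENT_ADDED_OR_DELETED,
    Change.TIME_SERIES_CREATED,
    Change.COUNTER_CREATED,
}


def triggers_reindexing(change, in_related_document=False, no_tracking=False):
    """Return True if the change would trigger re-indexing under the rules above."""
    if change not in TRIGGERING_CHANGES:
        return False
    if in_related_document:
        # Related documents only matter when they are loaded with tracking
        return not no_tracking
    return True


print(triggers_reindexing(Change.FIELD_MODIFIED, in_related_document=True))
```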
LoadDocument syntax
T LoadDocument<T>(string relatedDocumentId);
T LoadDocument<T>(string relatedDocumentId, string relatedCollectionName);
T[] LoadDocument<T>(IEnumerable<string> relatedDocumentIds);
T[] LoadDocument<T>(IEnumerable<string> relatedDocumentIds, string relatedCollectionName);
Syntax for JavaScript-index:
object load(relatedDocumentId, relatedCollectionName);
Parameters | Type | Description |
---|---|---|
relatedDocumentId | string | ID of the related document to load |
relatedCollectionName | string | The related collection name |
relatedDocumentIds | IEnumerable<string> | A list of related document IDs to load |